How to register datasets for Detectron2

SarahDorich
New Contributor II

I'm trying to run a Detectron2 model in Databricks and cannot figure out how to register my train, val and test datasets. My datasets live in an Azure data lake. I have tried the following with no luck. Any help is appreciated.

1) Specifying full path to Azure:

path_to_data = "abfss://<>@<>.dfs.core.windows.net/recommender/house-detector-datasets"

from detectron2.data.datasets import register_coco_instances

register_coco_instances("house_train3", {}, f"{path_to_data}/train/instances_default.json", f"{path_to_data}/train")

2) Moving to temporary local storage first:

import os

os.mkdir("house-detector-datasets")

my_blob_folder = "abfss://<>@<>.dfs.core.windows.net/recommender/house-detector-datasets"

dbutils.fs.cp(my_blob_folder, "house-detector-datasets", recurse=True)

path_to_data = "house-detector-datasets"

register_coco_instances("house_train4", {}, f"{path_to_data}/train/instances_default.json", f"{path_to_data}/train")

3) Moving to dbfs first:

Same code as 2), except copying to dbfs:/tmp/ instead of a local folder.
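One thing that may be worth checking for attempts 2) and 3): as far as I understand, Detectron2 opens the annotation file and images as ordinary local paths, while dbutils.fs works with URIs and treats a path without a scheme as a DBFS path rather than a folder on the driver. On Databricks, DBFS is usually visible to plain Python file APIs under the /dbfs mount, and the driver's local disk is addressed from dbutils with a file:/ prefix. The sketch below is a hypothetical helper (to_local_path is not part of Detectron2 or Databricks) that maps such URIs to paths Python can open, assuming the standard /dbfs mount is available on the cluster.

```python
def to_local_path(uri: str) -> str:
    """Map a Databricks-style URI to a path that plain Python file APIs can open.

    Hypothetical helper for illustration; assumes DBFS is mounted at /dbfs.
    """
    if uri.startswith("dbfs:/"):
        # dbfs:/tmp/x  ->  /dbfs/tmp/x  (assumed FUSE mount on the driver)
        return "/dbfs/" + uri[len("dbfs:/"):].lstrip("/")
    if uri.startswith("file:/"):
        # file:/tmp/x or file:///tmp/x  ->  /tmp/x  (driver-local disk)
        return "/" + uri[len("file:/"):].lstrip("/")
    if "://" in uri:
        # e.g. abfss://...; there is no local path to hand to Detectron2
        raise ValueError(f"no local equivalent for remote URI: {uri}")
    # Already a plain path; return unchanged
    return uri


path_to_data = to_local_path("dbfs:/tmp/house-detector-datasets")

# The registration call would then use the local-style path, for example:
# register_coco_instances("house_train4", {},
#                         f"{path_to_data}/train/instances_default.json",
#                         f"{path_to_data}/train")
```

With this mapping, the copy target passed to dbutils.fs.cp would stay as dbfs:/tmp/house-detector-datasets, and only the path handed to register_coco_instances would change to the /dbfs form.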

In all of these cases, I get a "No such file or directory" error when I try to access the registered datasets. For example, the code below fails:

from detectron2.data import DatasetCatalog, MetadataCatalog

my_dataset_train_metadata = MetadataCatalog.get("house_train3")

dataset_dicts = DatasetCatalog.get("house_train3")
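A possible reason the error only shows up at this point: register_coco_instances, as far as I can tell, just records the paths and a loader function, and the annotation file is first read when DatasetCatalog.get is called. A quick way to narrow things down is to check, with plain Python, whether the two paths passed at registration exist from the driver's point of view. The sketch below uses a hypothetical helper, missing_coco_paths, and demonstrates it on a throwaway temporary folder rather than the real dataset.

```python
import os
import tempfile


def missing_coco_paths(json_file, image_root):
    """Return the registration paths that plain Python cannot see.

    Hypothetical helper for illustration; it does not call Detectron2.
    """
    missing = []
    if not os.path.isfile(json_file):
        missing.append(json_file)
    if not os.path.isdir(image_root):
        missing.append(image_root)
    return missing


# Demo on a temporary folder that mimics the train/ layout from the post
with tempfile.TemporaryDirectory() as root:
    train_dir = os.path.join(root, "train")
    os.mkdir(train_dir)
    json_file = os.path.join(train_dir, "instances_default.json")

    # The annotation file does not exist yet, so it is reported as missing
    before = missing_coco_paths(json_file, train_dir)

    # Create an empty annotation file; nothing is missing afterwards
    open(json_file, "w").close()
    after = missing_coco_paths(json_file, train_dir)
```

Running the same check on the real paths (for example f"{path_to_data}/train/instances_default.json" and f"{path_to_data}/train") before calling DatasetCatalog.get should show whether the files are where Python expects them.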