I'm trying to run a Detectron2 model in Databricks and can't figure out how to register my train, val, and test datasets. The datasets live in an Azure Data Lake Storage (ADLS Gen2) account. I have tried the approaches below with no luck; any help is appreciated.
1) Specifying the full abfss:// path to the data lake:

from detectron2.data.datasets import register_coco_instances

path_to_data = "abfss://<>@<>.dfs.core.windows.net/recommender/house-detector-datasets"

# Register the training set directly against the abfss:// URI
register_coco_instances("house_train3", {}, f"{path_to_data}/train/instances_default.json", f"{path_to_data}/train")
2) Copying the data to temporary local storage first:

import os

os.mkdir("house-detector-datasets")

# Copy the whole folder out of the data lake
my_blob_folder = "abfss://<>@<>.dfs.core.windows.net/recommender/house-detector-datasets"
dbutils.fs.cp(my_blob_folder, "house-detector-datasets", recurse=True)

# Register against the copied (relative) path
path_to_data = "house-detector-datasets"
register_coco_instances("house_train4", {}, f"{path_to_data}/train/instances_default.json", f"{path_to_data}/train")
3) Copying to DBFS first: same code as in 2), except the data is copied to dbfs:/tmp/ instead of a local folder (a sketch of what this looked like follows).
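Roughly, attempt 3 looked like this (the dataset name and the exact tmp subfolder here are illustrative rather than the exact values I used):

# Same as 2), but copying into DBFS instead of a local folder
my_blob_folder = "abfss://<>@<>.dfs.core.windows.net/recommender/house-detector-datasets"
dbutils.fs.cp(my_blob_folder, "dbfs:/tmp/house-detector-datasets", recurse=True)

path_to_data = "dbfs:/tmp/house-detector-datasets"
register_coco_instances("house_train5", {}, f"{path_to_data}/train/instances_default.json", f"{path_to_data}/train")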
In all of these cases I get an error when I try to access the registered datasets. For example, the code below fails with "No such file or directory":
from detectron2.data import DatasetCatalog, MetadataCatalog

my_dataset_train_metadata = MetadataCatalog.get("house_train3")
dataset_dicts = DatasetCatalog.get("house_train3")  # fails here with "No such file or directory"
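In case it is relevant: my understanding is that register_coco_instances only records the paths and a loader function, and the annotation JSON is not actually opened until DatasetCatalog.get() is called, which would explain why the failure only shows up at access time. A minimal sketch of what I mean (the dataset name and paths are placeholders):

from detectron2.data import DatasetCatalog
from detectron2.data.datasets import register_coco_instances

# Registration only stores the paths; nothing is read from disk yet
register_coco_instances("demo_set", {}, "some/path/instances_default.json", "some/path")

# The COCO JSON is loaded lazily here, so an unreadable path only fails at this point
dataset_dicts = DatasetCatalog.get("demo_set")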