Delta Lake as a source of images to train a classification model on a local computer
07-23-2021 08:52 AM
Hi Folks,
I'm evaluating Delta Lake for storing images with data version control, to be used for training models. I watched a session explaining how to do this, including using MLflow to manage the training runs (https://databricks.com/session_na21/image-processing-on-delta-lake).
Note: it'd be interesting to have a link to the source code used in the demo.
I have a slightly different scenario, though. I'm testing on a local machine, following the quick start tutorial (https://docs.delta.io/latest/quick-start.html). In this scenario, what is the best way (using as many out-of-the-box components as possible) to "grab" a local folder of images organized into subfolders (one per class), dump them into Delta Lake, and then use a specific snapshot in TensorFlow? For concreteness, I've sketched the pipeline I'm imagining below.
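This is a minimal sketch, not working code: the paths, the `*.jpg` glob, the 224x224 resize, and reading back version 0 are all placeholder assumptions on my part. The Spark session setup follows the Delta quick start (pip install pyspark delta-spark tensorflow).

```python
import pyspark
from pyspark.sql.functions import regexp_extract, col
from delta import configure_spark_with_delta_pip

# Local SparkSession with Delta Lake enabled, as in the quick start.
builder = (
    pyspark.sql.SparkSession.builder.appName("image-delta")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()

# 1. Ingest: read every JPEG under /data/images/<class>/<file>.jpg as binary,
#    deriving the label from the parent folder name. Paths are placeholders.
images = (
    spark.read.format("binaryFile")
    .option("pathGlobFilter", "*.jpg")
    .option("recursiveFileLookup", "true")
    .load("/data/images")
    .withColumn("label", regexp_extract(col("path"), r".*/([^/]+)/[^/]+$", 1))
    .select("path", "label", "content")
)
images.write.format("delta").mode("overwrite").save("/tmp/delta/images")

# 2. Pin a snapshot: Delta time travel lets training read the exact table
#    version, even after later writes change the data.
snapshot = (
    spark.read.format("delta")
    .option("versionAsOf", 0)
    .load("/tmp/delta/images")
)

# 3. Feed TensorFlow: for a small evaluation dataset, collecting to the
#    driver and building a tf.data.Dataset seems like the simplest route.
import tensorflow as tf

pdf = snapshot.select("content", "label").toPandas()
raw_bytes = [bytes(b) for b in pdf["content"]]        # Spark binary -> bytes
class_names = sorted(pdf["label"].unique())
label_ids = pdf["label"].map(class_names.index).values

def decode(raw, label):
    img = tf.io.decode_jpeg(raw, channels=3)
    return tf.image.resize(img, [224, 224]) / 255.0, label

ds = (
    tf.data.Dataset.from_tensor_slices((raw_bytes, label_ids))
    .map(decode)
    .batch(32)
)
# ds could then be passed straight to model.fit(ds).
```

Is something along these lines the intended approach, or is there a more out-of-the-box way?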
Thanks!
09-09-2021 07:17 AM
I can think of three ways to do this:
- Using the web UI (the Create Table option, or uploading data into DBFS)
- Using databricks-connect, which bridges your local machine to a remote Databricks cluster
- Using the databricks-cli to copy local files to DBFS
Your cloud vendor might also have a tool for copying local data into the cloud environment.
For your purpose (evaluation), the web UI option is probably the easiest.
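Once the images are on DBFS, turning them into a Delta table from a notebook is only a few lines. A sketch, assuming the upload landed under dbfs:/FileStore/images/ with one folder per class (that path is a placeholder; in Databricks notebooks the `spark` session is predefined):

```python
from pyspark.sql.functions import regexp_extract, col

# Read the uploaded files as binary and derive the label from the folder name.
uploaded = (
    spark.read.format("binaryFile")
    .option("recursiveFileLookup", "true")
    .load("dbfs:/FileStore/images")
    .withColumn("label", regexp_extract(col("path"), r".*/([^/]+)/[^/]+$", 1))
)
uploaded.write.format("delta").mode("overwrite").save("dbfs:/delta/images")
```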
https://docs.databricks.com/data/data.html
https://docs.microsoft.com/en-us/azure/databricks/data/databricks-file-system#file-upload-interface

