Suggest ways to get Unity Catalog data to AWS S3 or SageMaker
12-30-2024 06:31 AM
Please suggest the best ways to get Databricks Unity Catalog data into AWS S3 or SageMaker. The data is around 1 GB in some tables and 20 GB in others.
Currently, our SageMaker pipelines consume data from S3 as batches of Parquet files. We would like to keep the SageMaker pipelines as they are but source the data from Unity Catalog instead. Please suggest the best possible ways to do this. Thanks in advance.
What we tried earlier: Delta Sharing (we are unsure whether it works well at this data volume; please advise on this, and also on the permissions to set up).
12-30-2024 06:35 AM
You could try using Delta Sharing with your provider, as described in the docs: https://docs.databricks.com/en/delta-sharing/set-up.html
12-30-2024 07:08 AM
Thanks for your response, it helps. Could you also advise on the Unity Catalog open APIs? We are doing a feasibility analysis, and since the data is large we would like to have alternatives at hand to try. Other options for this use case are welcome as well. Thanks.
01-02-2025 12:13 AM
Hi team, any suggestions on the last comment, please?

