Sync prod WS DBs to dev WS DBs
08-29-2022 02:15 PM
We have a couple of sources that we already set up to stream to prod using a third-party system. Is there a way to sync this data directly to our dev workspace so we can build pipelines there? For example: connecting directly to a cluster in prod and pulling with a job cluster, dumping to S3 and using Auto Loader, or maybe creating a shared DBFS and sharing the data through that?
We initially created the dev and prod workspaces using the automagical workspace-creation tool, so I'm unfamiliar with how setting up a shared DBFS would work.
- Labels: Data Bricks Sync, Ingestion
08-30-2022 01:22 PM
DBFS can be used in several ways. According to the documentation, it:
- Allows you to interact with object storage using directory and file semantics instead of cloud-specific API commands.
- Allows you to mount cloud object storage locations so that you can map storage credentials to paths in the Databricks workspace.
- Simplifies the process of persisting files to object storage, allowing virtual machines and attached volume storage to be safely deleted on cluster termination.
- Provides a convenient location for storing init scripts, JARs, libraries, and configurations for cluster initialization.
- Provides a convenient location for checkpoint files created during model training with OSS deep learning libraries.
https://docs.databricks.com/dbfs/index.html#what-can-you-do-with-dbfs
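For the shared-storage idea in the question, one possible approach is to mount the same S3 bucket in both workspaces and read it in dev with Auto Loader. The sketch below is only an illustration under assumptions: the bucket name, mount point, and helper names are hypothetical, the cluster is assumed to already have credentials (for example an instance profile) for the bucket, and `dbutils` and `spark` are the objects a Databricks notebook provides. Each workspace has its own DBFS root, so the mount step would need to be run in each workspace separately.

```python
# Hypothetical placeholders: neither the bucket nor the mount point comes from this thread.
SHARED_BUCKET = "s3a://example-shared-landing-bucket"
MOUNT_POINT = "/mnt/shared_landing"


def ensure_mounted(dbutils, source=SHARED_BUCKET, mount_point=MOUNT_POINT):
    """Mount `source` at `mount_point` unless that mount point already exists.

    Returns True if a new mount was created, False if it was already there.
    Assumes the cluster already has access to the bucket.
    """
    existing = {m.mountPoint for m in dbutils.fs.mounts()}
    if mount_point in existing:
        return False
    dbutils.fs.mount(source=source, mount_point=mount_point)
    return True


def autoloader_options(file_format, schema_location):
    """Build the minimal Auto Loader (cloudFiles) reader options."""
    return {
        "cloudFiles.format": file_format,
        "cloudFiles.schemaLocation": schema_location,
    }


def start_ingest(spark, source_path, target_table, checkpoint_path, file_format="json"):
    """Incrementally load new files from `source_path` into `target_table`.

    The file format is an assumption; change it to match what the
    third-party system writes. `availableNow` needs a recent runtime.
    """
    opts = autoloader_options(file_format, checkpoint_path + "/_schema")
    return (
        spark.readStream.format("cloudFiles")
        .options(**opts)
        .load(source_path)
        .writeStream.option("checkpointLocation", checkpoint_path)
        .trigger(availableNow=True)  # process what is available, then stop
        .toTable(target_table)
    )
```

In a dev notebook this might be called as `ensure_mounted(dbutils)` followed by `start_ingest(spark, MOUNT_POINT + "/events", "dev.events_raw", "/mnt/shared_landing/_checkpoints/events")`, where the sub-path and table name are again made up for illustration.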
Please let us know if this helps or if you need further clarification.