Sync prod WS DBs to dev WS DBs

Mr__E
Contributor II

We have a couple of sources that we've already set up to stream to prod using a third-party system. Is there a way to sync this data directly to our dev workspace so we can build pipelines? For example: connecting directly to a cluster in prod and pulling with a job cluster, dumping to S3 and using Auto Loader, or maybe creating a shared DBFS and sharing the data through that?

We initially created the dev / prod workspaces using the automagical workspace creation tool, so I'm unfamiliar with how setting up a shared DBFS would work.

Debayan
Databricks Employee

DBFS can be used in several ways.

Per the documentation linked below, DBFS:

  • Allows you to interact with object storage using directory and file semantics instead of cloud-specific API commands.
  • Allows you to mount cloud object storage locations so that you can map storage credentials to paths in the Databricks workspace.
  • Simplifies the process of persisting files to object storage, allowing virtual machines and attached volume storage to be safely deleted on cluster termination.
  • Provides a convenient location for storing init scripts, JARs, libraries, and configurations for cluster initialization.
  • Provides a convenient location for checkpoint files created during model training with OSS deep learning libraries.

https://docs.databricks.com/dbfs/index.html#what-can-you-do-with-dbfs
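
As a rough illustration of the mount option combined with the "dump to S3 and use Auto Loader" idea from the question, here is a minimal sketch. It rests on several assumptions that are not from this thread: the bucket name, mount point, table name, and folder layout are hypothetical placeholders, the files are assumed to be JSON, and the dev cluster is assumed to already have an instance profile (or equivalent credentials) that can read the bucket. The helper function names are made up for this example; only `dbutils.fs.mount`, `dbutils.fs.mounts`, and the `cloudFiles` stream source are Databricks features.

```python
# Hypothetical names throughout: replace with your own bucket, paths, and table.
SHARED_BUCKET = "my-shared-landing-bucket"  # assumption: the prod stream also lands files here
MOUNT_POINT = "/mnt/shared-landing"         # assumption: any unused /mnt path works


def s3_source(bucket: str) -> str:
    """Build the s3a:// URI passed to dbutils.fs.mount as the source."""
    return f"s3a://{bucket}"


def autoloader_options(file_format: str, schema_location: str) -> dict:
    """Options for an Auto Loader (cloudFiles) stream."""
    return {
        "cloudFiles.format": file_format,
        "cloudFiles.schemaLocation": schema_location,
    }


def mount_shared_bucket(dbutils, bucket=SHARED_BUCKET, mount_point=MOUNT_POINT):
    """Mount the bucket in this workspace unless the mount point already exists."""
    if not any(m.mountPoint == mount_point for m in dbutils.fs.mounts()):
        dbutils.fs.mount(source=s3_source(bucket), mount_point=mount_point)


def ingest_to_dev(spark, source_dir, target_table, state_dir, file_format="json"):
    """Incrementally load new files from source_dir into a dev table."""
    opts = autoloader_options(file_format, f"{state_dir}/schema")
    return (
        spark.readStream.format("cloudFiles")
        .options(**opts)
        .load(source_dir)
        .writeStream
        .option("checkpointLocation", f"{state_dir}/checkpoint")
        .trigger(availableNow=True)  # process what is there, then stop (needs a recent runtime)
        .toTable(target_table)
    )
```

In a dev notebook, where `dbutils` and `spark` are predefined, this could be called as `mount_shared_bucket(dbutils)` followed by `ingest_to_dev(spark, f"{MOUNT_POINT}/events", "dev_db.events_raw", f"{MOUNT_POINT}/_state/events")`. The `events` folder and `dev_db.events_raw` table are again placeholders.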

Please let us know if this helps or if you need further clarification.