Databricks Community

Jfoxyyc · ‎04-30-2023

Here's my current setup: a dev workspace connected to a dev Key Vault and a prod workspace connected to a prod Key Vault. A GitHub repo and Action sync the two environments on pull request, and all resources are created through Terraform. This is my normal workflow:

1. Go to the dev workspace

2. Create a new resource, say a DLT pipeline

3. Populate and test the resource in the dev workspace against the dev catalog

4. Pull request to prod

5. Terraform creates the new resource and schedules it in the prod workspace against the prod catalog

My issue: say I create a pipeline that populates raw.schema_name.table_name. When I promote the pipeline to the prod workspace, it tells me there is already a pipeline managing the table.

Anonymous · ‎05-13-2023

@Jordan Fox :

To manage dev/prod environments with the new Unity Catalog Delta Live Tables (DLT) integration, you could consider the following approach:

1. Use separate Delta Lake tables for dev and prod environments, with different table names and paths. This will ensure that the data and metadata in each environment are kept separate and do not interfere with each other.
2. When creating a DLT pipeline in the dev workspace, use a naming convention that includes a prefix or suffix that identifies the dev environment. For example, you could use the pipeline name "dev_populate_raw_schema_name_table_name". This will help you identify which pipelines are specific to each environment.
3. When promoting a pipeline to the prod workspace, update the pipeline name to remove the dev environment identifier. For example, rename "dev_populate_raw_schema_name_table_name" to "populate_raw_schema_name_table_name". This will ensure that the pipeline is recognized as the same pipeline that manages the table in the prod environment.
4. Before promoting a pipeline, ensure that the DLT table in the prod environment has been created and is ready to receive data. You may need to manually create the table or use a separate pipeline to create it.
5. Test the pipeline in the prod environment before using it for production data. You can use a smaller dataset or a test environment to ensure that the pipeline is working as expected.
6. Consider using a metadata management tool like Apache Atlas to manage the metadata for your DLT tables. This will help you keep track of which pipelines and tables are associated with each environment, and ensure that the metadata is consistent across environments.
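
As a rough illustration of the naming convention described above, here is a minimal Python sketch. The `EnvNames` class, its method names, and the rule that dev gets a `dev_` prefix on the catalog while prod keeps the bare name are all assumptions made for this example, not anything Databricks prescribes. The idea is only that each environment resolves to a different fully qualified table name, so two pipelines never claim the same table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvNames:
    """Hypothetical helper that derives environment-specific names."""

    env: str  # expected to be "dev" or "prod" in this sketch

    def _prefix(self) -> str:
        # Assumed convention: prod uses bare names, any other env is prefixed.
        return "" if self.env == "prod" else f"{self.env}_"

    def catalog(self, base_catalog: str) -> str:
        return f"{self._prefix()}{base_catalog}"

    def table(self, base_catalog: str, schema: str, table: str) -> str:
        # Three-level Unity Catalog style name: catalog.schema.table
        return f"{self.catalog(base_catalog)}.{schema}.{table}"

    def pipeline(self, base_catalog: str, schema: str, table: str) -> str:
        return f"{self._prefix()}populate_{base_catalog}_{schema}_{table}"


dev = EnvNames("dev")
prod = EnvNames("prod")
print(dev.table("raw", "schema_name", "table_name"))
print(prod.table("raw", "schema_name", "table_name"))
print(dev.pipeline("raw", "schema_name", "table_name"))
print(prod.pipeline("raw", "schema_name", "table_name"))
```

With this convention the dev pipeline writes to `dev_raw.schema_name.table_name` and the prod pipeline to `raw.schema_name.table_name`, so the two names cannot collide even if both catalogs are visible from the same place.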

By following these steps, you should be able to manage dev/prod environments with the new Unity Catalog Delta Live Tables integration and keep your pipelines and tables properly separated across environments.
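
One way to apply this in an automated promotion flow is to generate the pipeline settings per environment instead of copying them unchanged. The sketch below is only an assumption about how that could look: the `pipeline_settings` function and the notebook path are hypothetical, and the keys (`name`, `catalog`, `target`, `development`, `libraries`) are modeled on the Delta Live Tables pipeline settings JSON, so please verify them against your workspace before relying on them.

```python
import json


def pipeline_settings(env: str, notebook_path: str) -> dict:
    """Build environment-specific DLT pipeline settings (illustrative only)."""
    if env not in ("dev", "prod"):
        raise ValueError(f"unknown environment: {env!r}")

    # Assumed convention: prod uses bare names, dev gets a "dev_" prefix.
    prefix = "" if env == "prod" else f"{env}_"

    return {
        "name": f"{prefix}populate_raw_schema_name_table_name",
        "catalog": f"{prefix}raw",      # a different catalog per environment
        "target": "schema_name",        # schema the pipeline publishes to
        "development": env != "prod",   # development mode only outside prod
        "libraries": [{"notebook": {"path": notebook_path}}],
    }


# Hypothetical notebook path, used only for demonstration.
for env in ("dev", "prod"):
    print(json.dumps(pipeline_settings(env, "/Repos/example/populate_raw"), indent=2))
```

The same parameterization can be expressed as variables in Terraform. The point is that the environment decides the catalog, so the dev and prod pipelines never target the same table.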


How to manage dev/prod environments with new unity catalog delta live table integration