
How to manage dev/prod environments with the new Unity Catalog Delta Live Tables integration

Jfoxyyc
Valued Contributor

Here's my current setup: a dev workspace connected to a dev key vault, and a prod workspace connected to a prod key vault. A GitHub repo and GitHub Action sync the two environments on pull request, and all resources are created through Terraform. This is my normal workflow (a Terraform sketch follows the list):

  1. Go to the dev workspace.
  2. Create a new resource, say a DLT pipeline.
  3. Populate and test the resource in the dev workspace against the dev catalog.
  4. Open a pull request to prod.
  5. Terraform creates the new resource and schedules it in the prod workspace against the prod catalog.
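
A minimal sketch of the Terraform side of this setup, assuming the catalog name and an environment flag are injected per workspace (for example via separate tfvars files); the variable names and notebook path are hypothetical:

    # Hypothetical per-environment inputs, e.g. dev.tfvars vs. prod.tfvars
    variable "environment" {
      type        = string
      description = "Deployment environment, e.g. dev or prod"
    }

    variable "catalog" {
      type        = string
      description = "Unity Catalog catalog to write to, e.g. raw_dev or raw"
    }

    # One pipeline definition, applied to both workspaces with different inputs
    resource "databricks_pipeline" "populate_raw" {
      name    = "populate_raw_schema_name_table_name"
      catalog = var.catalog   # Unity Catalog target catalog
      target  = "schema_name" # target schema inside that catalog

      # Development mode everywhere except prod (a common convention)
      development = var.environment != "prod"

      library {
        notebook {
          path = "/Repos/project/pipelines/populate_raw" # hypothetical path
        }
      }
    }

The same module then produces the dev pipeline against the dev catalog and the prod pipeline against the prod catalog; only the variable values differ between workspaces.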

My issue is this: say I create a pipeline that populates a table such as raw.schema_name.table_name. When I promote the pipeline to the prod workspace, it tells me there is already a pipeline managing the table.


Anonymous
Not applicable

@Jordan Fox:

To manage dev/prod environments with the new Unity Catalog Delta Live Table (DLT) integration, you could consider the following approach:

  1. Use separate Delta Lake tables for dev and prod environments, with different table names and paths. This will ensure that the data and metadata in each environment are kept separate and do not interfere with each other.
  2. When creating a DLT pipeline in the dev workspace, use a naming convention that includes a prefix or suffix that identifies the dev environment. For example, you could use the pipeline name "dev_populate_raw_schema_name_table_name". This will help you identify which pipelines are specific to each environment.
  3. When promoting a pipeline to the prod workspace, update the pipeline name to remove the dev environment identifier. For example, rename "dev_populate_raw_schema_name_table_name" to "populate_raw_schema_name_table_name". This ensures that the pipeline is recognized as the same pipeline that manages the table in the prod environment (see the sketch after this list for one way to automate the rename).
  4. Before promoting a pipeline, ensure that the DLT table in the prod environment has been created and is ready to receive data. You may need to manually create the table or use a separate pipeline to create it.
  5. Test the pipeline in the prod environment before using it for production data. You can use a smaller dataset or a test environment to ensure that the pipeline is working as expected.
  6. Consider using a metadata management tool like Apache Atlas to manage the metadata for your DLT tables. This will help you keep track of which pipelines and tables are associated with each environment, and ensure that the metadata is consistent across environments.
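
To make steps 2 and 3 concrete, here is one possible way to derive the environment prefix in Terraform so the rename happens automatically at promotion time rather than by hand; this is a sketch using the same hypothetical variables as above, not the only way to do it:

    variable "environment" {
      type    = string
      default = "dev" # overridden to "prod" in the prod deployment
    }

    locals {
      # "dev_" prefix in dev, no prefix in prod (steps 2 and 3 combined)
      name_prefix = var.environment == "prod" ? "" : "${var.environment}_"
    }

    resource "databricks_pipeline" "populate_raw" {
      name = "${local.name_prefix}populate_raw_schema_name_table_name"
      # catalog, target, and library blocks as in the earlier sketch
    }

Because the prefix is computed from the environment variable, the pull-request promotion produces the unprefixed prod name without a manual rename step.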

By following these steps, you should be able to manage dev/prod environments with the new Unity Catalog Delta Live Table integration and ensure that your pipelines and tables are properly synchronized across environments.

Anonymous
Not applicable

Hi @Jordan Fox,

Thank you for posting your question in our community! We are happy to assist you.

To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?

This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance! 
