
Delta Live Tables - collaborative development

ismaelhenzel
Contributor II

I would like to know the best practice for collaborating on a Delta Live Tables pipeline. I was thinking that each developer should have their own DLT pipeline in the development workspace. Currently, each domain has its own development catalog and schema, like sales_dev.gold. The problem is that with DLT, I can't have two pipelines running against the same schema with the same table names; you get the error: "A table can only be owned by one pipeline. Concurrent pipeline operations such as maintenance and full refresh will conflict with each other."

So, do I need to create a separate pipeline for each developer, and also separate catalogs/schemas, like sales_dev.peterson, or a catalog like dev_peterson.sales? This seems a little ugly to me, but maybe I'm approaching it with the wrong paradigm. Since QA and Prod will have the right catalogs, maybe Dev is not so important in this organization. What do you think?

ACCEPTED SOLUTION

szymon_dybczak
Esteemed Contributor III

Hi @ismaelhenzel,

Yep, that's true. But you can find a quite interesting workaround for this problem in the excellent blog post below:

https://www.advancinganalytics.co.uk/blog/avoid-delta-live-table-conflicts-with-databricks-asset-bun...
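
The gist (a simplified sketch; the post's exact approach may differ, and the pipeline key sales_pipeline below is just a placeholder) is to override the pipeline's target schema per bundle target, so each developer's dev deployment owns its own copy of the tables:

resources:
  pipelines:
    sales_pipeline:
      name: sales_pipeline
      catalog: sales_dev
      schema: gold

targets:
  dev:
    mode: development
    resources:
      pipelines:
        sales_pipeline:
          # Per-developer schema, e.g. gold_jdoe, so two pipelines
          # never fight over ownership of the same tables
          schema: "gold_${workspace.current_user.short_name}"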


5 REPLIES


ismaelhenzel
Contributor II

That's an elegant solution; I understand it better now. I developed my version slightly differently: I set the schema for my DLT pipeline through the configuration variable DLT_GOLD_SCHEMA, as shown below.

targets:
  dev:
    variables:
      ENVIRONMENT: "dev"
      # Each developer gets their own gold schema, e.g. gold_jdoe
      DLT_GOLD_SCHEMA: "gold_${workspace.current_user.short_name}"
    mode: development
    default: true
    workspace:
      host: XXXXX

  qa:
    variables:
      ENVIRONMENT: "qa"
      DLT_GOLD_SCHEMA: "gold"
    mode: development
    workspace:
      host: XXXXX
    run_as:
      service_principal_name: ${var.DATABRICKS_SP_DEV}

  prod:
    variables:
      ENVIRONMENT: "prod"
      DLT_GOLD_SCHEMA: "gold"
    mode: production
    workspace:
      host: xxxxxx
      root_path: "/Databricks/Template"
    run_as:
      service_principal_name: ${var.DATABRICKS_SP_PROD}

resources:
  pipelines:
    dlt-data-engineering:
      name: DE_Template
      libraries:
        - glob:
            include: ../../src/data_engineering/transformations/**
      # The target schema comes from the per-target variable above
      schema: "${var.DLT_GOLD_SCHEMA}"
      catalog: "mul_${var.ENVIRONMENT}_template_streaming"
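
One note: the ${var.NAME} references above assume the variables are declared at the top level of the bundle, which I omitted here. Roughly, that declaration looks like this (the descriptions are just placeholders):

variables:
  ENVIRONMENT:
    description: "Deployment environment name"
  DLT_GOLD_SCHEMA:
    description: "Target schema for the gold DLT tables"
  DATABRICKS_SP_DEV:
    description: "Service principal that runs the QA pipeline"
  DATABRICKS_SP_PROD:
    description: "Service principal that runs the prod pipeline"

Then each developer deploys their own copy with databricks bundle deploy -t dev, and the dev pipeline writes to their personal gold schema.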

 

 

szymon_dybczak
Esteemed Contributor III

Great job @ismaelhenzel! Thanks for sharing with us 🙂

Poorva21
New Contributor

Yes, each developer should have their own DLT pipeline and their own schema; it's the correct paradigm.
It keeps DLT ownership clean and prevents pipeline conflicts.
Dev naming doesn't need to be pretty; QA/Prod are where structure matters.

ismaelhenzel
Contributor II

Thanks for answering! That clears things up. I've never used it that way before. At first glance it seems a little strange, but as you said, what matters is QA/Production.