Friday
I would like to know the best practice for collaborating on a Delta Live Tables pipeline. I was thinking that each developer should have their own DLT pipeline in the development workspace. Currently, each domain has its development catalog, like sales_dev.gold. The problem is that with DLT, I can't have two pipelines running in the same schema with the same table names. You get the error: "A table can only be owned by one pipeline. Concurrent pipeline operations such as maintenance and full refresh will conflict with each other."
So, do I need to create a separate pipeline for each developer, and also separate catalogs/schemas, like sales_dev.peterson, or a catalog dev_peterson.sales? This seems a little ugly to me, but maybe I'm thinking with the wrong paradigm in mind. Since QA and Prod will have the right catalogs, maybe how Dev is organized is not so important. What do you think?
Friday
Hi @ismaelhenzel,
Yep, that's true. But you can find an interesting workaround/solution for this problem in the excellent blog post below:
17 hours ago
That's an elegant solution; I can understand it better now. I implemented my version slightly differently: I set the schema for my DLT pipeline through the configuration variable DLT_GOLD_SCHEMA, which defines the target database, as shown below.
targets:
  dev:
    variables:
      ENVIRONMENT: 'dev'
      DLT_GOLD_SCHEMA: "gold_${workspace.current_user.short_name}"
    mode: development
    default: true
    workspace:
      host: XXXXX

  qa:
    variables:
      ENVIRONMENT: 'qa'
      DLT_GOLD_SCHEMA: "gold"
    mode: development
    workspace:
      host: XXXXX
    run_as:
      service_principal_name: ${var.DATABRICKS_SP_DEV}

  prod:
    variables:
      DLT_GOLD_SCHEMA: "gold"
      ENVIRONMENT: 'prod'
    mode: production
    workspace:
      host: xxxxxx
      root_path: "/Databricks/Template"
    run_as:
      service_principal_name: ${var.DATABRICKS_SP_PROD}

resources:
  pipelines:
    dlt-data-engineering:
      name: DE_Template
      libraries:
        - glob:
            include: ../../src/data_engineering/transformations/**
      schema: "${var.DLT_GOLD_SCHEMA}"
      catalog: "mul_${var.ENVIRONMENT}_template_streaming"
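One note on the config above: for ${var.ENVIRONMENT}, ${var.DLT_GOLD_SCHEMA}, and the service-principal variables to resolve, the bundle also needs a top-level variables declaration. A minimal sketch of what that could look like, reusing the names above (the descriptions and defaults are my own assumptions, not part of the original config):

# databricks.yml, top level (sketch): declare the variables referenced by the targets above
variables:
  ENVIRONMENT:
    description: "Short environment name used in the catalog name"
    default: "dev"
  DLT_GOLD_SCHEMA:
    description: "Target schema for the gold DLT tables"
    default: "gold"
  DATABRICKS_SP_DEV:
    description: "Service principal for QA deployments (no default; supply at deploy time, e.g. via BUNDLE_VAR_DATABRICKS_SP_DEV)"
  DATABRICKS_SP_PROD:
    description: "Service principal for prod deployments (no default; supply at deploy time)"

With that in place, each developer can deploy their own copy with databricks bundle deploy -t dev: the pipeline then writes to gold_<short_name>, and because the dev target uses mode: development the deployed pipeline name also gets a per-user prefix, so every developer ends up with an isolated pipeline and schema instead of fighting over table ownership.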
17 hours ago
Great job @ismaelhenzel! Thanks for sharing with us 🙂
Friday
Yes—each developer should have their own DLT pipeline and their own schema. It’s the correct paradigm.
It keeps DLT ownership clean and prevents pipeline conflicts.
Dev naming doesn’t need to be pretty; QA/Prod are where structure matters.
17 hours ago
Thanks for answering! That clarified things a lot. I've never used it that way before. At first glance it seems a little strange, but as you said, what matters is QA/Production.