Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
I am building out a new DLT pipeline and have since had to rebuild it from scratch. Having deleted the old pipeline and constructed a new one, I now get this error: Table 'X' is already managed by pipeline 'Y'. As I only have the one pipeline how would...
Rename the function decorated with @dlt.table, for example: @dlt.table(comment="example", table_properties={"example": "example"}, partition_cols=["a", "b", "c"]) def modify_this_name():
I need to execute a DLT pipeline from a Job, and I would like to know if there is any way of passing a parameter. I know you can have settings in the pipeline that you use in the DLT notebook, but it seems you can only assign values to them when crea...
How to leverage Change Data Capture (CDC) from your databases to Databricks: Change Data Capture allows you to ingest and process only changed records from database systems to dramatically reduce data processing costs and enable real-time use cases suc...
What is the difference between Databricks Auto-Loader and Delta Live Tables? Both seem to manage ETL for you but I'm confused on where to use one vs. the other.
I understand that DLT is a separate job compute but I would like to use an existing all purpose cluster for the DLT pipeline. Is there a way I can achieve this?
Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question? This...
I am trying to implement a UDF for a DLT pipeline. I have seen the documentation stating that it is possible but I am getting an error after adding an SQL UDF to a cell in the notebook attached to the pipeline. The aim is to have the UDF in a separat...
You can't. SQL support on a DLT pipeline cluster is limited compared to a normal notebook. You can still define a UDF in Python, using a Python notebook. In this case, you can use the spark.sql() function to execute your original SQL cod...
I'll try to answer this in the simplest possible way
1. Spark is an imperative programming framework. You tell it what to do, and it does it. DLT is declarative - you describe what you want the datasets to be (i.e. the transforms), and it takes care ...
Hi, according to the latest release notes, the current channel of DLT should be using Databricks runtime 11.3 and the preview channel should be using 12.2. The current channel was still using the correct runtime version, 11.3, yesterday morning, but since ...
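The imperative/declarative contrast described above can be illustrated with a toy sketch in plain Python. It uses neither Spark nor the DLT API; the `DATASETS` registry and the `materialize` helper are hypothetical names invented purely for illustration.

```python
# Imperative style: the caller spells out every step and its order.
raw = [3, -1, 4, -5]
cleaned = [x for x in raw if x >= 0]
total_imperative = sum(cleaned)

# Declarative style (toy stand-in, NOT the DLT API): each dataset is described
# as a function of the datasets it depends on. The definitions are listed in
# an arbitrary order; the resolver below works out what to build first.
DATASETS = {
    "total": (["cleaned"], lambda cleaned: sum(cleaned)),
    "cleaned": (["raw"], lambda raw: [x for x in raw if x >= 0]),
    "raw": ([], lambda: [3, -1, 4, -5]),
}


def materialize(name, cache=None):
    """Build a dataset by first building its dependencies, each at most once."""
    cache = {} if cache is None else cache
    if name not in cache:
        deps, build = DATASETS[name]
        cache[name] = build(*(materialize(dep, cache) for dep in deps))
    return cache[name]
```

Calling `materialize("total")` gives the same result as the imperative version, but the caller never states the execution order; that ordering work is the part a declarative framework takes over.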
I'm seeing the same issue with 12 current / 13 preview. Updating the channel didn't bump the runtime version and even creating a pipeline with the preview channel uses the current version.
Dear Community, hope you are doing well. For the last couple of days I have been seeing very strange issues with my DLT pipeline: every 60-70 minutes it fails in continuous mode with the error: INTERNAL_ERROR: Communication lost with driver. Clu...
Hello @Debayan, I am facing the same issue while running a Delta Live Table. This job runs in production, but it is not working in dev. I have tried increasing the worker nodes, but to no avail. Can you please help with this?
Hi Databricks Team, I would like to implement data quality rules in Databricks. Apart from DLT, is there a standard approach to applying data quality rules on the bronze layer before proceeding to the silver and gold layers?
Hi @Phani1 ,
• Databricks recommends applying data quality rules on the bronze layer before proceeding to the silver and gold layer.
• The recommended approach involves storing data quality rules in a Delta table.
• The rules are categorized by a tag ...
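A minimal sketch of the tag-based rule lookup described above, in plain Python. The rule rows, column names (`name`, `constraint`, `tag`), and the helpers `get_rules` and `combined_filter` are hypothetical; the reply stores the rules in a Delta table, whereas this sketch keeps them in an in-memory list so it runs anywhere.

```python
# Hypothetical rules; in the approach described above these rows would be
# stored in a Delta table rather than in a Python list.
RULES = [
    {"name": "valid_id", "constraint": "id IS NOT NULL", "tag": "validity"},
    {"name": "positive_amount", "constraint": "amount > 0", "tag": "validity"},
    {"name": "recent_event", "constraint": "event_date >= '2020-01-01'", "tag": "freshness"},
]


def get_rules(tag, rules=RULES):
    """Return {rule name: SQL constraint} for every rule carrying the given tag."""
    return {r["name"]: r["constraint"] for r in rules if r["tag"] == tag}


def combined_filter(tag, rules=RULES):
    """AND the tagged constraints together into a single SQL predicate string."""
    constraints = get_rules(tag, rules).values()
    return " AND ".join(f"({c})" for c in constraints)
```

Outside DLT, the predicate string could be applied to the bronze data with a DataFrame filter, for example `df.filter(combined_filter("validity"))`, assuming `df` has the columns the constraints refer to.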
Hello! I'm very new to working with Delta Live Tables and I'm having some issues. I'm trying to import a large amount of historical data into DLT. However, letting the DLT pipeline run forever doesn't work with the database we're trying to import from...
Hello everyone, I am trying to create a Delta Live Table on top of a CSV file using the syntax below: CREATE OR REFRESH LIVE TABLE employee_bronze_dlt COMMENT "The bronze employee dataset, ingested from /mnt/lakehouse/PoC/DLT/Source/." AS SELECT * FROM csv.`/mnt...
Hi @Manasi_Sarang, I believe Delta is unable to infer the schema because you are using a SELECT statement to read the entire content of the CSV file, and I think inferSchema won't work here. Instead you can try to create a temp live table or live view wit...
I need to apply some transformations to incoming data as a batch and want to know whether there is a way to use the foreachBatch option in Delta Live Tables. I am using Auto Loader to load JSON files and then I need to apply foreachBatch and store the results into ano...
Not sure if this will apply to you or not... I was looking at the foreachBatch tool to reduce the workload of getting distinct data from a history table of 20 million+ records, because the df.dropDuplicates() function was intermittently running out of ...
Hi team, there used to be an option to provide DLT pipeline settings either via the UI or as JSON, but I no longer see it after switching to the new UI. Is this expected? Am I missing something? Here is a screenshot for reference.
Hi @Pavan Kumar, hope all is well! Just wanted to check in to see whether you were able to resolve your issue, and whether you would be happy to share the solution or mark an answer as best. Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks...