Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Hi, there. I encountered an issue when I was trying to create my delta live table pipeline. The error is "DataPlaneException: Failed to launch pipeline cluster 1202-031220-urn0toj0: Could not launch cluster due to cloud provider failures. azure_error...
you can create the pool instance in the databricks under compute/pool and assign the value in the json of the DLT pipeline. With this, we will control on pool min workers and max workers and the reuse of the pools available by other pipelines. "node_...
Hello! I'm very new to working with Delta Live Tables and I'm having some issues. I'm trying to import a large amount of historical data into DLT. However letting the DLT pipeline run forever doesn't work with the database we're trying to import from...
Hi @Sarah Guido Thank you for posting your question in our community! We are happy to assist you.To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers y...
I am building out a new DLT pipeline and have since had to rebuild it from scratch. Having deleted the old pipeline and constructed a new one I now get this error:Table 'X' is already managed by pipeline 'Y'. As I only have the one pipeline how would...
rename your function from @Dlt.table, for exemple:@Dlt.table( comment="exemple", table_properties={"exemple": "exemple"}, partition_cols=["a", "b", "c"])def modify_this_name():
I need to process some transformation on incoming data as a batch and want to know if there is way to use foreachbatch option in deltalivetable. I am using autoloader to load json files and then I need to apply foreachbatch and store results into ano...
Not sure if this will apply to you or not...I was looking at the foreachbatch tool to reduce the workload of getting distinct data from a history table of 20million + records because the df.dropDuplicates() function was intermittently running out of ...
hi team,There used to be option to provide DLT pipeline settings either via UI or JSON, but I do not see it anymore after switching to new UI. Is this something expected ? am I missing something ? here is screenshot for reference.
HelloI'm developing a dlt pipeline, configured in continuous mode.I'm still in dev mode, so I stop my pipeline when i'm not working on it.My problem is that the pipeline is frequently started by SERVICE_UPGRADE.example of message:'Update xxxxx starte...
Hello,I have some nested columns with hyphen i.e. sample-1 in struct column, recently DLT pipeline has started throwing synatx error. Before May 24, 2023, this was working fine.Is this a new bug in May 2023 release?After clearing table and table's da...
Hi @Rishabh Tomar We haven't heard from you since the last response from @Kaniz Fatma . Kindly share the information with us, and in return, we will provide you with the necessary solution. Thanks and Regards
I am trying to setup delta live tables pipelines to ingest data to bronze and silver tables. Bronze and Silver are separate schema. This will be triggered by a daily job. It appears to run fine when set as continuous, but fails when triggered.Table...
Hi @Jennette Shepard Thank you for posting your question in our community! We are happy to assist you.To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answ...
Hi, I create a table using DLT pipeline (triggered once). In the ETL process, I add a new column to the table with Null values by:output = output.withColumn('Indicator_Latest_Value_Date', F.lit(None))Pipeline works and I don't get any error. But, whe...
I'm having an issue accessing the excel through dlt pipeline. the file is in ADLS I'm using pandas to read the Excel. It seems pandas are not able to understand abfss protocol is there any way to read Excel with pandas in dlt pipeline?I'm getting thi...
Could you please guide on how to create the DLT pipeline that directly reads the data from jdbc.When I created the DLT pipeline it give me error at Setting up table, If I ran interactively in notebooks it run successfully, but in non interactive mode...
What you try do to is not possible.dlt uses autoloader, not jdbcno jars (dlt is sql/python only)I'd skip DLT for this scenario and use an ordinary notebook, nothing wrong with that.
I am in a situation where I have a notebook that runs in a pipeline that creates a "live streaming table". So, I cannot use a language other than sql in the pipeline. I would like to format a certain column in the pipeline using a scala code (it's a ...
Hi all,I have a DLT pipeline as so:raw -> cleansed (SCD2) -> curated. 'Raw' is utilizing autoloader, to continously read file from a datalake. These files can contain tons of duplicate, which causes our raw table to become quite large. Therefore, we ...
Ok, i'll try an add additional details. Firstly: The diagram below shows our current dataflow: Our raw table is defined as such: TABLES = ['table1','table2']
def generate_tables(table_name):
@dlt.table(
name=f'raw_{table_name}',
table_pro...
Hi,Is it possible to set it up the RETRY_ON_FAILURE property for DLTs through the API?I'm not finding in the Docs (although it seems to exist in a response payload).https://docs.databricks.com/delta-live-tables/api-guide.html
Dear support,we have the following situation where a set of DLT pipelines are streaming with very low rate incoming data and we need to find the root cause of this delay.In order to provide more insight about the setup of the DLT pipelines and some m...
Hi @EDDatabricks EDDatabricks Thank you for posting your question in our community! We are happy to assist you.To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that ...