Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
I am building out a new DLT pipeline and have since had to rebuild it from scratch. Having deleted the old pipeline and constructed a new one, I now get this error: Table 'X' is already managed by pipeline 'Y'. As I only have the one pipeline, how would...
Rename the function under your @dlt.table decorator, for example:

@dlt.table(
    comment="example",
    table_properties={"example": "example"},
    partition_cols=["a", "b", "c"],
)
def modify_this_name():
I understand that DLT uses separate job compute, but I would like to use an existing all-purpose cluster for the DLT pipeline. Is there a way I can achieve this?
Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question? This...
Hi there. I encountered an issue when trying to create my Delta Live Tables pipeline. The error is "DataPlaneException: Failed to launch pipeline cluster 1202-031220-urn0toj0: Could not launch cluster due to cloud provider failures. azure_error...
@Simon Xu I suspect that DLT is trying to grab machine types that you simply have zero quota for in your Azure account. By default, the machine types below get requested behind the scenes for DLT:
AWS: c5.2xlarge
Azure: Standard_F8s
GCP: e2-standard-8...
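If quota for the default machine type is the problem, the pipeline's JSON settings accept an explicit node type override for the pipeline cluster. A sketch, assuming Azure; `Standard_DS3_v2` and the autoscale bounds are just example values to substitute for a type you actually have quota for:

```json
{
  "clusters": [
    {
      "label": "default",
      "node_type_id": "Standard_DS3_v2",
      "autoscale": { "min_workers": 1, "max_workers": 2 }
    }
  ]
}
```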
Hello! I'm very new to working with Delta Live Tables and I'm having some issues. I'm trying to import a large amount of historical data into DLT. However, letting the DLT pipeline run forever doesn't work with the database we're trying to import from...
I need to process some transformation on incoming data as a batch and want to know if there is a way to use the foreachBatch option in Delta Live Tables. I am using Auto Loader to load JSON files, and then I need to apply foreachBatch and store results into ano...
Not sure if this will apply to you or not... I was looking at foreachBatch to reduce the workload of getting distinct data from a history table of 20 million+ records, because the df.dropDuplicates() function was intermittently running out of ...
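The idea behind per-batch deduplication can be illustrated without Spark: carry a running set of seen keys across batches and drop repeats. A pure-Python stand-in for what a foreachBatch callback would do with df.dropDuplicates() plus a MERGE into the target table (the real callback operates on DataFrames, of course):

```python
def dedupe_batches(batches):
    """Keep the first occurrence of each key across a sequence of batches.

    Pure-Python stand-in for foreachBatch-style deduplication: the work
    happens batch by batch, while state (here, the set of seen keys)
    persists across batches -- in Spark that state would live in the
    target Delta table you MERGE into.
    """
    seen = set()
    out = []
    for batch in batches:            # each batch: a list of (key, value) rows
        for key, value in batch:
            if key not in seen:      # first time we see this key -> keep it
                seen.add(key)
                out.append((key, value))
    return out

# Example: three micro-batches with overlapping keys
batches = [[(1, "a"), (1, "b")], [(2, "c"), (1, "d")], [(3, "e")]]
print(dedupe_batches(batches))   # -> [(1, 'a'), (2, 'c'), (3, 'e')]
```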
Hi team, there used to be an option to provide DLT pipeline settings either via the UI or JSON, but I do not see it anymore after switching to the new UI. Is this expected? Am I missing something? Here is a screenshot for reference.
Hello, I'm developing a DLT pipeline configured in continuous mode. I'm still in dev mode, so I stop my pipeline when I'm not working on it. My problem is that the pipeline is frequently started by SERVICE_UPGRADE. Example of message: 'Update xxxxx starte...
Hello, I have some nested columns with a hyphen (i.e. sample-1) in a struct column, and recently the DLT pipeline has started throwing a syntax error. Before May 24, 2023, this was working fine. Is this a new bug in the May 2023 release? After clearing the table and the table's da...
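For hyphenated struct fields, Spark SQL generally needs the field name wrapped in backticks (e.g. payload.`sample-1`), since a bare hyphen parses as a minus operator. A small hypothetical helper sketching that quoting rule:

```python
def quote_field(name: str) -> str:
    """Wrap a field name in backticks when it contains characters (such as
    '-') that Spark SQL would otherwise parse as operators.

    Hypothetical helper for illustration only; in a query you would write
    e.g.  SELECT payload.`sample-1` FROM ...
    """
    safe = name.replace("_", "")     # underscores are fine unquoted
    return name if safe.isalnum() else f"`{name}`"

print(quote_field("sample-1"))   # -> `sample-1`
print(quote_field("sample_1"))   # -> sample_1
```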
Hi @Rishabh Tomar We haven't heard from you since the last response from @Kaniz Fatma. Kindly share the information with us, and in return, we will provide you with the necessary solution. Thanks and Regards
I am trying to set up Delta Live Tables pipelines to ingest data to bronze and silver tables. Bronze and Silver are separate schemas. This will be triggered by a daily job. It appears to run fine when set as continuous, but fails when triggered. Table...
Hi, I create a table using a DLT pipeline (triggered once). In the ETL process, I add a new column with Null values to the table by:
output = output.withColumn('Indicator_Latest_Value_Date', F.lit(None))
The pipeline works and I don't get any error. But, whe...
I'm having an issue accessing an Excel file through a DLT pipeline. The file is in ADLS, and I'm using pandas to read the Excel. It seems pandas is not able to understand the abfss protocol. Is there any way to read Excel with pandas in a DLT pipeline? I'm getting thi...
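pandas can only open abfss:// paths when fsspec plus the adlfs package are installed (then pd.read_excel(path, storage_options={...}) works); another workaround is to address the file over its HTTPS endpoint with a SAS token. A hypothetical helper showing how an abfss URI maps onto that HTTPS endpoint:

```python
def abfss_to_https(abfss_path: str) -> str:
    """Map an abfss:// URI to its https:// ADLS Gen2 endpoint.

    abfss://<container>@<account>.dfs.core.windows.net/<path>
      -> https://<account>.dfs.core.windows.net/<container>/<path>

    Hypothetical helper for illustration; authentication (a SAS token, or
    credentials passed via storage_options) still has to be supplied
    separately.
    """
    prefix = "abfss://"
    if not abfss_path.startswith(prefix):
        raise ValueError(f"not an abfss URI: {abfss_path}")
    rest = abfss_path[len(prefix):]
    container_at_host, _, path = rest.partition("/")
    container, _, host = container_at_host.partition("@")
    return f"https://{host}/{container}/{path}"

url = abfss_to_https("abfss://data@myacct.dfs.core.windows.net/raw/report.xlsx")
print(url)   # -> https://myacct.dfs.core.windows.net/data/raw/report.xlsx
```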
Could you please guide me on how to create a DLT pipeline that directly reads data from JDBC? When I created the DLT pipeline, it gave me an error at "Setting up table". If I run it interactively in notebooks it runs successfully, but in non-interactive mode...
What you are trying to do is not possible:
DLT uses Auto Loader, not JDBC
no jars (DLT is SQL/Python only)
I'd skip DLT for this scenario and use an ordinary notebook; nothing wrong with that.
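In an ordinary notebook the JDBC read is straightforward. A sketch assuming a PostgreSQL source; the host, database, table, and credentials below are placeholders to replace with your own (ideally pulled from a Databricks secret scope):

```python
# Hypothetical connection settings -- substitute your own host, database,
# table, and credentials; never hard-code a real password.
jdbc_options = {
    "url": "jdbc:postgresql://dbserver:5432/mydb",
    "dbtable": "public.orders",
    "user": "reader",
    "password": "REDACTED",                  # use dbutils.secrets.get(...) in practice
    "driver": "org.postgresql.Driver",
}

# In a regular (non-DLT) notebook with a running SparkSession:
#   df = spark.read.format("jdbc").options(**jdbc_options).load()
#   df.write.format("delta").mode("append").saveAsTable("bronze.orders")
```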
I am in a situation where I have a notebook that runs in a pipeline that creates a "live streaming table", so I cannot use a language other than SQL in the pipeline. I would like to format a certain column in the pipeline using Scala code (it's a ...
Hi all, I have a DLT pipeline as so: raw -> cleansed (SCD2) -> curated. 'Raw' is utilizing Auto Loader to continuously read files from a data lake. These files can contain tons of duplicates, which causes our raw table to become quite large. Therefore, we ...
Ok, I'll try and add additional details. Firstly, the diagram below shows our current dataflow. Our raw table is defined as such:

TABLES = ['table1', 'table2']

def generate_tables(table_name):
    @dlt.table(
        name=f'raw_{table_name}',
        table_pro...
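The metaprogramming pattern above hinges on defining the decorated function inside generate_tables, so each closure captures its own table_name rather than a shared loop variable. A pure-Python stand-in for @dlt.table makes the mechanics visible without a pipeline:

```python
registry = {}   # stand-in for the dataflow graph that @dlt.table builds

def table(name):
    """Minimal stand-in for dlt.table, for illustration only."""
    def decorator(fn):
        registry[name] = fn   # register the table definition under its name
        return fn
    return decorator

TABLES = ["table1", "table2"]

def generate_tables(table_name):
    @table(name=f"raw_{table_name}")
    def define():
        # A real DLT function would return a streaming DataFrame here,
        # e.g. spark.readStream.format("cloudFiles")...
        return f"reading {table_name}"
    return define

for t in TABLES:
    generate_tables(t)

print(sorted(registry))           # -> ['raw_table1', 'raw_table2']
print(registry["raw_table1"]())   # -> reading table1
```

Because each call to generate_tables creates a fresh scope, 'raw_table1' and 'raw_table2' each read their own source, which is exactly what the real @dlt.table loop relies on.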
Hi, is it possible to set the RETRY_ON_FAILURE property for DLT pipelines through the API? I'm not finding it in the docs (although it seems to exist in a response payload). https://docs.databricks.com/delta-live-tables/api-guide.html