I'm using SQL to perform aggregation in the gold layer of a DLT pipeline. However, I'm encountering an error when running the pipeline while attempting to return a DataFrame using spark.sql. Could anyone please assist me with the SQL...
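A common fix for this kind of error (a minimal sketch, assuming the gold table aggregates an upstream silver table; the names silver_orders, region, and amount are hypothetical) is to call spark.sql inside a @dlt.table-decorated function and return the resulting DataFrame, referencing upstream tables through the LIVE schema:

    # Sketch: wrap the spark.sql aggregation in a @dlt.table function.
    # Table and column names (LIVE.silver_orders, region, amount) are hypothetical.
    import dlt

    @dlt.table(name="gold_sales_summary", comment="Aggregated sales per region")
    def gold_sales_summary():
        # The function must return the DataFrame produced by spark.sql;
        # upstream DLT tables are referenced via the LIVE schema.
        return spark.sql("""
            SELECT region, SUM(amount) AS total_amount
            FROM LIVE.silver_orders
            GROUP BY region
        """)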
Please explain, with some use cases, the differences between DLT and dbt.
Hi @Prachi Sankhala Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answe...
Hi all! We are using DLT for our ETL jobs, and we're noticing the setup steps (Initializing, Resetting tables, Setting up tables, Rendering graph) are taking much longer than actually ETL'ing the data into our tables. We have about 110 tables combined...
Hi @daan duppen Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Thanks...
I am using DLT to load CSV files in ADLS; below is my SQL query in the notebook:

    CREATE OR REFRESH STREAMING LIVE TABLE test_account_raw
    AS SELECT * FROM cloud_files(
      "abfss://my_container@my_storageaccount.dfs.core.windows.net/test_csv/",
      "csv",
      map("h...
Thank you everyone, the problem is resolved; it went away once I had workspace admin access.
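For reference, a rough Python equivalent of the cloud_files SQL above (a sketch; the header option is an assumption about the truncated map(...) arguments):

    # Sketch: Python autoloader equivalent of the SQL cloud_files call.
    # The header option is assumed; the path is taken from the post above.
    import dlt

    @dlt.table(name="test_account_raw")
    def test_account_raw():
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "csv")
            .option("header", "true")  # assumed CSV option
            .load("abfss://my_container@my_storageaccount.dfs.core.windows.net/test_csv/")
        )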
Hi all,
I have a table created by DLT. Initially I set cloudFiles.inferColumnTypes to false and all columns were stored as strings. However, I now want to use cloudFiles.inferColumnTypes=true. I dropped the table and re-ran the pipeline, which fai...
Hi @Billy Wong Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers yo...
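As a sketch (assuming the table is defined in Python; the table name and path are hypothetical), the option is set on the cloudFiles reader. After changing it, a full refresh of the pipeline is usually needed rather than just dropping the table, since autoloader keeps its own inferred-schema state:

    import dlt

    @dlt.table(name="my_table")  # hypothetical table name
    def my_table():
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "csv")
            # Changed from "false" to "true"; run a full refresh so the
            # previously inferred all-string schema is discarded.
            .option("cloudFiles.inferColumnTypes", "true")
            .load("/path/to/source")  # hypothetical path
        )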
We have a DLT pipeline that uses the autoloader to detect files added to a source storage bucket. It reads these updated files and adds new records to a bronze streaming table. However, we would also like to automatically delete records from the bronz...
@Bennett Lambert: Yes, it is possible to automatically delete records from the bronze table when a source file is deleted, without doing a full refresh. One way to achieve this is by using the Change Data Capture (CDC) feature in Databricks Delta. CD...
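One way this CDC approach is often expressed in DLT (a sketch, assuming the source feed carries an operation column that marks deletes; all table and column names are hypothetical) is apply_changes with apply_as_deletes:

    import dlt
    from pyspark.sql.functions import expr

    # Hypothetical target streaming table for the CDC feed.
    dlt.create_streaming_table("bronze_accounts")

    dlt.apply_changes(
        target="bronze_accounts",
        source="raw_account_changes",   # hypothetical CDC feed
        keys=["account_id"],            # hypothetical key column
        sequence_by="event_timestamp",  # hypothetical ordering column
        # Rows flagged as deletes in the feed remove the matching records
        # from the target without a full refresh.
        apply_as_deletes=expr("operation = 'DELETE'"),
    )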
Hello community :). I am currently implementing some pipelines using DLT. They are working great for my medallion architecture: landed JSON in bronze -> silver (using apply_changes), then materialized gold views on top. However, I am attempting to crea...
@Robert Pearce: It is possible to achieve the desired behavior using apply_changes in Databricks Delta Lake. You can use the merge operation to merge data from your source into your target Delta table, and then use whenMatchedUpdate to update the id...
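A sketch of the merge pattern described (the table, the updates_df source DataFrame, and the column names are all hypothetical):

    from delta.tables import DeltaTable

    target = DeltaTable.forName(spark, "silver_entities")  # hypothetical target

    (
        target.alias("t")
        .merge(updates_df.alias("s"), "t.id = s.id")  # updates_df is hypothetical
        # Only overwrite the columns that should change on a match.
        .whenMatchedUpdate(set={"status": "s.status", "updated_at": "s.updated_at"})
        .whenNotMatchedInsertAll()
        .execute()
    )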
Hi there, can you use %run or dbutils.notebook.run() in a Delta Live Table (DLT) pipeline? When I try, I get the following error: "IllegalArgumentException: requirement failed: To enable notebook workflows, please upgrade your Databricks subscriptio...
Hi all. @Kaniz Fatma thanks for your answer. I am on the premium pricing tier in Azure. After digging around the logs, it would seem that you cannot run magic commands in a Delta Live Table pipeline. Therefore, you cannot use %run in a DLT pipeline - w...
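As a workaround (a sketch of a common alternative, not something confirmed in this thread; the repo path and helper name are hypothetical), shared logic can be factored into a plain Python module and imported instead of using %run:

    # Sketch: instead of %run, put shared functions in a .py file in the repo
    # and import them. The path and helper name below are hypothetical.
    import sys

    sys.path.append("/Workspace/Repos/my_repo/shared")
    from transformations import clean_accounts  # hypothetical helper

    import dlt

    @dlt.table(name="silver_accounts")
    def silver_accounts():
        return clean_accounts(dlt.read("bronze_accounts"))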
I'm trying to set up autoloader to read some CSV files. I tried autoloader both with the DLT decorator and by itself. The first column of the data is called "run_id"; when I do a spark.read.csv() directly on the file it com...
Can you attach the exact output so that I can have a look at it?
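To narrow it down (a hedged debugging sketch; the paths are hypothetical), comparing the column names from a plain batch read against the cloudFiles stream usually shows whether the header is being parsed differently:

    # Sketch: compare schemas from a direct read vs. autoloader.
    path = "/path/to/csv"  # hypothetical

    batch_cols = spark.read.option("header", "true").csv(path).columns

    stream_df = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .option("header", "true")
        .option("cloudFiles.schemaLocation", "/tmp/_schema_debug")  # hypothetical
        .load(path)
    )
    print(batch_cols)
    print(stream_df.columns)  # e.g. a UTF-8 BOM can surface as '\ufeffrun_id'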
Does anyone have a workflow or pattern that works for developing with autoloader/DLT? I'm still new to it, but the fact that while testing it creates checkpoints using schema locations makes it really tricky to develop with and hammer out a working ve...
What exactly are you referring to here as a pattern?
Dear experts, might I know what will happen to a Delta Live Table pipeline that is in a cancelled state when there is a runtime service upgrade? Thanks!
@KS LAU: When a runtime service upgrade occurs in Databricks, any running tasks or pipelines may be temporarily interrupted while the upgrade is being applied. In the case of a cancelled Delta Live Table pipeline, it will not be impacted by the upgr...
Hi, I would like to know if it is possible to get the target schema programmatically inside a DLT pipeline (the one set in DLT pipeline settings under destination, target schema). I want to run more idempotent pipelines. For example, my target table has the fields: reference_da...
Thank you @Suteja Kanuri, looks like your solution is working, thank you. Regards,
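One pattern that supports this (a sketch, assuming a custom key is added under the pipeline's Configuration settings; the key name, default, and table names are hypothetical) is to read pipeline configuration values with spark.conf.get:

    # Sketch: read a custom pipeline configuration value inside DLT.
    # "mypipeline.target_schema" must be set in the pipeline's Configuration;
    # the key name and default are hypothetical.
    import dlt

    target_schema = spark.conf.get("mypipeline.target_schema", "default_schema")

    @dlt.table(name=f"{target_schema}_summary")  # hypothetical use of the value
    def summary():
        return dlt.read("curated_events")  # hypothetical upstream table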
I want to apply a pivot on a DataFrame in DLT, but I'm getting the following warning: "Notebook: XXXX used `GroupedData.pivot` function that will be deprecated soon. Please fix the notebook." I have the same warning if I use the function collect. Is it risk...
Thanks @Kaniz Fatma for your support. The solution was to do the pivot outside of views or tables, and the warning disappeared.
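A sketch of that fix (the source table and column names are hypothetical): compute the pivot eagerly outside the decorated function and return the precomputed DataFrame from the table definition:

    import dlt

    # The pivot is computed outside the @dlt.table function, so
    # GroupedData.pivot is not evaluated inside a view/table definition.
    pivoted_df = (
        spark.read.table("my_catalog.my_schema.metrics")  # hypothetical source
        .groupBy("entity_id")
        .pivot("metric_name")
        .sum("metric_value")
    )

    @dlt.table(name="metrics_pivoted")
    def metrics_pivoted():
        return pivoted_df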
Hi all, I have a DLT pipeline as so: raw -> cleansed (SCD2) -> curated. 'Raw' uses autoloader to continuously read files from a data lake. These files can contain tons of duplicates, which causes our raw table to become quite large. Therefore, we ...
OK, I'll try to add additional details. Firstly, the diagram below shows our current dataflow. Our raw table is defined as such:

    TABLES = ['table1', 'table2']

    def generate_tables(table_name):
        @dlt.table(
            name=f'raw_{table_name}',
            table_pro...
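For the deduplication itself (a sketch along the lines of the generate_tables pattern above; the source format, paths, and column names are hypothetical), one option is dropDuplicates on the streaming read, bounded by a watermark so streaming state does not grow unbounded:

    import dlt

    TABLES = ["table1", "table2"]

    def generate_tables(table_name):
        @dlt.table(name=f"raw_{table_name}")
        def raw():
            return (
                spark.readStream.format("cloudFiles")
                .option("cloudFiles.format", "json")  # assumed format
                .load(f"/mnt/lake/{table_name}/")     # hypothetical path
                # The watermark bounds the state kept for streaming dedup;
                # the column names are hypothetical.
                .withWatermark("ingest_ts", "1 day")
                .dropDuplicates(["business_key", "ingest_ts"])
            )

    for t in TABLES:
        generate_tables(t)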