Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
I have a Delta Live Tables pipeline where I read CDC data and merge it into silver using apply_changes. In silver, can I find out what data has changed since the last run, similar to the change data feed's table_changes?
I also have a requirement where I write to a live table (materialized view) with CDF enabled. I want to see the changes, but here too I see overwrites happening after the DLT pipeline runs.
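For the first question, a minimal sketch: if the silver target has the change data feed enabled (for example via table_properties={"delta.enableChangeDataFeed": "true"}), the feed can be read with Delta's readChangeFeed option. The table name and starting version below are placeholders, not from the original post:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read the Delta change data feed from the silver table.
# "silver.customers" and startingVersion=5 are placeholders.
changes = (
    spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", 5)  # or .option("startingTimestamp", "2023-01-01")
    .table("silver.customers")
)

# _change_type distinguishes insert / update_preimage / update_postimage / delete
changes.select("_change_type", "_commit_version", "_commit_timestamp").show()
```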
Hi, I was wondering what the differences are between a Materialized View and a Streaming table. Which one should I use when I extract data from a bronze table to a silver table, since I found that both CREATE LIVE TABLE and CREATE STREAMING LIVE TABLE could a...
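A sketch of the same distinction in the Python API, with placeholder table names: dlt.read defines a materialized view that is recomputed over the full input on each update, while dlt.read_stream defines a streaming table that processes only newly arrived rows:

```python
import dlt
from pyspark.sql import functions as F

# Materialized view: recomputed over the full bronze input on each update.
@dlt.table(name="silver_orders_mv")
def silver_orders_mv():
    return dlt.read("bronze_orders").where(F.col("status").isNotNull())

# Streaming table: incrementally processes rows appended to bronze.
@dlt.table(name="silver_orders_st")
def silver_orders_st():
    return dlt.read_stream("bronze_orders").where(F.col("status").isNotNull())
```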
I'm trying to set up Auto Loader to read some CSV files. I tried Auto Loader with the DLT decorator as well as Auto Loader by itself. The first column of the data is called "run_id"; when I do a spark.read.csv() directly on the file, it com...
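For reference, a minimal Auto Loader CSV sketch inside a DLT pipeline; the source path is a placeholder, and spark is assumed to be provided by the DLT runtime:

```python
import dlt

@dlt.table(name="events_raw")
def events_raw():
    # cloudFiles is the Auto Loader source; the path below is hypothetical.
    # Inside a DLT pipeline, the schema inference location is managed for you.
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .option("header", "true")
        .load("/mnt/raw/events/")
    )
```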
I have been trying to add Delta Live Tables to a pre-existing workflow. Currently I'm trying to create two tables, appointments_raw and notes_raw, where notes_raw is "downstream" of appointments_raw. Following this as a reference, I'm at...
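A sketch of that two-table dependency, with placeholder paths and columns; referencing appointments_raw through dlt.read is what makes notes_raw downstream of it:

```python
import dlt

@dlt.table(name="appointments_raw")
def appointments_raw():
    # Hypothetical source location.
    return spark.read.format("delta").load("/mnt/raw/appointments")

@dlt.table(name="notes_raw")
def notes_raw():
    # dlt.read declares the dependency, so DLT builds appointments_raw first.
    return dlt.read("appointments_raw").select("appointment_id", "notes")  # placeholder columns
```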
A live table or view always reflects the results of the query that defines it, including when the query defining the table or view is updated, or an input data source is updated. Like a traditional materialized view, a live table or view may be entir...
As the title suggests, whenever I create a streaming live table it creates an __apply_changes_storage_&lt;mytablename&gt; table in the database on Databricks. Is there a way to specify a different cloud location for these files?
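The __apply_changes_storage_&lt;name&gt; table is the internal backing table that apply_changes maintains, and its default root follows the pipeline's storage setting. One hedged option, assuming the current Python API: pass an explicit path when creating the target table. All names and the location below are placeholders, and whether this relocates the internal table is an assumption worth verifying:

```python
import dlt

# Assumption: the `path` argument of create_streaming_table controls where the
# target's files land. Names and the location are placeholders.
dlt.create_streaming_table(
    name="mytablename",
    path="abfss://silver@myaccount.dfs.core.windows.net/dlt/mytablename",
)

dlt.apply_changes(
    target="mytablename",
    source="cdc_source",      # placeholder source view/table
    keys=["id"],              # placeholder primary key
    sequence_by="event_ts",   # placeholder ordering column
)
```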
I have a delta live table that I'm trying to run groupBy on, but I'm getting an error: "RuntimeError: Query function must return either a Spark or Koalas DataFrame". Here is my code:

```python
@dlt.table
def groups_hierarchy():
    df = dlt.read_stream("groups_h...
```
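That RuntimeError is typically raised when the decorated function doesn't return the DataFrame (so it implicitly returns None). A minimal corrected sketch, with placeholder names for the truncated source; an unbounded groupBy is usually simpler as a materialized view (dlt.read) than as a stream:

```python
import dlt
from pyspark.sql import functions as F

@dlt.table
def groups_hierarchy():
    df = dlt.read("groups_hierarchy_source")  # placeholder for the truncated name
    # Returning the aggregated DataFrame satisfies the DLT query contract.
    return df.groupBy("group_id").agg(F.count("*").alias("member_count"))
```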
The target key, when creating the pipeline, specifies the database that the tables get published to. Documented here: https://docs.databricks.com/data-engineering/delta-live-tables/delta-live-tables-user-guide.html#publish-tables
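A hedged sketch of where target sits in the pipeline settings JSON; every name and path below is a placeholder:

```json
{
  "name": "my_dlt_pipeline",
  "target": "silver_db",
  "storage": "/mnt/dlt/my_dlt_pipeline",
  "libraries": [
    { "notebook": { "path": "/Repos/me/pipelines/silver_notebook" } }
  ]
}
```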