Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
I have several delta live table notebooks that are tied to different delta live table jobs so that I can use multiple target schema names. I know it's possible to reuse a cluster for job segments but is it possible for these delta live table jobs (w...
Hi @John Fico, we haven’t heard from you since the last response from @Hubert Dudek and @Jose Gonzalez, and I was checking back to see if you have a resolution yet. If you have any solution, please share it with the community as it can be helpf...
This post will help you simplify your data ingestion by utilizing Auto Loader, Delta Optimized Writes, Delta Write Jobs, and Delta Live Tables. Pre-Req: You are using JSON data and Delta Writes commands. Step 1: Simplify ingestion with Auto Loader Delt...
I am trying to use the event log that DLT keeps updated, and I noticed that some of the fields are consistently empty/null. In the event log, located at ".../storage/system/events", I see the field "origin", and there are nested fields within it which are empty/n...
Hi @Kristian Foster, the following docs provide more details on the event log schema. Please refer to this link: https://docs.databricks.com/workflows/delta-live-tables/delta-live-tables-event-log.html#monitor-pipelines-with-the-delta-live-tables...
REST API documentation is out of date since the release of Delta Live Tables. When using the `2.0/clusters/list` endpoint in an environment with running clusters provisioned by DLT, the clusters will be returned with a `cluster_source` value of `PIPEL...
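For the `2.0/clusters/list` report above, one way to see which `cluster_source` values a workspace actually returns is to tally them from the response. This is a minimal sketch, assuming the response JSON has already been fetched and parsed into a dict with a top-level `clusters` list. The helper name `count_cluster_sources` and the sample payload are hypothetical and not taken from the post.

```python
from collections import Counter


def count_cluster_sources(response):
    """Tally clusters by their `cluster_source` field.

    `response` is the parsed JSON body of a clusters/list call.
    Clusters that lack the field are counted under "<missing>".
    """
    clusters = response.get("clusters", [])
    return Counter(c.get("cluster_source", "<missing>") for c in clusters)


# Hypothetical sample payload, for illustration only.
sample = {
    "clusters": [
        {"cluster_id": "a", "cluster_source": "UI"},
        {"cluster_id": "b", "cluster_source": "JOB"},
        {"cluster_id": "c", "cluster_source": "JOB"},
        {"cluster_id": "d"},
    ]
}

for source, n in sorted(count_cluster_sources(sample).items()):
    print(source, n)
```

Any value in the tally that the API reference does not list is a candidate for the documentation gap the post describes.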
Hi @Sam Steere, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!
I have a Delta Live Table that I'm trying to run GroupBy on, but I'm getting an error: "RuntimeError: Query function must return either a Spark or Koalas DataFrame". Here is my code:
@dlt.table
def groups_hierarchy():
    df = dlt.read_stream("groups_h...
Hi @Preben Olsen, does @Debayan Mukherjee's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? We'd love to hear from you. Thanks!
I have a delta live tables pipeline that is loading and transforming data. Currently I am having a problem that the schema inferred by DLT does not match the actual schema of the table. The table is generated via a groupby.pivot operation as follows:...
Objective
Within the context of a Delta Live Table, I'm trying to merge two stream aggregations, but I run into challenges. Is it possible to achieve such a join?
Context
Assume
- table trades stores a list of trades with their associated time stamps
- tabl...
What was the established architectural pattern for doing streaming ETL with Delta Lake before DLT was a thing? And incidentally, what approach would you take in the context of delta-oss today? The pipeline definitions would not have had to be declara...
Hi @Veli-Jussi Raitila, does @Shanmugavel Chandrakasu's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? We'd love to hear from you. Thanks!
We are building a DLT pipeline, and Auto Loader is handling schema evolution fine. However, further down the pipeline we are trying to load that streamed data with the apply_changes() function into a new table and, from the looks of it, it doesn't see...
Hey there @Palani Thangaraj, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear fro...
I have some data in silver that I read in as a view and then use the __apply_changes function on. I create a table based on this, and I then want to create my gold table after doing a .groupBy() and .pivot(). The transformations I do in the gold table aren...
I have found a temporary workaround for this. The .pivot("columnName") should automatically grab all the values it can find, but for some reason it does not. I need to specify the values, using .pivot("group_name", "group0", "group1", "group2"...) ...
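The workaround above can be sketched as a small helper. In PySpark, `GroupedData.pivot` takes the pivot column followed by the values as a single list, so that is how they are passed here. This is only an illustration under assumptions: the helper name, the key and value columns, the `"first"` aggregation, and the three group values are hypothetical, since the post does not show the full query.

```python
# Hypothetical pivot values; replace with the values present in the real data.
PIVOT_VALUES = ["group0", "group1", "group2"]


def pivot_with_fixed_columns(df, key_col, pivot_col, value_col, values=PIVOT_VALUES):
    """Pivot `pivot_col` into one column per entry in `values`.

    Passing `values` explicitly means the set of output columns does not
    depend on which values happen to be present when the query is analyzed.
    `df` is expected to be a Spark DataFrame.
    """
    return (
        df.groupBy(key_col)
        .pivot(pivot_col, list(values))
        .agg({value_col: "first"})  # assumed aggregation, not from the post
    )


# Inside a DLT table function this might be used as (hypothetical names):
#   return pivot_with_fixed_columns(dlt.read("silver_groups"), "id", "group_name", "value")
```

The trade-off is that new group values have to be added to the list by hand before they show up as columns.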
Like many of you, we have implemented a "medallion architecture" (raw/bronze/silver/gold layers), with each layer stored on a separate storage account. We only create proper Hive tables for the gold layer, so our Power BI users connecting to the da...
I can answer the first question: You can define data storage by setting the `path` parameter for tables. The "storage path" in pipeline settings will then only hold checkpoints (and some other pipeline stuff) and data will be stored in the correct acc...
Hi, I have a Delta Live Tables pipeline, using Auto Loader, to ingest from JSON files. I need to do some transformations - in this case, converting timestamps. Except one of the timestamp columns does not exist in every file. This is causing the DLT p...
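One possible way to make a transformation tolerate a column that is absent from the DataFrame schema is to add it as typed NULLs before converting it. This is a sketch only: the post is truncated, so whether it addresses the actual failure is an assumption, and the helper name `ensure_column` and the column name `updated_at` are hypothetical.

```python
def ensure_column(df, name, sql_type="TIMESTAMP"):
    """Return `df` with column `name` present.

    If the column already exists, `df` is returned unchanged. Otherwise the
    column is appended as NULLs cast to `sql_type`, using the DataFrame's
    `selectExpr` method. `df` is expected to be a Spark DataFrame.
    """
    if name in df.columns:
        return df
    return df.selectExpr("*", f"CAST(NULL AS {sql_type}) AS `{name}`")


# Inside a DLT table function this might be used as (hypothetical names):
#   df = ensure_column(raw_df, "updated_at")
#   ...then apply the timestamp conversion to "updated_at" as usual.
```

Rows from files that lack the column then carry NULL in it, so downstream timestamp conversions have a column to operate on.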
You’ve gotten familiar with Delta Live Tables (DLT) via the quickstart and getting started guide. Now it’s time to tackle creating a DLT data pipeline for your cloud storage, with one line of code. Here’s how it’ll look when you're starting:
CREATE OR ...
Tip #3: Use JSON cluster configurations to access your storage location. Knowledge check: How do I modify DLT settings using JSON? Delta Live Tables settings are expressed as JSON and can be modified in the Delta Live Tables UI [AWS] [Azure] [GCP]. Examp...
Still relatively new to Spark and even more so to Delta Live Tables, so apologies if I've missed something fundamental, but here goes. We are trying to run a notebook via Delta Live Tables, which contains 2 functions decorated by the `dlt.table` decorat...
Hi @Karthik Munipalle, Delta Live Tables queries can be implemented in Python or SQL. Here are a few articles that explain DLT well. Please have a look: https://docs.databricks.com/data-engineering/delta-live-tables/index.html https://databricks.com/...