Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

kskistad
by New Contributor III
  • 3073 Views
  • 3 replies
  • 4 kudos

Resolved! Streaming Delta Live Tables

I'm a little confused about how streaming works with DLT. My first question is: what is the difference in behavior if you set the pipeline mode to "Continuous" but in your notebook you don't use the "streaming" prefix on table statements, and simila...

Latest Reply
Harsh141220
New Contributor II
  • 4 kudos

Is it possible to have custom upserts in streaming tables in a Delta Live Tables pipeline? Use case: I am trying to maintain a valid session based on a timestamp column and want to upsert to the target table. Tried going through the documentation but dl...

2 More Replies
PearceR
by New Contributor III
  • 4802 Views
  • 3 replies
  • 1 kudos

Resolved! custom upsert for delta live tables apply_changes()

Hello community :). I am currently implementing some pipelines using DLT. They are working great for my medallion architecture: landed JSON in bronze -> silver (using apply_changes), then materialized gold views on top. However, I am attempting to crea...

Latest Reply
Harsh141220
New Contributor II
  • 1 kudos

Is it possible to have custom upserts for streaming tables in Delta Live Tables? I'm getting the error: pyspark.errors.exceptions.captured.AnalysisException: `blusmart_poc.information_schema.sessions` is not a Delta table.

2 More Replies
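Both threads above ask about custom upserts on DLT streaming tables. For the standard keyed-upsert case, DLT's own apply_changes API is the supported route; anything more custom generally means moving that table to a regular structured-streaming job with foreachBatch and MERGE. Below is a minimal apply_changes sketch — it only runs inside a Databricks DLT pipeline (the `dlt` module is not pip-installable), so treat it as illustrative, and the table and column names (`sessions`, `raw_sessions`, `session_id`, `event_ts`) are hypothetical:

```python
# Sketch only: executes inside a Databricks Delta Live Tables pipeline,
# not as a standalone script. Names below are hypothetical.
import dlt
from pyspark.sql.functions import col

dlt.create_streaming_table("sessions")

dlt.apply_changes(
    target="sessions",        # streaming table declared above
    source="raw_sessions",    # upstream streaming source
    keys=["session_id"],      # upsert key
    sequence_by=col("event_ts"),  # latest record per key wins
    stored_as_scd_type=1,     # upsert in place; use 2 to keep history
)
```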
labromb
by Contributor
  • 5792 Views
  • 8 replies
  • 4 kudos

How to pass configuration values to a Delta Live Tables job through the Delta Live Tables API

Hi Community, I have successfully run a job through the API but need to be able to pass parameters (configuration) to the DLT workflow via the API. I have tried passing JSON in this format: { "full_refresh": "true", "configuration": [ ...

Latest Reply
Manjula_Ganesap
Contributor
  • 4 kudos

@Mo - it worked. Thank you so much.

7 More Replies
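For readers hitting the same wall: as far as I can tell from the public Pipelines REST API, `full_refresh` is a boolean on the update request (POST /api/2.0/pipelines/{pipeline_id}/updates), while `configuration` is a flat string-to-string object in the pipeline settings (edited via PUT /api/2.0/pipelines/{pipeline_id}) — not an array as in the snippet above. This sketch only builds and prints the two payloads; the endpoint shapes should be verified against the current API docs, and the parameter name `my_param` is hypothetical:

```python
import json

# Body for starting an update:
# POST /api/2.0/pipelines/{pipeline_id}/updates
update_payload = {"full_refresh": True}  # boolean, not the string "true"

# Pipeline settings fragment: "configuration" is a string-to-string map,
# readable inside the pipeline via spark.conf.get("my_param").
settings_fragment = {"configuration": {"my_param": "2024-01-01"}}

print(json.dumps(update_payload))
print(json.dumps(settings_fragment))
```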
Phani1
by Valued Contributor
  • 4338 Views
  • 7 replies
  • 8 kudos

Delta Live Table name dynamically

Hi Team, can we pass the Delta Live Table name dynamically [from a configuration file, instead of hardcoding the table name]? We would like to build a metadata-driven pipeline.

Latest Reply
Azure_dbks_eng
New Contributor II
  • 8 kudos

I am observing the same error when adding dataset.tablename: org.apache.spark.sql.catalyst.ExtendedAnalysisException: Materializing tables in custom schemas is not supported. Please remove the database qualifier from table 'streaming.dlt_read_test_fil...

6 More Replies
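The metadata-driven pattern asked about here is usually built by generating @dlt.table definitions in a loop, with names pulled from pipeline configuration. Note the table name stays unqualified — per the latest reply, DLT rejects database-qualified names for materialized tables. A sketch only (the `dlt` module exists solely inside a Databricks DLT pipeline, so this is untested here, and `source_tables` plus the `raw` schema are hypothetical):

```python
# Sketch only: requires a Databricks DLT pipeline runtime.
import dlt

# Comma-separated list supplied via the pipeline's "configuration" settings.
sources = spark.conf.get("source_tables", "orders,customers").split(",")

def make_bronze(src: str):
    # Unqualified table name, resolved when the pipeline graph is built.
    @dlt.table(name=f"bronze_{src}")
    def _bronze():
        return spark.readStream.table(f"raw.{src}")

for src in sources:
    make_bronze(src)  # bind src per iteration to avoid late-binding bugs
```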
isaac_gritz
by Valued Contributor II
  • 1521 Views
  • 1 reply
  • 2 kudos

Change Data Capture with Databricks

How to leverage Change Data Capture (CDC) from your databases to Databricks. Change Data Capture allows you to ingest and process only changed records from database systems, dramatically reducing data processing costs and enabling real-time use cases suc...

Latest Reply
prasad95
New Contributor III
  • 2 kudos

Hi @isaac_gritz, can you provide a reference resource for achieving AWS DynamoDB CDC to Delta tables? Thank you.

tinai_long
by New Contributor III
  • 5661 Views
  • 10 replies
  • 4 kudos

Resolved! How to refresh a single table in Delta Live Tables?

Suppose I have a Delta Live Tables framework with 2 tables: Table 1 ingests from a JSON source, Table 2 reads from Table 1 and runs some transformation. In other words, the data flow is json source -> Table 1 -> Table 2. Now if I find some bugs in the...

Latest Reply
cpayne_vax
New Contributor III
  • 4 kudos

Answering my own question: nowadays (February 2024) this can all be done via the UI. When viewing your DLT pipeline there is a "Select tables for refresh" button in the header. If you click this, you can select individual tables, and then in the botto...

9 More Replies
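Besides the UI route in the accepted answer, the same selective refresh appears to be exposed on the updates endpoint via `refresh_selection` (and `full_refresh_selection` for a from-scratch recompute); the field names should be checked against the current Pipelines API docs. A sketch that just builds the request body:

```python
import json

# Hypothetical body for POST /api/2.0/pipelines/{pipeline_id}/updates:
# refresh only Table 2 instead of the whole graph.
payload = {
    "refresh_selection": ["table_2"],
    # "full_refresh_selection": ["table_2"],  # recompute from scratch instead
}
print(json.dumps(payload))
```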
User16826992185
by New Contributor II
  • 6376 Views
  • 2 replies
  • 3 kudos

Databricks Auto-Loader vs. Delta Live Tables

What is the difference between Databricks Auto Loader and Delta Live Tables? Both seem to manage ETL for you, but I'm confused about where to use one vs. the other.

Latest Reply
SteveL
New Contributor II
  • 3 kudos

You say "...__would__ be a piece..." and "...DLT __would__ pick up...". Is DLT built upon Auto Loader?

1 More Replies
NathanSundarara
by Contributor
  • 4225 Views
  • 7 replies
  • 2 kudos

Delta live table generate unique integer value (kind of surrogate key) for combination of columns

Hi, we are in the process of moving our data warehouse from SQL Server to Databricks. We are testing our Dimension Product table, which has an identity column referenced in the fact table as a surrogate key. In Databricks Apply Changes SCD Type 2 ...

Latest Reply
ilarsen
Contributor
  • 2 kudos

Hey. Yep, xxhash64 (or even just hash) generates numerical values for you. Combine with the abs function to ensure the value is positive. In our team we used abs(hash()) ourselves... for maybe a day. Very quickly I observed a collision, and the data s...

6 More Replies
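The collision the reply describes is expected: Spark's hash() is a 32-bit Murmur3 hash, and abs() roughly halves that space to about 2**31 values, so by the birthday bound a collision among ~100k keys is more likely than not, while the 64-bit space of xxhash64 makes the same risk negligible. A quick check of the math:

```python
import math

# Birthday bound: probability of at least one collision among n keys
# drawn uniformly from a space of d possible hash values is roughly
# 1 - exp(-n*(n-1) / (2*d)).
def collision_probability(n_keys: int, space: int) -> float:
    return 1.0 - math.exp(-n_keys * (n_keys - 1) / (2.0 * space))

print(f"100k keys, abs(hash()) ~2**31 space: "
      f"{collision_probability(100_000, 2**31):.1%}")
print(f"100k keys, xxhash64 2**64 space:     "
      f"{collision_probability(100_000, 2**64):.2e}")
```

With 100k keys, the 31-bit space already gives a better-than-even chance of a collision, which matches the reply's experience of hitting one within a day.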
sarguido
by New Contributor II
  • 1848 Views
  • 4 replies
  • 2 kudos

Delta Live Tables: bulk import of historical data?

Hello! I'm very new to working with Delta Live Tables and I'm having some issues. I'm trying to import a large amount of historical data into DLT. However, letting the DLT pipeline run forever doesn't work with the database we're trying to import from...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Sarah Guido, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers y...

3 More Replies
Enzo_Bahrami
by New Contributor III
  • 3639 Views
  • 6 replies
  • 1 kudos

Resolved! On-Premise SQL Server Ingestion to Databricks Bronze Layer

Hello everyone! I want to ingest tables with schemas from an on-premises SQL Server into the Databricks Bronze layer with Delta Live Tables, using Azure Data Factory, and I want the load to be a snapshot batch load, not an incremental lo...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Parsa Bahraminejad, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best an...

5 More Replies
Ryan_Chynoweth
by Honored Contributor III
  • 1253 Views
  • 2 replies
  • 2 kudos

medium.com

Hi All, I recently published a streaming data comparison between Snowflake and Databricks. Hope you enjoy! Please let me know what you think! https://medium.com/@24chynoweth/data-streaming-at-scale-databricks-and-snowflake-ca65a2401649

Latest Reply
babyhari
New Contributor II
  • 2 kudos

Nicely done. 

1 More Replies
charlieyou
by New Contributor
  • 1295 Views
  • 1 reply
  • 0 kudos

StreamingQueryException: Read timed out // Reading from delta share'd dataset

I have a workspace in GCP that's reading from a Delta-shared dataset hosted in S3. When trying to run a very basic DLT pipeline, I'm getting the below error. Any help would be awesome! Code:
import dlt

@dlt.table
def fn():
    return (spark.readStr...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Charlie You: The error message you're encountering suggests a timeout issue when reading from the Delta-shared dataset hosted in S3. There are a few potential reasons and solutions you can explore: Network connectivity: verify that the network conne...

Eelke
by New Contributor II
  • 1769 Views
  • 3 replies
  • 0 kudos

I want to perform interpolation on a streaming table in delta live tables.

I have the following code:
%pip install dbl-tempo
from pyspark.sql.functions import *
from tempo import TSDF

# interpolate target_cols column linearly for tsdf dataframe
def interpolate_tsdf(tsdf_data, target_c...

Latest Reply
Eelke
New Contributor II
  • 0 kudos

The issue was not resolved because we were trying to use a streaming table within TSDF, which does not work.

2 More Replies
Pras1
by New Contributor II
  • 4484 Views
  • 2 replies
  • 2 kudos

Resolved! AZURE_QUOTA_EXCEEDED_EXCEPTION - even with more than vCPUs than Databricks recommends

I am running this Delta Live Tables PoC from databricks-industry-solutions/industry-solutions-blueprints: https://github.com/databricks-industry-solutions/pos-dlt. I have Standard_DS4_v2 with 28 GB and 8 cores x 2 workers, so a total of 16 cores. This is...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Prasenjit Biswas, we haven't heard from you since the last response from @Jose Gonzalez. Kindly share the information with us, and in return, we will provide you with the necessary solution. Thanks and regards

1 More Replies
pablociu
by New Contributor
  • 769 Views
  • 2 replies
  • 0 kudos

How to define write Option in a DLT using Python?

In a normal notebook I would save metadata to my Delta table using the following code:
(
    df.write
        .format("delta")
        .mode("overwrite")
        .option("userMetadata", user_meta_data)
        .saveAsTable("my_table")
)
But I couldn't find online how c...

Latest Reply
United_Communit
New Contributor II
  • 0 kudos

In Delta Lake you can set user metadata, so I will give you some tips:
from delta import DeltaTable

# Create or load your Delta table
delta_table = DeltaTable.forPath(spark, "path_to_delta_table")

# Define your user metadata
user_meta_data = {"ke...

1 More Replies
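For anyone landing on this thread: the writer option in the question works for plain Delta writes, but inside a DLT pipeline you don't own the writer. One workaround I believe works (worth verifying against the Delta Lake docs for your runtime) is the session-level conf that attaches userMetadata to every Delta commit made in that session; in DLT it can be supplied through the pipeline's configuration settings. Illustrative sketch only, since it requires a Spark session with Delta Lake:

```python
# Attach custom metadata to subsequent Delta commits in this session.
# Conf name per Delta Lake docs; the JSON value itself is arbitrary.
spark.conf.set(
    "spark.databricks.delta.commitInfo.userMetadata",
    '{"source": "dlt_pipeline", "note": "hypothetical"}',
)
```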