Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

isaac_gritz
by Valued Contributor II
  • 7295 Views
  • 1 replies
  • 2 kudos

Change Data Capture with Databricks

How to leverage Change Data Capture (CDC) from your databases to Databricks. Change Data Capture allows you to ingest and process only changed records from database systems to dramatically reduce data processing costs and enable real-time use cases suc...

Latest Reply
prasad95
New Contributor III
  • 2 kudos

Hi @isaac_gritz, can you provide any reference resources for implementing AWS DynamoDB CDC to Delta tables? Thank you.

Louis_Databrick
by New Contributor II
  • 1020 Views
  • 2 replies
  • 0 kudos

Registering a dataframe coming from a CDC data stream removes the CDC columns from the resulting temporary view, even when explicitly adding a copy of the column to the dataframe.

df_source_records.filter(F.col("_change_type").isin("delete", "insert", "update_postimage")) .withColumn("ROW_NUMBER", F.row_number().over(window)) .filter("ROW_NUMBE...

Latest Reply
Louis_Databrick
New Contributor II
  • 0 kudos

Seems to work now actually. No idea what changed, as I tried multiple times exactly in this way and it did.not.work.
from pyspark.sql.functions import expr
from pyspark.sql.utils import AnalysisException
import pyspark.sql.functions as f
data = [(...

1 More Replies
ravinchi
by New Contributor III
  • 2794 Views
  • 5 replies
  • 9 kudos

I'd like to ingest data into my ADLS from sql server in an incremental manner using Delta Live Tables.

I'd like to ingest data into my ADLS from SQL Server in an incremental manner using Delta Live Tables. I do not want to use any staging tables. I was using CDC. When I call dlt.apply_changes, it asks me to specify source and target. Since source ...

Latest Reply
Sandeep
Contributor III
  • 9 kudos

If you have a CDC feed, looks like we can use this: https://docs.databricks.com/workflows/delta-live-tables/delta-live-tables-cdc.html

4 More Replies
Jennifer
by New Contributor III
  • 1904 Views
  • 1 replies
  • 0 kudos

How do I update an aggregate table using Delta Live Tables?

I am using Delta Live Tables to stream events, and I have a raw table for all the events and a downstream aggregate table. I need to add the new aggregated number to the downstream table's aggregate column. But I didn't find any recipe talking abou...

Latest Reply
Jennifer
New Contributor III
  • 0 kudos

Maybe my code is already correct, since I use dlt.read("my_raw_table") instead of dlt.read_stream("my_raw_table"). So col_aggr is recalculated completely every time my_raw_table is updated.
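To make the difference between a complete recalculation and an incremental update concrete, here is a minimal plain-Python sketch. It does not use DLT at all; the event fields (`user`, `amount`) and the helper names are hypothetical and chosen only for illustration. It shows that recomputing the aggregate over every event gives the same totals as adding the aggregate of the new batch onto the previous totals.

```python
from collections import defaultdict


def aggregate(events):
    """Sum `amount` per `user` over the given events (hypothetical fields)."""
    totals = defaultdict(int)
    for event in events:
        totals[event["user"]] += event["amount"]
    return dict(totals)


def merge_increment(previous_totals, batch_totals):
    """Add a new batch's totals onto previously stored totals."""
    merged = dict(previous_totals)
    for user, amount in batch_totals.items():
        merged[user] = merged.get(user, 0) + amount
    return merged


old_events = [{"user": "a", "amount": 2}, {"user": "b", "amount": 5}]
new_events = [{"user": "a", "amount": 3}]

# Complete recalculation: read everything and aggregate again.
recomputed = aggregate(old_events + new_events)

# Incremental update: aggregate only the new batch, then add it on.
incremental = merge_increment(aggregate(old_events), aggregate(new_events))

print(recomputed == incremental)
```

The complete recalculation is simpler to reason about but rereads all events on every update; the incremental form touches only the new batch, but only stays correct if the source is append-only.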

alxsbn
by New Contributor III
  • 1703 Views
  • 2 replies
  • 2 kudos

Resolved! Auto Loader on CSV files didn't correctly infer cells containing JSON data

Hello! I'm playing with Auto Loader schema inference on a big S3 repo with 300+ tables and large CSV files. I'm looking at Auto Loader with great attention, as it can be a great time saver in our ingestion process (data comes from a transactional DB gen...

Latest Reply
daniel_sahal
Esteemed Contributor
  • 2 kudos

By default, PySpark uses \ as the escape character. You can change it to ". Doc: https://docs.databricks.com/ingestion/auto-loader/options.html#csv-options
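The two quoting conventions behind this advice can be illustrated with a small sketch. It uses Python's standard `csv` module rather than Spark's CSV reader, so the parameter names differ (`escapechar` and `doublequote` here, versus the `escape` option in Spark), and the sample rows are made up. It only shows why a parser must be told which convention a file uses before a JSON cell comes out intact.

```python
import csv
import io
import json

# The same JSON cell, written with two different quoting conventions.
doubled = '1,"{""a"": 1}"\n'        # inner quotes escaped by doubling: ""
backslashed = '1,"{\\"a\\": 1}"\n'  # inner quotes escaped with a backslash: \"

# Python's csv module expects doubled quotes by default.
row_doubled = next(csv.reader(io.StringIO(doubled)))

# For backslash-escaped quotes, the escape character must be set explicitly.
row_backslashed = next(
    csv.reader(io.StringIO(backslashed), escapechar="\\", doublequote=False)
)

# Both rows now carry a cell that parses as JSON.
print(json.loads(row_doubled[1]), json.loads(row_backslashed[1]))
```

Spark's default is the reverse of Python's: it expects backslash escapes, so a file that doubles its quotes needs the escape character switched to `"`, as the reply suggests.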

1 More Replies
Jennifer_Lu
by New Contributor III
  • 935 Views
  • 1 replies
  • 3 kudos

Why does DLT CDC sometimes manifest the results table as a table and other times as a view?

I have a simple DLT pipeline that reads from an existing table, does some transformations, saves to a view, and then uses dlt.apply_changes() to insert the view into a results table. My question is: why is my results table a view and not a table like I ...

Latest Reply
Jfoxyyc
Valued Contributor
  • 3 kudos

I find most of my apply_changes tables are being created as materialized views as well. They do recalculate at runtime, so they're up to date and behave a lot like a table, but they aren't tables in the same sense.

Trodenn
by New Contributor III
  • 5197 Views
  • 4 replies
  • 1 kudos

How to merge two separate Delta Live Tables?

So I have two Delta Live Tables. One is the master table that contains all the prior data, and the other contains all the new data for that specific day. I want to be able to merge those two tables so that the master table contains would...

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 1 kudos

@Rishabh Pandey

3 More Replies
brickster_2018
by Esteemed Contributor
  • 1554 Views
  • 3 replies
  • 0 kudos

Resolved! For the Auto Loader cloudFiles.includeExistingFiles option, is ordering respected?

If yes, how is order ensured? For example, let's say there are a number of CDC change files that are uploaded to a directory over time. If a table were to be created using the cloudFiles source, in what order would those files be processed?

Latest Reply
Hanish_Goel
New Contributor II
  • 0 kudos

Hi, is there any new development in terms of ensuring the ordering of files in Auto Loader?

2 More Replies
dragonH
by New Contributor
  • 999 Views
  • 0 replies
  • 0 kudos

CDC logs from AWS DMS are not applied correctly

I have a DMS task that processes the full-load and ongoing-replication tasks from source (MSSQL) to target (AWS S3), then use Delta Lake to handle the CDC logs. I have a notebook that inserts data into MSSQL continuously (with id as primary key), then d...

pmt
by New Contributor III
  • 2409 Views
  • 7 replies
  • 1 kudos

Handling Changing Schema in CDC DLT

We are building a DLT pipeline and Auto Loader is handling schema evolution fine. However, further down the pipeline we are trying to load that streamed data with the apply_changes() function into a new table and, from the looks of it, it doesn't see...

Latest Reply
Vidula
Honored Contributor
  • 1 kudos

Hey there @Palani Thangaraj, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear fro...

6 More Replies
Zair
by New Contributor III
  • 1288 Views
  • 2 replies
  • 2 kudos

How to handle 100+ tables ETL through spark structured streaming?

I am writing a streaming job that will perform ETL for more than 130 tables. I would like to know whether there is a better way to do this. Another solution I am considering is to write a separate streaming job for each table. Source data is coming...

Latest Reply
artsheiko
Valued Contributor III
  • 2 kudos

Hi, to answer your question it would help to get more details on what you're trying to achieve and the bottleneck you're encountering now. Indeed, handling the processing of 130 tables in one monolith could be challenging as the business rul...

1 More Replies
BradSheridan
by Valued Contributor
  • 2296 Views
  • 4 replies
  • 0 kudos

CDC with Delta Live Tables, with AutoLoader, isn't applying 'deletes'

Hey there Community!! I'm using dlt.apply_changes in my DLT job as follows:
dlt.apply_changes(
  target = "employee_silver",
  source = "employee_bronze_clean_v",
  keys = ["EMPLOYEE_ID"],
  sequence_by = col("last_updated"),
  apply_as_deletes = expr("Op ...

Latest Reply
axb0
New Contributor III
  • 0 kudos

First try expr("Operation = 'DELETE'") for your apply_as_deletes
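Why the delete predicate matters can be seen in a small plain-Python sketch of keyed change application. This is not the DLT implementation and does not call `dlt.apply_changes`; the function `apply_changes_sketch`, the `name` field, and the operation values (`INSERT`, `UPDATE`, `DELETE`) are assumptions for illustration, while `EMPLOYEE_ID`, `last_updated`, and `Operation` are taken from the post. Rows are applied in sequence order per key, and a row is removed only when the predicate matches.

```python
def apply_changes_sketch(changes, key, sequence_by, is_delete):
    """Apply change rows in sequence order: delete on match, else upsert."""
    target = {}
    for row in sorted(changes, key=lambda r: r[sequence_by]):
        if is_delete(row):
            target.pop(row[key], None)   # remove the key if it is present
        else:
            target[row[key]] = row       # insert or overwrite the latest row
    return target


# Hypothetical change feed; rows arrive out of order on purpose.
changes = [
    {"EMPLOYEE_ID": 1, "last_updated": 1, "Operation": "INSERT", "name": "Ada"},
    {"EMPLOYEE_ID": 2, "last_updated": 2, "Operation": "INSERT", "name": "Bob"},
    {"EMPLOYEE_ID": 1, "last_updated": 4, "Operation": "DELETE", "name": "Ada"},
    {"EMPLOYEE_ID": 2, "last_updated": 3, "Operation": "UPDATE", "name": "Robert"},
]

# Predicate that matches the feed's delete marker exactly.
silver = apply_changes_sketch(
    changes, "EMPLOYEE_ID", "last_updated",
    lambda r: r["Operation"] == "DELETE",
)

# Predicate that never matches (wrong case): deletes are treated as upserts.
not_deleted = apply_changes_sketch(
    changes, "EMPLOYEE_ID", "last_updated",
    lambda r: r["Operation"] == "delete",
)

print(sorted(silver), sorted(not_deleted))
```

In this sketch, a predicate that does not match the value actually present in the feed silently turns every delete into an upsert, so the deleted key stays in the target.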

3 More Replies
zesdatascience
by New Contributor III
  • 3077 Views
  • 7 replies
  • 2 kudos

Resolved! Delta Live Tables with CDC and Database Views with Lower Case Names

Hi, I am testing out creating some Delta Live Tables using Change Data Capture and having an issue where the resulting views that are created have lower case column names. Here is the function I am using to ingest data:
def raw_to_ods_merge(table_name,s...

Latest Reply
Kaniz_Fatma
Community Manager
  • 2 kudos

Hi @Stuart Fish, I was checking back to see if you have a resolution yet. If you have any solution, please share it with the community as it can be helpful to others. Otherwise, we will respond with more details and try to help.

6 More Replies
palzor
by New Contributor III
  • 7748 Views
  • 5 replies
  • 4 kudos

Getting an error when using CDC in Delta Live Tables

Hi, I am trying to use CDC for Delta Live Tables, and when I run the pipeline a second time I get an error: org.apache.spark.sql.streaming.StreamingQueryException: Query tbl_cdc [id = ***-xx-xx-bf7e-6cb8b0deb690, runId = ***-xxxx-4031-ba74-b4b22be05...

Latest Reply
jose_gonzalez
Moderator
  • 4 kudos

Hi @Palzor Lama, a streaming live table can only process append queries; that is, queries where new rows are inserted into the source table. Processing updates from source tables, for example, merges and deletes, is not supported. To process updates,...

4 More Replies