How to leverage Change Data Capture (CDC) from your databases to Databricks

Change Data Capture allows you to ingest and process only changed records from database systems to dramatically reduce data processing costs and enable real-time use cases suc...
Seems to work now actually. No idea what changed, as I tried multiple times exactly this way and it did.not.work.

```python
from pyspark.sql.functions import expr
from pyspark.sql.utils import AnalysisException
import pyspark.sql.functions as f

data = [(...
```
I'd like to ingest data into my ADLS from SQL Server in an incremental manner using Delta Live Tables. I do not want to use any staging tables. I was using CDC; when I call dlt.apply_changes, it asks me to specify source and target. Since source ...
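For reference, apply_changes() needs both a target streaming table (created up front) and a source view over the CDC feed. A minimal sketch, assuming a hypothetical ADLS landing path, an id key column, and a _commit_ts ordering column:

```python
import dlt
from pyspark.sql.functions import col

# Hypothetical source: a streaming view over CDC files landed in ADLS.
@dlt.view
def customers_cdc():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "parquet")  # assumption: CDC feed lands as Parquet
        .load("abfss://landing@myaccount.dfs.core.windows.net/customers_cdc/")  # hypothetical path
    )

# The target streaming table must exist before apply_changes can merge into it.
dlt.create_streaming_table("customers")

dlt.apply_changes(
    target="customers",             # the streaming table created above
    source="customers_cdc",         # the view defined above
    keys=["id"],                    # assumption: primary-key column in the feed
    sequence_by=col("_commit_ts"),  # assumption: column that orders the changes
    stored_as_scd_type=1,           # keep only the latest row per key
)
```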
I am using Delta Live Tables to stream events, and I have a raw table for all the events and a downstream aggregate table. I need to add the new aggregated number to the downstream table's aggregate column. But I didn't find any recipe talking abou...
Maybe my code is correct already, since I use dlt.read("my_raw_table") instead of dlt.read_stream("my_raw_table"). So col_aggr is recalculated completely every time my_raw_table is updated.
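To make the batch-vs-stream distinction concrete, here is a minimal sketch of a fully recomputed aggregate, assuming hypothetical event_type and amount columns:

```python
import dlt
import pyspark.sql.functions as f

# dlt.read() gives batch semantics: the aggregate is recomputed over the
# entire raw table on every pipeline update, so col_aggr is always a full
# recalculation rather than an incremental one.
@dlt.table
def my_aggregate_table():
    return (
        dlt.read("my_raw_table")                  # full table, not a stream
        .groupBy("event_type")                    # hypothetical grouping column
        .agg(f.sum("amount").alias("col_aggr"))   # hypothetical measure
    )
```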
Hello! I'm playing with Auto Loader schema inference on a big S3 repo with 300+ tables and large CSV files. I'm looking at Auto Loader with great attention, as it can be a great time saver in our ingestion process (data comes from a transactional DB gen...
PySpark by default uses \ as the escape character. You can change it to ". Doc: https://docs.databricks.com/ingestion/auto-loader/options.html#csv-options
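A minimal sketch of setting that option with Auto Loader, assuming a hypothetical S3 landing path:

```python
# Override the CSV escape character (backslash by default) to a double quote.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("escape", '"')              # per the linked CSV options doc
    .option("header", "true")           # assumption: files carry a header row
    .load("s3://my-bucket/landing/")    # hypothetical path
)
```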
I have a simple DLT pipeline that reads from an existing table, does some transformations, saves to a view, and then uses dlt.apply_changes() to insert the view into a results table. My question is: why is my results table a view and not a table like I ...
I find most of my apply_changes tables are being created as materialized views as well. They do recalculate at runtime, so they're up to date and behave a lot like a table, but they aren't tables in the same sense.
So I have two Delta Live Tables: one master table that contains all the prior data, and another table that contains all the new data for that specific day. I want to be able to merge those two tables so that the master table contains ...
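One way to do this outside a DLT pipeline is a plain Delta Lake MERGE from the daily table into the master table. A sketch, assuming hypothetical table names and an id key:

```python
from delta.tables import DeltaTable

master = DeltaTable.forName(spark, "master_table")  # hypothetical name
daily = spark.table("daily_table")                  # hypothetical name

(
    master.alias("m")
    .merge(daily.alias("d"), "m.id = d.id")  # assumption: id is the join key
    .whenMatchedUpdateAll()                  # refresh rows already in master
    .whenNotMatchedInsertAll()               # insert brand-new rows
    .execute()
)
```

Within DLT itself, dlt.apply_changes() with the daily table as the source achieves a similar upsert.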
If Yes, how is order ensured? For example, let's say there are a number of CDC change files that are uploaded to a directory over time. If a table were to be created using the cloudFiles source, in what order would those files be processed?
I have a DMS task processing the full-load and ongoing-replication tasks from source (MSSQL) to target (AWS S3), then use Delta Lake to handle the CDC logs. I have a notebook that inserts data into MSSQL continuously (with id as primary key), then d...
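DMS change files carry an Op column (I/U/D flags), so one common pattern is to MERGE them into the Delta table by key. A sketch, assuming a hypothetical S3 path, table name, and id key:

```python
from delta.tables import DeltaTable

# Hypothetical location where DMS drops the CDC files for this table.
cdc = spark.read.parquet("s3://my-bucket/dms-cdc/my_table/")

target = DeltaTable.forName(spark, "my_table")
(
    target.alias("t")
    .merge(cdc.alias("c"), "t.id = c.id")                    # assumption: id is the PK
    .whenMatchedDelete(condition="c.Op = 'D'")               # apply deletes
    .whenMatchedUpdateAll(condition="c.Op = 'U'")            # apply updates
    .whenNotMatchedInsertAll(condition="c.Op IN ('I','U')")  # apply inserts
    .execute()
)
```

This sketch assumes at most one change row per key per batch; with several, you would first deduplicate to the latest change per key before merging.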
We are building a DLT pipeline and Auto Loader is handling schema evolution fine. However, further down the pipeline we are trying to load that streamed data with the apply_changes() function into a new table and, from the looks of it, it doesn't see...
I am writing a streaming job which will perform ETL for more than 130 tables. I would like to know whether there is a better way to do this. Another solution I am thinking of is to write a separate streaming job for each table. The source data is coming...
Hi, I guess to answer your question it might be helpful to get more details on what you're trying to achieve and the bottleneck you're encountering now. Indeed, handling the processing of 130 tables in one monolith could be challenging, as the business rul...
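One common alternative to 130 hand-written jobs is a metadata-driven loop that generates one DLT table per source table. A minimal sketch, assuming a hypothetical table list and landing path:

```python
import dlt

TABLES = ["orders", "customers", "payments"]  # hypothetical subset of the 130

def make_bronze_table(name):
    # Factory function so each generated table closes over its own name.
    @dlt.table(name=f"bronze_{name}")
    def bronze():
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")      # assumption: JSON landing files
            .load(f"s3://my-bucket/landing/{name}/")  # hypothetical path per table
        )

for t in TABLES:
    make_bronze_table(t)
```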
Hi, I am testing out creating some Delta Live Tables using Change Data Capture and having an issue where the resulting views that are created have lowercase column names. Here is the function I am using to ingest data:

def raw_to_ods_merge(table_name, s...
Hi, I am trying to use CDC for a Delta Live Table, and when I run the pipeline a second time I get an error: org.apache.spark.sql.streaming.StreamingQueryException: Query tbl_cdc [id = ***-xx-xx-bf7e-6cb8b0deb690, runId = ***-xxxx-4031-ba74-b4b22be05...
Hi @Palzor Lama, a streaming live table can only process append queries; that is, queries where new rows are inserted into the source table. Processing updates from source tables, for example merges and deletes, is not supported. To process updates, ...
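The truncated answer presumably goes on to the usual workarounds: route the updates through apply_changes() instead of reading the table directly as a stream (as sketched earlier), or skip the change commits when reading. A sketch of the latter, assuming a Databricks runtime that supports the skipChangeCommits option and a hypothetical source table name:

```python
# Ignore commits that update or delete existing rows; only appends flow through.
df = (
    spark.readStream
    .option("skipChangeCommits", "true")
    .table("source_table")  # hypothetical upstream Delta table
)
```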