Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

brickster_2018
by Databricks Employee
  • 8277 Views
  • 6 replies
  • 2 kudos
Latest Reply
VasuBajaj
New Contributor II
  • 2 kudos

A .CRC file (Cyclic Redundancy Check) is an internal checksum file used by Spark (and Hadoop) to ensure data integrity when reading and writing files. Data Integrity Check – .CRC files store checksums of actual data files. When reading a file, Spark/H...

5 More Replies
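
To illustrate the checksum idea the reply describes, here is a minimal Python sketch of an incremental CRC32 computation. This is only a conceptual analogue; Hadoop's actual .crc sidecar files use their own chunked format.

import zlib

def crc32_of_file(path, chunk_size=1 << 20):
    # Fold the file into a CRC32 checksum one chunk at a time, the same
    # family of integrity check Hadoop stores in its .crc sidecar files.
    crc = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF  # normalize to an unsigned 32-bit value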
kolangareth
by New Contributor III
  • 6215 Views
  • 11 replies
  • 3 kudos

Resolved! to_date not functioning as expected after introduction of arbitrary replaceWhere in Databricks 9.1 LTS

I am trying to do a dynamic partition overwrite on a Delta table using the replaceWhere option. This was working fine until I upgraded the DB runtime to 9.1 LTS from 8.3.x. I am concatenating 'year', 'month' and 'day' columns and then using the to_date functio...

Latest Reply
ltreweek
New Contributor II
  • 3 kudos

SELECT TO_DATE('20250217','YYYYMMDD'); gives the error: PARSE_SYNTAX_ERROR syntax error at or near 'select'. sqlstate: 42601. In DataGrip, it works no problem and displays the date.

10 More Replies
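
The reply's pattern is likely the real problem: Spark's to_date follows Java DateTimeFormatter patterns, where lowercase 'yyyy' and 'dd' mean year and day-of-month, while 'YYYY'/'DD' mean week-based year and day-of-year; databases that DataGrip often connects to accept 'YYYYMMDD'. A minimal sketch of the Spark-compatible pattern plus the replaceWhere overwrite from the original question; the path and event_date predicate are hypothetical placeholders.

# Spark accepts Java DateTimeFormatter patterns, so use lowercase yyyyMMdd.
spark.sql("SELECT to_date('20250217', 'yyyyMMdd') AS d").show()

# Dynamic-partition-style overwrite with replaceWhere.
(df.write.format("delta")
    .mode("overwrite")
    .option("replaceWhere", "event_date >= '2025-02-01' AND event_date < '2025-03-01'")
    .save("/mnt/delta/events"))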
diguid
by New Contributor III
  • 3565 Views
  • 3 replies
  • 13 kudos

Using foreachBatch within Delta Live Tables framework

Hey there! I was wondering if there's any way of declaring a Delta Live Table where we use foreachBatch to process the output of a streaming query. Here's a simplification of my code: def join_data(df_1, df_2): df_joined = ( df_1 ...

Latest Reply
cgrant
Databricks Employee
  • 13 kudos

foreachBatch support in DLT is coming soon, and you now have the ability to write to non-DLT sinks as well.

2 More Replies
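
Outside DLT, foreachBatch already works in plain Structured Streaming. A minimal sketch of the usual pattern, assuming hypothetical table names, key column, and checkpoint path, that merges each micro-batch into a Delta target:

from delta.tables import DeltaTable

def upsert_batch(batch_df, batch_id):
    # Merge each micro-batch into the target; `id` is a hypothetical key.
    target = DeltaTable.forName(spark, "target_table")
    (target.alias("t")
        .merge(batch_df.alias("s"), "t.id = s.id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

(spark.readStream.table("source_table")
    .writeStream
    .foreachBatch(upsert_batch)
    .option("checkpointLocation", "/tmp/checkpoints/upsert")  # hypothetical path
    .start())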
boskicl
by New Contributor III
  • 29865 Views
  • 6 replies
  • 10 kudos

Resolved! Table write command stuck "Filtering files for query."

Hello all, Background: I am having an issue today with Databricks using pyspark-sql and writing a Delta table. The dataframe is made by doing an inner join between two tables, and that is the table which I am trying to write to a Delta table. The table ...

Latest Reply
timo199
New Contributor II
  • 10 kudos

Even if I vacuum and optimize, it keeps getting stuck. The cluster type is r6gd.xlarge (min: 4, max: 6) and the driver type is r6gd.2xlarge.

5 More Replies
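
"Filtering files for query" is the phase where Delta scans table metadata to find the files matching the write's predicate, so the usual mitigation is reducing the number of candidate files. A hedged sketch, assuming a hypothetical table and join key:

# Compact small files and cluster by the key used in the join/filter,
# so the file-filtering phase can skip most files.
spark.sql("OPTIMIZE my_db.my_table ZORDER BY (join_key)")

# Clean up files no longer referenced by the log (default retention is 7 days).
spark.sql("VACUUM my_db.my_table RETAIN 168 HOURS")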
tinai_long
by New Contributor III
  • 10427 Views
  • 12 replies
  • 6 kudos

Resolved! How to refresh a single table in Delta Live Tables?

Suppose I have a Delta Live Tables framework with 2 tables: Table 1 ingests from a JSON source, Table 2 reads from Table 1 and runs some transformation. In other words, the data flow is JSON source -> Table 1 -> Table 2. Now if I find some bugs in the...

Latest Reply
cpayne_vax
New Contributor III
  • 6 kudos

Answering my own question: nowadays (February 2024) this can all be done via the UI. When viewing your DLT pipeline there is a "Select tables for refresh" button in the header. If you click this, you can select individual tables, and then in the botto...

11 More Replies
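
Besides the UI route in the reply, the Pipelines REST API exposes per-table refresh on the updates endpoint. A sketch with placeholder host, token, and pipeline ID; the field names follow the public API but are worth verifying against current docs:

import requests

resp = requests.post(
    "https://<workspace-host>/api/2.0/pipelines/<pipeline-id>/updates",
    headers={"Authorization": "Bearer <token>"},
    # full_refresh_selection fully recomputes only the named tables;
    # use refresh_selection instead for an incremental refresh.
    json={"full_refresh_selection": ["table_1"]},
)
print(resp.json())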
Tahseen0354
by Valued Contributor
  • 24308 Views
  • 9 replies
  • 5 kudos

Resolved! Getting "Job aborted due to stage failure" SparkException when trying to download full result

I have generated a result using SQL. But whenever I try to download the full result (1 million rows), it throws a SparkException. I can download the preview result but not the full result. Why? What happens under the hood when I try to download ...

Latest Reply
ac567
New Contributor III
  • 5 kudos

Job aborted due to stage failure: Task 6506 in stage 46.0 failed 4 times, most recent failure: Lost task 6506.3 in stage 46.0 (TID 12896) (10.**.***.*** executor 12): java.lang.OutOfMemoryError: Cannot reserve 4194304 bytes of direct buffer memory (a...

8 More Replies
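
The UI download funnels the whole result through the driver, which is where the direct-buffer OOM in the reply comes from. A common workaround is to let Spark write the result to storage instead; a sketch with a hypothetical query and output path:

df = spark.sql("SELECT ...")  # the original query (elided here)

# Writing from the executors avoids collecting 1M rows on the driver;
# coalesce(1) yields a single CSV file at the cost of one final task.
(df.coalesce(1)
    .write.mode("overwrite")
    .option("header", "true")
    .csv("/mnt/exports/full_result"))  # hypothetical path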
brendanc19
by New Contributor III
  • 4389 Views
  • 6 replies
  • 2 kudos

Resolved! Does cancelling a job run rollback any actions performed by query plan?

If I were to stop a rather large job run, say halfway through execution, will any actions performed on our Delta tables persist, or will they be rolled back? Are there any other risks that I need to be aware of in terms of cancelling a job run half way t...

Latest Reply
fabian_r
New Contributor II
  • 2 kudos

Hi, is there any way (as of 2024) to ensure transactional control across tables in the Delta protocol for failing jobs?

5 More Replies
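
Delta commits are atomic per table, so cancelling a job mid-run leaves already-committed transactions in place and simply drops the in-flight one; there is no cross-table rollback. One manual recovery option is RESTORE, sketched here with a hypothetical table name and version number:

# Find the last good version in the table's commit history.
spark.sql("DESCRIBE HISTORY my_table").show()

# Roll this one table back to that version (per-table only; Delta has
# no multi-table transactions as of this writing).
spark.sql("RESTORE TABLE my_table TO VERSION AS OF 42")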
labromb
by Contributor
  • 13271 Views
  • 10 replies
  • 4 kudos

How to pass configuration values to a Delta Live Tables job through the Delta Live Tables API

Hi Community, I have successfully run a job through the API but would need to be able to pass parameters (configuration) to the DLT workflow via the API. I have tried passing JSON in this format: { "full_refresh": "true", "configuration": [ ...

Latest Reply
Edthehead
Contributor III
  • 4 kudos

You cannot pass parameters from a Databricks job to a DLT pipeline. At least not yet. You can see from the DLT REST API that there is no option for it to accept any parameters. But there is a workaround. With the assumption tha...

9 More Replies
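
The reply's workaround is truncated above; one version of it that matches the public Pipelines API is to PUT an edited pipeline spec (whose configuration map the DLT code can read via spark.conf) and then start an update. Host, token, pipeline ID, and the my_param key are placeholders:

import requests

base = "https://<workspace-host>/api/2.0/pipelines/<pipeline-id>"
headers = {"Authorization": "Bearer <token>"}

# Merge the new parameter into the pipeline spec's configuration map.
spec = requests.get(base, headers=headers).json()["spec"]
spec["configuration"] = {**spec.get("configuration", {}), "my_param": "2024-01-01"}
requests.put(base, headers=headers, json=spec)

# Then trigger the run; full_refresh is a real field on the updates endpoint.
requests.post(base + "/updates", headers=headers, json={"full_refresh": True})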
Data_Engineer3
by Contributor III
  • 3469 Views
  • 5 replies
  • 0 kudos

Default maximum Spark streaming chunk size in Delta files in each batch?

Working with Delta files in Spark Structured Streaming, what is the default maximum chunk size in each batch? How do I identify this type of Spark configuration in Databricks? #[Databricks SQL] #[Spark streaming] #[Spark structured streaming] #Spark

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Doc: https://docs.databricks.com/en/structured-streaming/delta-lake.html  Also, what is the challenge while using foreachBatch?

4 More Replies
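
Per the page linked in the reply, the Delta streaming source sizes micro-batches with maxFilesPerTrigger (default 1000 files) and maxBytesPerTrigger (a soft byte cap). A sketch with a hypothetical source table:

stream = (spark.readStream.format("delta")
    .option("maxFilesPerTrigger", 500)    # files per micro-batch (default 1000)
    .option("maxBytesPerTrigger", "1g")   # soft cap; a batch may slightly exceed it
    .table("source_table"))               # hypothetical table name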
SRK
by Contributor III
  • 3776 Views
  • 5 replies
  • 7 kudos

How to handle schema validation for JSON files using Databricks Autoloader?

Following are the details of the requirement: 1. I am using a Databricks notebook to read data from a Kafka topic and write into an ADLS Gen2 container, i.e., my landing layer. 2. I am using Spark code to read data from Kafka and write into landing...

Latest Reply
maddy08
New Contributor II
  • 7 kudos

Just to clarify, are you reading from Kafka and writing into ADLS as JSON files? That is, does each message from Kafka become one JSON file in ADLS?

4 More Replies
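
For the schema-validation requirement itself, Auto Loader can track and enforce a schema for JSON landing data. A minimal sketch, assuming hypothetical paths; schemaEvolutionMode "rescue" keeps unexpected fields in _rescued_data instead of failing the stream:

raw = (spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/mnt/schemas/landing")  # hypothetical
    .option("cloudFiles.schemaEvolutionMode", "rescue")
    .load("/mnt/landing/"))  # hypothetical source path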
AsfandQ
by New Contributor III
  • 17671 Views
  • 7 replies
  • 6 kudos

Resolved! Delta tables: Cannot set default column mapping mode to "name" in Python for delta tables

Hello, I am trying to write Delta files for some CSV data. When I do csv_dataframe.write.format("delta").save("/path/to/table.delta") I get: AnalysisException: Found invalid character(s) among " ,;{}()\n\t=" in the column names of your schema. Having look...

Latest Reply
Personal1
New Contributor II
  • 6 kudos

I still get the error when I try any method. The column names with spaces are throwing the error [DELTA_INVALID_CHARACTERS_IN_COLUMN_NAMES] Found invalid character(s) among ' ,;{}()\n\t=' in the column names of your schema. df1.write.format("delta") \ .mo...

6 More Replies
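
Until column mapping is enabled on the table (ALTER TABLE ... SET TBLPROPERTIES ('delta.columnMapping.mode' = 'name'), which also requires reader/writer protocol versions 2/5), the simpler fix is renaming the columns before the write. A sketch; the underscore substitution is an arbitrary choice:

import re

def sanitize_columns(df):
    # Replace every character Delta rejects (' ,;{}()\n\t=') with an underscore.
    for c in df.columns:
        df = df.withColumnRenamed(c, re.sub(r"[ ,;{}()\n\t=]", "_", c))
    return df

sanitize_columns(csv_dataframe).write.format("delta").save("/path/to/table")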
DJey
by New Contributor III
  • 15266 Views
  • 6 replies
  • 2 kudos

Resolved! MergeSchema Not Working

Hi All, I have a scenario where my existing Delta table looks like below: Now I have incremental data with an additional column, i.e., owner: Dataframe Name --> scdDF. Below is the code snippet to merge the incremental dataframe into the target table, but the new...

Latest Reply
Amin112
New Contributor II
  • 2 kudos

In Databricks Runtime 15.2 and above, you can specify schema evolution in a merge statement using SQL or Delta table APIs:
MERGE WITH SCHEMA EVOLUTION INTO target
USING source
ON source.key = target.key
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN I...

5 More Replies
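
For runtimes older than 15.2, the equivalent of MERGE WITH SCHEMA EVOLUTION is a session conf plus an ordinary merge. A sketch reusing the thread's scdDF name; the target table and key column are hypothetical:

from delta.tables import DeltaTable

# Let MERGE add columns that exist only on the source side.
spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")

target = DeltaTable.forName(spark, "target_table")
(target.alias("t")
    .merge(scdDF.alias("s"), "t.key = s.key")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())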
Valentin1
by New Contributor III
  • 8542 Views
  • 6 replies
  • 3 kudos

Delta Live Tables Incremental Batch Loads & Failure Recovery

Hello Databricks community, I'm working on a pipeline and would like to implement a common use case using Delta Live Tables. The pipeline should include the following steps: 1. Incrementally load data from Table A as a batch. 2. If the pipeline has previously...

Latest Reply
lprevost
Contributor II
  • 3 kudos

I totally agree that this is a gap in the Databricks solution. This gap exists between a static read and real-time streaming. My problem (and I suspect there are many use cases) is that I have slowly changing data coming into structured folders via ...

5 More Replies
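
The middle ground the reply asks for exists outside DLT as the availableNow trigger: a streaming query that processes everything new since the last checkpoint, then stops like a batch job. A sketch with hypothetical table names and checkpoint path:

(spark.readStream.table("table_a")
    .writeStream
    .trigger(availableNow=True)  # drain all available input, then stop
    .option("checkpointLocation", "/mnt/checkpoints/table_a")  # hypothetical
    .toTable("table_b"))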
bgerhardi
by New Contributor III
  • 11126 Views
  • 12 replies
  • 13 kudos

Surrogate Keys with Delta Live

We are considering moving to Delta Live Tables from a traditional SQL-based data warehouse. What worries me is this FAQ on identity columns (Delta Live Tables frequently asked questions | Databricks on AWS); it seems to suggest that we basically can't cre...

Latest Reply
Pelle123
New Contributor II
  • 13 kudos

I am wondering if there is an answer to this question now.

11 More Replies
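
Outside the DLT restrictions in that FAQ, Delta itself supports identity columns for surrogate keys. A sketch of the SQL, with hypothetical table and column names:

spark.sql("""
CREATE TABLE dim_customer (
  customer_sk BIGINT GENERATED ALWAYS AS IDENTITY,
  customer_id STRING,
  name        STRING
) USING DELTA
""")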
YFL
by New Contributor III
  • 7326 Views
  • 11 replies
  • 6 kudos

Resolved! When delta is a streaming source, how can we get the consumer lag?

Hi, I want to keep track of the streaming lag from the source table, which is a Delta table. I see that in query progress logs there is some information about the last version and the last file in the version for the end offset, but this doesn't give ...

Latest Reply
Anonymous
Not applicable
  • 6 kudos

Hey @Yerachmiel Feltzman, I hope all is well. Just wanted to check in if you were able to resolve your issue or do you need more help? We'd love to hear from you. Thanks!

10 More Replies
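
One rough way to estimate the lag the question asks about is to compare the source table's latest version with the reservoirVersion recorded in the stream's last end offset. A sketch, assuming `query` is the running StreamingQuery and the Delta table is the stream's first source:

import json

latest = spark.sql("DESCRIBE HISTORY source_table LIMIT 1").first()["version"]

progress = query.lastProgress
end_offset = progress["sources"][0]["endOffset"]
if isinstance(end_offset, str):  # the offset may arrive as a JSON string
    end_offset = json.loads(end_offset)

# Versions the stream still has to catch up on (approximate lag).
print("versions behind:", latest - end_offset["reservoirVersion"])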