Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

carlos_tasayco
by Contributor
  • 1522 Views
  • 3 replies
  • 0 kudos

Resolved! Flattening JSON in a DLT pipeline

Hi, I have JSON files in my bronze schema. I am flattening them into a dataframe, and after that I am creating materialized views in a DLT pipeline. However, in production it is taking a lot of time (over 3 hours), and it is not even a lot of data; the biggest materiali...

Latest Reply
Khaja_Zaffer
Esteemed Contributor
  • 0 kudos

Hello @carlos_tasayco As I mentioned, I asked whether you used any join. Just wanted to check: was this cross join joining two large tables?

2 More Replies
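The flattening step discussed in the thread above is often where the time goes. A minimal plain-Python sketch of recursive JSON flattening — the same shape of logic a per-struct flatten applies before building a materialized view; this is illustrative only, not DLT API code, and all names are made up:

```python
def flatten_json(obj, prefix=""):
    """Recursively flatten a nested dict into dot-separated keys.

    Mirrors the struct flattening typically done before creating a
    DLT materialized view (illustrative only, not DLT API code).
    """
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Descend into nested structs, carrying the dotted prefix.
            flat.update(flatten_json(value, name))
        else:
            flat[name] = value
    return flat

record = {"id": 1, "payload": {"user": {"name": "a"}, "count": 2}}
print(flatten_json(record))
# {'id': 1, 'payload.user.name': 'a', 'payload.count': 2}
```

If a flatten like this is cheap but the pipeline still takes hours, the cost is usually elsewhere (joins, shuffles, or full recomputation of the materialized view), which is where the reply's cross-join question points.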
Pratikmsbsvm
by Contributor
  • 1515 Views
  • 3 replies
  • 0 kudos

Databricks Workflow Orchestration for Pipeline

Hello, I am using Databricks for the first time. Could someone please help me with how to do orchestration for the pipeline shown below? Kindly share the steps for implementing orchestration and what we have to consider. Thanks a lot.

[Attachment: Pratikmsbsvm_0-1753936635561.png]
Latest Reply
junaid-databrix
New Contributor III
  • 0 kudos

The diagram you have shared is a bit confusing: from Azure there is a data pull to the Bronze layer, and from the same data source data is being pulled into the Silver layer. However, following the Medallion architecture, typically the raw data is ingested into...

2 More Replies
ashokpola1
by New Contributor III
  • 4538 Views
  • 7 replies
  • 5 kudos

Resolved! Are there any student discounts or coupons for the Databricks Data Engineer Associate certification

I’m planning to take the Databricks Data Engineer Associate certification exam, and I wanted to ask if there are any official discounts, coupons, or student offers available to help reduce the exam fee. I’m a student right now, so any discount or prom...

Latest Reply
ashokpola1
New Contributor III
  • 5 kudos

thank you

6 More Replies
my_super_name
by New Contributor III
  • 3722 Views
  • 3 replies
  • 4 kudos

Auto Loader Schema Hint Behavior: Addressing Nested Field Errors

Hello, I'm using Auto Loader to stream a table of data and have added schema hints to specify field values. I've observed that when my initial data file is missing fields specified in the schema hint, the auto loader correctly identifies this and ad...

Latest Reply
Mathias_Peters
Contributor II
  • 4 kudos

Hi, we are having similar issues with schema hints formulated in fully qualified DDL, e.g. "a STRUCT<b INT>" etc. Did you find a solution? Also, did you specify the schema hint using the dot-notation, e.g. "a.b INT" before ingesting any data or after...

2 More Replies
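For readers following the schema-hint discussion above, here is a hedged sketch of how hints are typically passed to Auto Loader, assuming PySpark on Databricks with an existing `spark` session; the paths and field names are hypothetical, and it shows both the DDL struct form and the dot-notation form mentioned in the reply:

```python
# Illustrative Auto Loader stream with schema hints.
# Paths, field names, and the `spark` session are assumptions.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    # Pin only the fields you care about; the rest are inferred.
    # Equivalent dot-notation for the nested field: "a.b INT"
    .option("cloudFiles.schemaHints", "a STRUCT<b: INT>, c STRING")
    .option("cloudFiles.schemaLocation", "/tmp/schema")  # hypothetical path
    .load("/tmp/input")  # hypothetical path
)
```

Note that hints only constrain inference for fields that appear in the data; whether a hinted-but-absent nested field is added as null or raises an error is exactly the behavior being debated in the thread.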
pooja_bhumandla
by Databricks Partner
  • 946 Views
  • 1 reply
  • 0 kudos

Performance Behavior of MERGE with Partitioned Table: Impact of ZORDER and Deletion Vectors

Hi Databricks Community, I’m analyzing the performance of Delta Lake MERGE operations on a partitioned table, and I observed unexpected behavior across 3 test cases. I wanted to share my findings to better understand why ZORDER or Deletion Vectors help...

Latest Reply
radothede
Valued Contributor II
  • 0 kudos

Hi @pooja_bhumandla, thanks for such a nice and detailed description of your case; that really helps in understanding the scenario. Regarding your questions: 1) Overall, the operation could become more complex due to: a) deletion vector creation and maintenance, b...

noorbasha534
by Valued Contributor II
  • 1913 Views
  • 1 reply
  • 2 kudos

Delta Lake File Sizes - optimize maxfilesize, tunefilesizesforrewrites

Hello all, while reading content that provides guidance on Delta Lake file size, I realized that tuneFileSizesForRewrites behind the scenes targets a 256 MB file size, and optimize.maxFileSize will target a 1 GB file (reference: https://docs.databricks...

  • 1913 Views
  • 1 replies
  • 2 kudos
Latest Reply
radothede
Valued Contributor II
  • 2 kudos

Hello @noorbasha534, that's a very interesting topic regarding fine-tuning file sizes under a Delta table. Answering your questions: 1) I use spark.databricks.delta.optimize.maxFileSize and set the maximum file size for the OPTIMIZE command. It's working for me just ...

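The two knobs discussed in this thread can be set roughly like this — a sketch, assuming a Databricks environment with an existing `spark` session; the table name is hypothetical:

```python
# Session-level cap on files written by OPTIMIZE (defaults to ~1 GB).
# Here set to 256 MB; the value is in bytes.
spark.conf.set(
    "spark.databricks.delta.optimize.maxFileSize", str(256 * 1024 * 1024)
)

# Table-level property: let Delta tune file sizes down (~256 MB target)
# for tables frequently rewritten by MERGE/UPDATE/DELETE.
spark.sql("""
    ALTER TABLE my_catalog.my_schema.my_table  -- hypothetical name
    SET TBLPROPERTIES (delta.tuneFileSizesForRewrites = true)
""")
```

The session config affects only OPTIMIZE runs in that session, while the table property persists with the table, which is why the two defaults (1 GB vs. 256 MB) can both be in play at once.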
Bank_Kirati
by New Contributor III
  • 1107 Views
  • 2 replies
  • 3 kudos

Resolved! Bug Report: SQL Editor, Run Now button run query from different query file/tab

When I click the Run Now button or press (Command + Enter), the editor somehow executes a query from another query file/tab. I can only use Run Selected for now. Clearing the cache and re-logging in didn't solve the problem. Please see the screen recording here: https://...

Latest Reply
Advika
Community Manager
  • 3 kudos

Hello @Bank_Kirati! Thanks for sharing the screen recording. I’m unable to reproduce the issue on my end. Could you try toggling the New SQL Editor off and on, and see if that makes a difference? Also, please check if this happens only with specific ...

1 More Replies
Nidhig
by Databricks Partner
  • 783 Views
  • 1 reply
  • 1 kudos

Partner Academy Login Issue

Hi Team, I am facing an issue with the Partner Academy login.

Latest Reply
Advika
Community Manager
  • 1 kudos

Hello @Nidhig! From the screenshot, it appears you're not currently authorized to access the Partner Academy. In this case, please raise a ticket with the Databricks Support Team. They’ll be able to investigate further and assist you with gaining acce...

Pratikmsbsvm
by Contributor
  • 1747 Views
  • 3 replies
  • 1 kudos

Resolved! Error Logging and Orchestration in Databricks

Hello, could someone please help me design the error logging and orchestration for the pipeline below? I am pulling data from the Bronze layer and pushing it to the Silver layer after transformation. 1. How to do error logging, and where to store it. 2. How to...

[Attachment: Pratikmsbsvm_0-1753867667783.png]
Latest Reply
Pratikmsbsvm
Contributor
  • 1 kudos

@Brahmareddy: Thanks a lot. Do you have any page that shows a real implementation, if handy? Kindly share.

2 More Replies
pooja_bhumandla
by Databricks Partner
  • 704 Views
  • 2 replies
  • 0 kudos

Will Unsetting delta.targetFileSize During Data Load Cause Any Issues?

Hi, if I unset the Spark config delta.targetFileSize (e.g., using ALTER) while a data load is in progress (batch or streaming), will it cause any issues? Will the load fail or behave inconsistently due to the config being changed mid-process? Thanks!

Latest Reply
Brahmareddy
Esteemed Contributor
  • 0 kudos

Hi pooja_bhumandla, how are you doing today? In general, changing the delta.targetFileSize config while a batch or streaming load is in progress won’t crash your job, but it may lead to inconsistent behavior during that specific run. Spark jobs usuall...

1 More Replies
jeanptello
by New Contributor
  • 1435 Views
  • 1 reply
  • 0 kudos

Read Snowflake Iceberg tables from Databricks UC

Hi folks! I'm trying to read Iceberg tables that I created in Snowflake from Databricks using catalog federation. I set up a connection to Snowflake, configured an external location pointing to the S3 folder that contains the Iceberg files, and used t...

Data Engineering
Catalog Federation
Iceberg
Latest Reply
Isi
Honored Contributor III
  • 0 kudos

Hey @jeanptello It’s possible that there’s a mismatch between what Snowflake has written and what Databricks is trying to read. This often happens if Snowflake has performed an operation that rewrites the table files (like compaction or a bulk update...

VaderKB
by New Contributor II
  • 1842 Views
  • 7 replies
  • 0 kudos

Do too many Parquet files in a Delta table impact writes for a streaming job?

Hello, I am running a Spark streaming job that reads data from AWS Kinesis and writes data to external Delta tables which are stored in S3. But I have noticed that over time, the latency has been increasing. I also noticed that for each batch, the...

Latest Reply
Khaja_Zaffer
Esteemed Contributor
  • 0 kudos

Hello @VaderKB You're right that OPTIMIZE makes reads faster by reducing the number of files. For writes using append mode, it doesn't directly speed up the operation itself. However, having fewer, larger files from a previous OPTIMIZE run can improv...

6 More Replies
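The effect described in this thread compounds: an append-only stream adds files every microbatch, and any per-batch work proportional to the table's file/metadata state grows with the file count. A toy back-of-envelope model, with all numbers purely illustrative:

```python
# Toy model: each microbatch appends `files_per_batch` new Parquet files.
# Without compaction, total file count (and listing/metadata work that
# scales with it) grows linearly with the number of batches.
def total_files(batches, files_per_batch, optimize_every=None,
                files_after_optimize=10):
    total = 0
    for b in range(1, batches + 1):
        total += files_per_batch
        if optimize_every and b % optimize_every == 0:
            # OPTIMIZE-style compaction collapses small files.
            total = files_after_optimize
    return total

print(total_files(1000, 8))                      # unbounded growth: 8000
print(total_files(1000, 8, optimize_every=100))  # stays bounded
```

This is consistent with the reply above: OPTIMIZE does not speed up the append itself, but keeping the file count bounded keeps the per-batch metadata overhead from creeping up over time.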
ChrisLawford_n1
by Contributor II
  • 1319 Views
  • 2 replies
  • 0 kudos

Autoloader with file notifications on a queue that is on a different storage account to the blobs

Hello, I am trying to set up Auto Loader using file notifications, but as the storage account we are reading from is a premium storage account, we have set up event subscriptions to pump the blob events to queues that exist on a standard Gen2 storage acc...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 0 kudos

Hi @ChrisLawford_n1, so in your case, it's not able to resolve the file paths from the event notifications because they're pointing to a different storage account (Storage Account 1), which is not associated with the queue. Use a StorageV2 Account for B...

1 More Replies
Ritasree
by Databricks Partner
  • 573 Views
  • 2 replies
  • 0 kudos

Unable to fetch 3rd-level array using dot notation from JSON data in local Spark via VS Code

I am able to fetch a 3rd-level array using dot notation from JSON in Databricks, but the same code is not working in local Spark via VS Code. Example: df.select(F.col('x.y.z')) where x is an array, y is an array, and z is also an array. In local Spark I am gettin...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @Ritasree, when you say local Spark, do you mean that you've configured it locally, or are you using Databricks Connect? If you've configured Spark locally, then check what version of Spark you're using.

1 More Replies
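To see why `x.y.z` is sensitive to the engine version, here is a plain-Python sketch of what that dot-notation access has to do over three nested array-of-struct levels: map the field access across each array it encounters. The field names come from the post; the data is illustrative:

```python
def extract_path(rows, path):
    """Walk a dot-separated path, mapping over lists at each level —
    roughly what Spark does when it pushes a field access through
    nested arrays (illustrative sketch, not Spark internals)."""
    def step(node, key):
        if isinstance(node, list):
            # An array level: apply the field access to every element.
            return [step(item, key) for item in node]
        return node[key]

    for key in path.split("."):
        rows = step(rows, key)
    return rows

data = {"x": [{"y": [{"z": [1, 2]}, {"z": [3]}]}]}
print(extract_path(data, "x.y.z"))
# [[[1, 2], [3]]]
```

Each array level adds one layer of nesting to the result, which is why engines differ: older Spark versions were stricter about resolving a field access through more than one array level, so checking the local Spark version (as the reply suggests) is the right first step.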
thbeh_com
by New Contributor III
  • 1136 Views
  • 2 replies
  • 0 kudos

Resolved! Legacy hive_metastore corruption

I am seeing some legacy hive_metastore corruption (especially tables created as Parquet instead of Delta) lately at my client's place, who is in the midst of migrating to UC. We were provided with Scala code to remove the erroneous Parquet files ph...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 0 kudos

Hi @thbeh_com, yes, this is a fairly common issue during UC migrations, especially with legacy Hive metastore tables. The corruption typically happens because of: - Metadata-data misalignment - the Hive metastore references files that no longer exist or have ...

1 More Replies