Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

noorbasha534
by Valued Contributor II
  • 1908 Views
  • 1 reply
  • 2 kudos

Delta Lake File Sizes - optimize maxfilesize, tunefilesizesforrewrites

Hello all, while reading content that provides guidance on Delta Lake file sizes, I realized tuneFileSizesForRewrites behind the scenes targets a 256 MB file size, and optimize.maxFileSize targets a 1 GB file (reference: https://docs.databricks...

Latest Reply
radothede
Valued Contributor II
  • 2 kudos

Hello @noorbasha534, that's a very interesting topic regarding fine-tuning file sizes for Delta tables. Answering your questions: 1) I use spark.databricks.delta.optimize.maxFileSize to set the maximum file size for the OPTIMIZE command. It's working for me just ...
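The reply mentions capping the size of files produced by OPTIMIZE via `spark.databricks.delta.optimize.maxFileSize`. A minimal sketch of that pattern, assuming an active Databricks Spark session (`spark`); the table name is hypothetical:

```python
# Sketch: target 256 MB output files from OPTIMIZE instead of the 1 GB default.
# Assumes an active Databricks session; `main.sales.events` is a hypothetical table.
spark.conf.set("spark.databricks.delta.optimize.maxFileSize", str(256 * 1024 * 1024))
spark.sql("OPTIMIZE main.sales.events")
```

The config is session-scoped, so it only affects OPTIMIZE commands issued from that session.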

Bank_Kirati
by New Contributor III
  • 1107 Views
  • 2 replies
  • 3 kudos

Resolved! Bug Report: SQL Editor, Run Now button run query from different query file/tab

When I click the Run Now button or press (Command + Enter), the editor somehow executes a query from another query file/tab. I can only use Run Selected for now. Clearing the cache and re-logging in didn't solve the problem. Please see the screen recording here: https://...

Latest Reply
Advika
Community Manager
  • 3 kudos

Hello @Bank_Kirati! Thanks for sharing the screen recording. I’m unable to reproduce the issue on my end. Could you try toggling the New SQL Editor off and on, and see if that makes a difference? Also, please check if this happens only with specific ...

1 More Replies
Nidhig
by Databricks Partner
  • 782 Views
  • 1 reply
  • 1 kudos

Partner Academy Login Issue

Hi Team, I am facing an issue with the Partner Academy login.

Latest Reply
Advika
Community Manager
  • 1 kudos

Hello @Nidhig! From the screenshot, it appears you're not currently authorized to access the Partner Academy. In this case, please raise a ticket with the Databricks Support Team. They'll be able to investigate further and assist you with gaining acce...

Pratikmsbsvm
by Contributor
  • 1747 Views
  • 3 replies
  • 1 kudos

Resolved! Error Logging and Orchestration in Databricks

Hello, may someone please help me design the error logging and orchestration for the pipeline below? I am pulling data from the Bronze layer and pushing it to the Silver layer after transformation. 1. How to do error logging and where to store it. 2. How to...

Pratikmsbsvm_0-1753867667783.png
Latest Reply
Pratikmsbsvm
Contributor
  • 1 kudos

@Brahmareddy: Thanks a lot. Do you have any page which shows a real implementation, if handy? Kindly share.

2 More Replies
pooja_bhumandla
by Databricks Partner
  • 704 Views
  • 2 replies
  • 0 kudos

Will Unsetting delta.targetFileSize During Data Load Cause Any Issues?

Hi, if I unset the Spark config delta.targetFileSize (e.g., using ALTER) while a data load is in progress (batch or streaming), will it cause any issues? Will the load fail or behave inconsistently due to the config being changed mid-process? Thanks!

Latest Reply
Brahmareddy
Esteemed Contributor
  • 0 kudos

Hi pooja_bhumandla, how are you doing today? In general, changing the delta.targetFileSize config while a batch or streaming load is in progress won't crash your job, but it may lead to inconsistent behavior during that specific run. Spark jobs usuall...
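For context, the "unset using alter" step the question refers to is a table-property change. A minimal sketch, assuming an active Spark session (`spark`); the table name is hypothetical:

```python
# Sketch: remove the per-table file-size target so subsequent writes fall back
# to the engine's default sizing. `main.sales.orders` is a hypothetical table.
spark.sql("ALTER TABLE main.sales.orders UNSET TBLPROPERTIES ('delta.targetFileSize')")
```

The change takes effect for files written after it commits; it does not rewrite files already committed by an in-flight run.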

1 More Replies
jeanptello
by New Contributor
  • 1431 Views
  • 1 reply
  • 0 kudos

Read Snowflake Iceberg tables from Databricks UC

Hi folks! I'm trying to read Iceberg tables that I created in Snowflake from Databricks using catalog federation. I set up a connection to Snowflake, configured an external location pointing to the S3 folder that contains the Iceberg files, and used t...

Data Engineering
Catalog Federation
Iceberg
Latest Reply
Isi
Honored Contributor III
  • 0 kudos

Hey @jeanptello, it's possible that there's a mismatch between what Snowflake has written and what Databricks is trying to read. This often happens if Snowflake has performed an operation that rewrites the table files (like compaction or a bulk update...

VaderKB
by New Contributor II
  • 1839 Views
  • 7 replies
  • 0 kudos

Do too many Parquet files in a Delta table impact writes for a streaming job?

Hello, I am running a Spark streaming job that reads data from AWS Kinesis and writes data to external Delta tables which are stored in S3. But I have noticed that over time, the latency has been increasing. I also noticed that for each batch, the...

Latest Reply
Khaja_Zaffer
Esteemed Contributor
  • 0 kudos

Hello @VaderKB, you're right that OPTIMIZE makes reads faster by reducing the number of files. For writes using append mode, it doesn't directly speed up the operation itself. However, having fewer, larger files from a previous OPTIMIZE run can improv...
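The compaction being discussed can be sketched as a periodic step run outside the streaming write path, assuming an active Spark session (`spark`); the S3 path is hypothetical:

```python
# Sketch: compact the small files produced by streaming appends.
# The S3 path is hypothetical; run this periodically, not inside the hot write path.
spark.sql("OPTIMIZE delta.`s3://my-bucket/events_table`")

# Optionally enable auto compaction on the table so small files are merged
# shortly after they land, reducing the need for manual OPTIMIZE runs.
spark.sql("""
    ALTER TABLE delta.`s3://my-bucket/events_table`
    SET TBLPROPERTIES ('delta.autoOptimize.autoCompact' = 'true')
""")
```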

6 More Replies
ChrisLawford_n1
by Contributor II
  • 1319 Views
  • 2 replies
  • 0 kudos

Autoloader with file notifications on a queue that is on a different storage account to the blobs

Hello, I am trying to set up Auto Loader using file notifications, but as the storage account we are reading from is a premium storage account, we have set up event subscriptions to pump the blob events to queues that exist on a standard Gen2 storage acc...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 0 kudos

Hi @ChrisLawford_n1, so in your case, it's not able to resolve the file paths from the event notifications because they're pointing to a different storage account (Storage Account 1), which is not associated with the queue. Use a StorageV2 account for B...

1 More Replies
Ritasree
by Databricks Partner
  • 573 Views
  • 2 replies
  • 0 kudos

Unable to fetch a 3rd-level array using dot notation from JSON data in local Spark via VS Code

I am able to fetch a 3rd-level array using dot notation from JSON in Databricks, but the same code is not working in local Spark via VS Code. Example: df.select(F.col('x.y.z')) where x is an array, y is an array, and z is also an array. In local Spark I am gettin...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @Ritasree, when you say local Spark, do you mean that you've configured it locally, or are you using Databricks Connect? If you've configured Spark locally, then check what version of Spark you're using.
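As a side note on what the dot notation is doing: each level of `x.y.z` that is an array maps the field access over that array, so the result keeps the nested shape. A minimal pure-Python analogue of that traversal (the data shape is hypothetical, mirroring the post's x/y/z example):

```python
# Pure-Python analogue of dot-notation traversal x.y.z where x, y, and z are
# all arrays: each level of nesting maps the field access over the array,
# so the result keeps the nested-list shape rather than flattening it.
def extract_path(rows, *fields):
    """Walk a list of dicts, mapping each struct-field access over nested arrays."""
    def step(value, field):
        if isinstance(value, list):
            return [step(item, field) for item in value]
        return value[field]
    out = rows
    for field in fields:
        out = step(out, field)
    return out

data = [{"x": [{"y": [{"z": [1, 2]}, {"z": [3]}]}]}]
print(extract_path(data, "x", "y", "z"))  # [[[[1, 2], [3]]]]
```

Spark versions differ in how many array levels dot notation will traverse, which is why checking the local Spark version matters here.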

1 More Replies
thbeh_com
by New Contributor III
  • 1133 Views
  • 2 replies
  • 0 kudos

Resolved! Legacy hive_metastore corruption

I am seeing some legacy hive_metastore corruption (especially tables created as Parquet instead of Delta) lately at my client's place, who is in the midst of migrating to UC. We were provided with Scala code to remove the erroneous Parquet files ph...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 0 kudos

Hi @thbeh_com, yes, this is a fairly common issue during UC migrations, especially with legacy Hive metastore tables. The corruption typically happens because of metadata-data misalignment: the Hive metastore references files that no longer exist or have ...

1 More Replies
Victor_Cruz_Mex
by New Contributor III
  • 2286 Views
  • 1 reply
  • 1 kudos

Resolved! Spark Structured Streaming Timeout Waiting for KafkaAdminClient Node Assignment on Amazon MSK

Hello! I'm having trouble establishing a Kafka connection between my Databricks notebook and my Kafka server in Amazon MSK. I've run some tests and I'm really stuck; I hope someone can help me. I have two brokers. First, I checked connectivity with: %sh...

Latest Reply
Victor_Cruz_Mex
New Contributor III
  • 1 kudos

We found the solution! Thanks to a Databricks architect, here's how we ultimately fixed it: 1. Copy the JVM's cacerts into a volume so Spark can trust Amazon's MSK certificate bundle. From a notebook cell with shell access, run: %sh JAVA_HOME=$(dirna...

Akshay_Petkar
by Valued Contributor
  • 855 Views
  • 1 reply
  • 1 kudos

Resolved! How Auto Loader works – file level or row level?

Does Auto Loader work at the file level or the row level? If it works at the file level and does not process the same file again, then how can we make it pick up only the new rows when data is appended to that file?

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @Akshay_Petkar, Auto Loader works at the file level. By default, Auto Loader is configured with the following option: cloudFiles.allowOverwrites = false. This option causes files to be processed exactly once. But when you switch this option to true, t...
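A sketch of the option being described, assuming an active Spark session (`spark`); the paths are hypothetical. Note that Auto Loader tracks files, not rows, so with allowOverwrites enabled a changed file is reprocessed in full:

```python
# Sketch: Auto Loader stream that re-ingests files when they are overwritten.
# The landing path and schema location are hypothetical.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.allowOverwrites", "true")
    .option("cloudFiles.schemaLocation", "s3://my-bucket/_schemas/landing")
    .load("s3://my-bucket/landing/")
)
```

Because tracking is per file, appending rows to an existing file re-delivers the whole file downstream; a producer that writes each batch of new rows as a new file avoids reprocessing.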

sensanjoy
by Contributor II
  • 3536 Views
  • 8 replies
  • 1 kudos

Resolved! Accessing parameter defined in python notebook into sql notebook.

Hi all, I have one Python notebook (../../config/param_notebook) where all parameters are defined, like: dbutils.widgets.text("catalog", "catalog_de"); spark.conf.set("catalog.name", dbutils.widgets.get("catalog")); dbutils.widgets.text("schema", "emp"...

Latest Reply
Rupal_P
New Contributor II
  • 1 kudos

Hi all, I have a SQL notebook that contains the following statement: CREATE OR REPLACE MATERIALIZED VIEW ${catalog_name}.${schema_name}.emp_table AS SELECT ... I've configured the values for catalog_name and schema_name as pipeline parameters in my DLT p...

7 More Replies
amrim
by New Contributor III
  • 909 Views
  • 1 reply
  • 1 kudos

Resolved! Notebook dashboard export unavailable

Hello, recent changes in the Databricks notebook dashboards have removed the option to download the dashboard as HTML. Previously it was possible to download it from the notebook dashboard view. Currently it's only possible to download the notebook its...

Latest Reply
Advika
Community Manager
  • 1 kudos

Hello @amrim! You're right to flag this, thank you for bringing it up. I’ll check internally for any upcoming changes regarding this feature or alternative ways to download the notebook dashboard in HTML format. I’ll get back to you once I have an up...

surajtr
by New Contributor
  • 924 Views
  • 1 reply
  • 0 kudos

Reading a large ZIP file containing NDJSON files in Databricks

Hi, we have a 5 GB ZIP file stored in ADLS. When uncompressed, it expands to approximately 115 GB and contains multiple NDJSON files, each around 200 MB in size. We need to read this data and write it to a Delta table in Databricks on a weekly basis. W...

Latest Reply
chetan-mali
Contributor
  • 0 kudos

Unzip the archive file: Apache Spark cannot directly read compressed ZIP archives, so the first step is to decompress the 5 GB file. Since the uncompressed size is substantial (115 GB), the process must be handled carefully to avoid overwhelming the dr...
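The decompression step can be done incrementally with Python's zipfile module, streaming each NDJSON member line by line instead of materializing all 115 GB at once. A minimal local sketch (a real job would write each extracted member to cloud storage and read those files with Spark):

```python
import io
import json
import zipfile

def iter_ndjson_records(zip_bytes):
    """Stream records out of a ZIP of NDJSON members without extracting to disk.

    Each member is read line by line, so only one line is held in memory at a
    time. This illustrates the 'decompress, then read NDJSON' step locally.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            with archive.open(name) as member:
                for line in io.TextIOWrapper(member, encoding="utf-8"):
                    line = line.strip()
                    if line:
                        yield json.loads(line)

# Build a tiny in-memory archive to demonstrate the streaming read.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
    z.writestr("part-0.ndjson", '{"id": 1}\n{"id": 2}\n')
    z.writestr("part-1.ndjson", '{"id": 3}\n')

records = list(iter_ndjson_records(buf.getvalue()))
print(len(records))  # 3
```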
