Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

noorbasha534
by Valued Contributor II
  • 1908 Views
  • 1 reply
  • 2 kudos

Delta Lake File Sizes - optimize maxfilesize, tunefilesizesforrewrites

Hello all, while reading content that provides guidance on Delta Lake file sizes, I realized tuneFileSizesForRewrites behind the scenes targets a 256 MB file size, and optimize.maxFileSize targets a 1 GB file (reference: https://docs.databricks...

Latest Reply
radothede
Valued Contributor II
  • 2 kudos

Hello @noorbasha534, that's a very interesting topic regarding fine-tuning file sizes for Delta tables. Answering your questions: 1) I use spark.databricks.delta.optimize.maxFileSize to set the maximum file size for the OPTIMIZE command. It's working for me just ...
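The reply mentions capping the size of files produced by OPTIMIZE via `spark.databricks.delta.optimize.maxFileSize`. A minimal sketch of that pattern, assuming an active Databricks Spark session (`spark`); the table name is hypothetical:

```python
# Sketch: target 256 MB output files from OPTIMIZE instead of the 1 GB default.
# Assumes an active Databricks session; `main.sales.events` is a hypothetical table.
spark.conf.set("spark.databricks.delta.optimize.maxFileSize", str(256 * 1024 * 1024))
spark.sql("OPTIMIZE main.sales.events")
```

The config is session-scoped, so it only affects OPTIMIZE commands issued from that session.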

Bank_Kirati
by New Contributor III
  • 1107 Views
  • 2 replies
  • 3 kudos

Resolved! Bug Report: SQL Editor, Run Now button run query from different query file/tab

When I click the Run Now button or press (Command + Enter), the editor somehow executes a query from another query file/tab. I can only use Run Selected for now. Clearing the cache and re-logging in didn't solve the problem. Please see the screen recording here: https://...

Latest Reply
Advika
Community Manager
  • 3 kudos

Hello @Bank_Kirati! Thanks for sharing the screen recording. I’m unable to reproduce the issue on my end. Could you try toggling the New SQL Editor off and on, and see if that makes a difference? Also, please check if this happens only with specific ...

1 More Replies
Nidhig
by Databricks Partner
  • 782 Views
  • 1 reply
  • 1 kudos

Partner Academy Login Issue

Hi Team, I am facing an issue with the Partner Academy login.

Latest Reply
Advika
Community Manager
  • 1 kudos

Hello @Nidhig! From the screenshot, it appears you're not currently authorized to access the Partner Academy. In this case, please raise a ticket with the Databricks Support Team. They'll be able to investigate further and assist you with gaining acce...

Pratikmsbsvm
by Contributor
  • 1747 Views
  • 3 replies
  • 1 kudos

Resolved! Error Logging and Orchestration in Databricks

Hello, may someone please help me design the error logging and orchestration for the pipeline below? I am pulling data from the Bronze layer and pushing it to the Silver layer after transformation. 1. How to do error logging and where to store it. 2. How to...

Pratikmsbsvm_0-1753867667783.png
Latest Reply
Pratikmsbsvm
Contributor
  • 1 kudos

@Brahmareddy: Thanks a lot. Do you have any page which shows a real implementation, if handy? Kindly share.

2 More Replies
pooja_bhumandla
by Databricks Partner
  • 704 Views
  • 2 replies
  • 0 kudos

Will Unsetting delta.targetFileSize During Data Load Cause Any Issues?

Hi, if I unset the Spark config delta.targetFileSize (e.g., using ALTER) while a data load is in progress (batch or streaming), will it cause any issues? Will the load fail or behave inconsistently due to the config being changed mid-process? Thanks!

Latest Reply
Brahmareddy
Esteemed Contributor
  • 0 kudos

Hi pooja_bhumandla, how are you doing today? In general, changing the delta.targetFileSize config while a batch or streaming load is in progress won't crash your job, but it may lead to inconsistent behavior during that specific run. Spark jobs usuall...
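For context, the "unset using alter" step the question refers to is a table-property change. A minimal sketch, assuming an active Spark session (`spark`); the table name is hypothetical:

```python
# Sketch: remove the per-table file-size target so subsequent writes fall back
# to the engine's default sizing. `main.sales.orders` is a hypothetical table.
spark.sql("ALTER TABLE main.sales.orders UNSET TBLPROPERTIES ('delta.targetFileSize')")
```

The change takes effect for files written after it commits; it does not rewrite files already committed by an in-flight run.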

1 More Replies
jeanptello
by New Contributor
  • 1431 Views
  • 1 reply
  • 0 kudos

Read Snowflake Iceberg tables from Databricks UC

Hi folks! I'm trying to read Iceberg tables that I created in Snowflake from Databricks using catalog federation. I set up a connection to Snowflake, configured an external location pointing to the S3 folder that contains the Iceberg files, and used t...

Data Engineering
Catalog Federation
Iceberg
Latest Reply
Isi
Honored Contributor III
  • 0 kudos

Hey @jeanptello, it's possible that there's a mismatch between what Snowflake has written and what Databricks is trying to read. This often happens if Snowflake has performed an operation that rewrites the table files (like compaction or a bulk update...

VaderKB
by New Contributor II
  • 1839 Views
  • 7 replies
  • 0 kudos

Do too many Parquet files in a Delta table impact writes for a streaming job?

Hello, I am running a Spark streaming job that reads data from AWS Kinesis and writes data to external Delta tables which are stored in S3. But I have noticed that over time, the latency has been increasing. I also noticed that for each batch, the...

Latest Reply
Khaja_Zaffer
Esteemed Contributor
  • 0 kudos

Hello @VaderKB, you're right that OPTIMIZE makes reads faster by reducing the number of files. For writes using append mode, it doesn't directly speed up the operation itself. However, having fewer, larger files from a previous OPTIMIZE run can improv...
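The compaction being discussed can be sketched as a periodic step run outside the streaming write path, assuming an active Spark session (`spark`); the S3 path is hypothetical:

```python
# Sketch: compact the small files produced by streaming appends.
# The S3 path is hypothetical; run this periodically, not inside the hot write path.
spark.sql("OPTIMIZE delta.`s3://my-bucket/events_table`")

# Optionally enable auto compaction on the table so small files are merged
# shortly after they land, reducing the need for manual OPTIMIZE runs.
spark.sql("""
    ALTER TABLE delta.`s3://my-bucket/events_table`
    SET TBLPROPERTIES ('delta.autoOptimize.autoCompact' = 'true')
""")
```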

6 More Replies
ChrisLawford_n1
by Contributor II
  • 1319 Views
  • 2 replies
  • 0 kudos

Autoloader with file notifications on a queue that is on a different storage account to the blobs

Hello, I am trying to set up Auto Loader using file notifications, but as the storage account we are reading from is a premium storage account, we have set up event subscriptions to pump the blob events to queues that exist on a standard Gen2 storage acc...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 0 kudos

Hi @ChrisLawford_n1, so in your case, it's not able to resolve the file paths from the event notifications because they're pointing to a different storage account (Storage Account 1), which is not associated with the queue. Use a StorageV2 account for B...

1 More Replies
Ritasree
by Databricks Partner
  • 573 Views
  • 2 replies
  • 0 kudos

Unable to fetch a 3rd-level array using dot notation from JSON data in local Spark via VS Code

I am able to fetch a 3rd-level array using dot notation from JSON in Databricks, but the same code is not working in local Spark via VS Code. Example: df.select(F.col('x.y.z')) where x is an array, y is an array, and z is also an array. In local Spark I am gettin...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @Ritasree, when you say local Spark, do you mean that you've configured it locally, or are you using Databricks Connect? If you've configured Spark locally, then check what version of Spark you're using.
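As a side note on what the dot notation is doing: each level of `x.y.z` that is an array maps the field access over that array, so the result keeps the nested shape. A minimal pure-Python analogue of that traversal (the data shape is hypothetical, mirroring the post's x/y/z example):

```python
# Pure-Python analogue of dot-notation traversal x.y.z where x, y, and z are
# all arrays: each level of nesting maps the field access over the array,
# so the result keeps the nested-list shape rather than flattening it.
def extract_path(rows, *fields):
    """Walk a list of dicts, mapping each struct-field access over nested arrays."""
    def step(value, field):
        if isinstance(value, list):
            return [step(item, field) for item in value]
        return value[field]
    out = rows
    for field in fields:
        out = step(out, field)
    return out

data = [{"x": [{"y": [{"z": [1, 2]}, {"z": [3]}]}]}]
print(extract_path(data, "x", "y", "z"))  # [[[[1, 2], [3]]]]
```

Spark versions differ in how many array levels dot notation will traverse, which is why checking the local Spark version matters here.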

1 More Replies
thbeh_com
by New Contributor III
  • 1133 Views
  • 2 replies
  • 0 kudos

Resolved! Legacy hive_metastore corruption

I am seeing some legacy hive_metastore corruption (especially tables created as Parquet instead of Delta) lately at my client's place, who is in the midst of migrating to UC. We were provided with Scala code to remove the erroneous Parquet files ph...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 0 kudos

Hi @thbeh_com, yes, this is a fairly common issue during UC migrations, especially with legacy Hive metastore tables. The corruption typically happens because of metadata-data misalignment: the Hive metastore references files that no longer exist or have ...

1 More Replies
Victor_Cruz_Mex
by New Contributor III
  • 2286 Views
  • 1 reply
  • 1 kudos

Resolved! Spark Structured Streaming Timeout Waiting for KafkaAdminClient Node Assignment on Amazon MSK

Hello! I'm having trouble establishing a Kafka connection between my Databricks notebook and my Kafka server in Amazon MSK. I've run some tests and I'm really stuck; I hope someone can help me. I have two brokers. First, I checked connectivity with: %sh...

Latest Reply
Victor_Cruz_Mex
New Contributor III
  • 1 kudos

We found the solution! Thanks to a Databricks architect, here's how we ultimately fixed it: 1. Copy the JVM's cacerts into a volume so Spark can trust Amazon's MSK certificate bundle. From a notebook cell with shell access, run: %sh JAVA_HOME=$(dirna...

Akshay_Petkar
by Valued Contributor
  • 855 Views
  • 1 reply
  • 1 kudos

Resolved! How Auto Loader works – file level or row level?

Does Auto Loader work at the file level or the row level? If it works at the file level and does not process the same file again, then how can we make it pick up only the new rows when data is appended to that file?

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @Akshay_Petkar, Auto Loader works at the file level. By default, Auto Loader is configured with the following option: cloudFiles.allowOverwrites = false. This option causes files to be processed exactly once. But when you switch this option to true, t...
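A sketch of the option being described, assuming an active Spark session (`spark`); the paths are hypothetical. Note that Auto Loader tracks files, not rows, so with allowOverwrites enabled a changed file is reprocessed in full:

```python
# Sketch: Auto Loader stream that re-ingests files when they are overwritten.
# The landing path and schema location are hypothetical.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.allowOverwrites", "true")
    .option("cloudFiles.schemaLocation", "s3://my-bucket/_schemas/landing")
    .load("s3://my-bucket/landing/")
)
```

Because tracking is per file, appending rows to an existing file re-delivers the whole file downstream; a producer that writes each batch of new rows as a new file avoids reprocessing.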

sensanjoy
by Contributor II
  • 3536 Views
  • 8 replies
  • 1 kudos

Resolved! Accessing parameter defined in python notebook into sql notebook.

Hi all, I have one Python notebook (../../config/param_notebook) where all parameters are defined, like: dbutils.widgets.text("catalog", "catalog_de"); spark.conf.set("catalog.name", dbutils.widgets.get("catalog")); dbutils.widgets.text("schema", "emp"...

Latest Reply
Rupal_P
New Contributor II
  • 1 kudos

Hi all, I have a SQL notebook that contains the following statement: CREATE OR REPLACE MATERIALIZED VIEW ${catalog_name}.${schema_name}.emp_table AS SELECT ... I've configured the values for catalog_name and schema_name as pipeline parameters in my DLT p...

7 More Replies
amrim
by New Contributor III
  • 909 Views
  • 1 reply
  • 1 kudos

Resolved! Notebook dashboard export unavailable

Hello, recent changes in the Databricks notebook dashboards have removed the option to download the dashboard as HTML. Previously it was possible to download it from the notebook dashboard view. Currently it's only possible to download the notebook its...

Latest Reply
Advika
Community Manager
  • 1 kudos

Hello @amrim! You're right to flag this, thank you for bringing it up. I’ll check internally for any upcoming changes regarding this feature or alternative ways to download the notebook dashboard in HTML format. I’ll get back to you once I have an up...

surajtr
by New Contributor
  • 924 Views
  • 1 reply
  • 0 kudos

Reading a large ZIP file containing NDJSON files in Databricks

Hi, we have a 5 GB ZIP file stored in ADLS. When uncompressed, it expands to approximately 115 GB and contains multiple NDJSON files, each around 200 MB in size. We need to read this data and write it to a Delta table in Databricks on a weekly basis. W...

Latest Reply
chetan-mali
Contributor
  • 0 kudos

Unzip the archive file: Apache Spark cannot directly read compressed ZIP archives, so the first step is to decompress the 5 GB file. Since the uncompressed size is substantial (115 GB), the process must be handled carefully to avoid overwhelming the dr...
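The decompression step can be done incrementally with Python's zipfile module, streaming each NDJSON member line by line instead of materializing all 115 GB at once. A minimal local sketch (a real job would write each extracted member to cloud storage and read those files with Spark):

```python
import io
import json
import zipfile

def iter_ndjson_records(zip_bytes):
    """Stream records out of a ZIP of NDJSON members without extracting to disk.

    Each member is read line by line, so only one line is held in memory at a
    time. This illustrates the 'decompress, then read NDJSON' step locally.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            with archive.open(name) as member:
                for line in io.TextIOWrapper(member, encoding="utf-8"):
                    line = line.strip()
                    if line:
                        yield json.loads(line)

# Build a tiny in-memory archive to demonstrate the streaming read.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
    z.writestr("part-0.ndjson", '{"id": 1}\n{"id": 2}\n')
    z.writestr("part-1.ndjson", '{"id": 3}\n')

records = list(iter_ndjson_records(buf.getvalue()))
print(len(records))  # 3
```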
