Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ashokpola1
by New Contributor III
  • 2801 Views
  • 7 replies
  • 5 kudos

Resolved! Are there any student discounts or coupons for the Databricks Data Engineer Associate certification?

I’m planning to take the Databricks Data Engineer Associate certification exam, and I wanted to ask if there are any official discounts, coupons, or student offers available to help reduce the exam fee. I’m a student right now, so any discount or prom...

Latest Reply
ashokpola1
New Contributor III
  • 5 kudos

thank you

6 More Replies
my_super_name
by New Contributor III
  • 3016 Views
  • 3 replies
  • 4 kudos

Auto Loader Schema Hint Behavior: Addressing Nested Field Errors

Hello, I'm using Auto Loader to stream a table of data and have added schema hints to specify field values. I've observed that when my initial data file is missing fields specified in the schema hint, Auto Loader correctly identifies this and ad...
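
For readers hitting the same thing, here is a minimal Auto Loader sketch with a schema hint; all paths and table names are assumptions, not from the post. `cloudFiles.schemaHints` accepts dot notation for nested fields as well as DDL-style struct hints, both of which come up in the replies below:

```python
# Minimal Auto Loader sketch (paths and table names are made up).
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.schemaLocation", "/mnt/schemas/events")   # assumed path
      # Hint a nested field with dot notation; the DDL form "a STRUCT<b INT>"
      # is the other style discussed in the replies.
      .option("cloudFiles.schemaHints", "a.b INT")
      .load("/mnt/landing/events"))                                 # assumed path

(df.writeStream
   .option("checkpointLocation", "/mnt/checkpoints/events")         # assumed path
   .trigger(availableNow=True)
   .toTable("main.default.events"))                                 # assumed table
```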

Latest Reply
Mathias_Peters
Contributor II
  • 4 kudos

Hi, we are having similar issues with schema hints formulated in fully qualified DDL, e.g. "a STRUCT<b INT>" etc. Did you find a solution? Also, did you specify the schema hint using the dot-notation, e.g. "a.b INT" before ingesting any data or after...

2 More Replies
pooja_bhumandla
by New Contributor II
  • 189 Views
  • 1 reply
  • 0 kudos

Performance Behavior of MERGE with Partitioned Table: Impact of ZORDER and Deletion Vectors

Hi Databricks Community, I’m analyzing the performance of Delta Lake MERGE operations on a partitioned table, and I observed unexpected behavior across 3 test cases. I wanted to share my findings to better understand: Why ZORDER or Deletion Vectors help...
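
For context, a sketch of the pattern under test, showing the two features being compared; the table and column names are assumptions, not from the post:

```python
# Hypothetical MERGE into a partitioned Delta table (names are made up).
spark.sql("""
    MERGE INTO sales AS t
    USING updates AS s
    ON t.region = s.region AND t.id = s.id  -- partition column in the join enables pruning
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")

# ZORDER co-locates rows on the join key, so the MERGE touches fewer files.
spark.sql("OPTIMIZE sales ZORDER BY (id)")

# Deletion vectors let MERGE mark rows as removed instead of rewriting whole files.
spark.sql("ALTER TABLE sales SET TBLPROPERTIES ('delta.enableDeletionVectors' = 'true')")
```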

Latest Reply
radothede
Valued Contributor II
  • 0 kudos

Hi @pooja_bhumandla, thanks for such a nice and detailed description of your case; that really helps in understanding the scenario. Regarding your questions: 1) Overall, the operation could become more complex due to: a) deletion vector creation and maintenance, b...

noorbasha534
by Valued Contributor
  • 433 Views
  • 1 reply
  • 2 kudos

Delta Lake File Sizes - optimize.maxFileSize, tuneFileSizesForRewrites

Hello all, while reading content that provides guidance on Delta Lake file sizes, I realized tuneFileSizesForRewrites behind the scenes targets a 256 MB file size, and optimize.maxFileSize targets a 1 GB file (reference: https://docs.databricks...

Latest Reply
radothede
Valued Contributor II
  • 2 kudos

Hello @noorbasha534, that's a very interesting topic regarding fine-tuning file sizes for a Delta table. Answering your questions: 1) I use spark.databricks.delta.optimize.maxFileSize to set the maximum file size for the OPTIMIZE command. It's working for me just ...
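
For anyone following along, the two knobs mentioned look roughly like this; the values and table name are illustrative, not from the thread:

```python
# Cap the file size produced by OPTIMIZE (in bytes; the default targets ~1 GB).
spark.conf.set("spark.databricks.delta.optimize.maxFileSize", 256 * 1024 * 1024)

# Have Delta target smaller (~256 MB) files for tables that are frequently
# rewritten by MERGE/UPDATE/DELETE ("my_table" is an assumed name).
spark.sql("""
    ALTER TABLE my_table
    SET TBLPROPERTIES ('delta.tuneFileSizesForRewrites' = 'true')
""")
```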

Bank_Kirati
by New Contributor III
  • 721 Views
  • 2 replies
  • 3 kudos

Resolved! Bug Report: SQL Editor, Run Now button run query from different query file/tab

When I click the Run Now button or press Command + Enter, the editor somehow executes a query from another query file/tab. I can only use Run Selected for now. Clearing the cache and re-logging in didn't solve the problem. Please see the screen recording here: https://...

Latest Reply
Advika
Databricks Employee
  • 3 kudos

Hello @Bank_Kirati! Thanks for sharing the screen recording. I’m unable to reproduce the issue on my end. Could you try toggling the New SQL Editor off and on, and see if that makes a difference? Also, please check if this happens only with specific ...

1 More Replies
saurabh_aher
by New Contributor III
  • 1113 Views
  • 8 replies
  • 1 kudos

RECURSION_ROW_LIMIT - how to increase it beyond 1M?

I have a use case that requires more than 1M rows, but recursion is limited to 1M. How can I increase this limit in a recursive CTE?

Latest Reply
saurabh_aher
New Contributor III
  • 1 kudos

I have a Databricks SQL table with the following columns: id | name | managerId | rolename. This table contains hierarchical data for all employees in the organization, where each employee has an associated manager (except for the CEO, whose managerId ...
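
For reference, the hierarchy walk described above maps to a recursive CTE along these lines; this is a sketch assuming an `employees` table with the listed columns and a DBR version that supports recursive CTEs, and the 1M-row cap discussed in this thread still applies to the recursion:

```python
# Walk the org tree from the CEO down (table name "employees" is assumed).
result = spark.sql("""
    WITH RECURSIVE org AS (
        -- anchor: the CEO, whose managerId is NULL
        SELECT id, name, managerId, rolename, 0 AS depth
        FROM employees
        WHERE managerId IS NULL

        UNION ALL

        -- step: attach each employee to the row for their manager
        SELECT e.id, e.name, e.managerId, e.rolename, o.depth + 1
        FROM employees e
        JOIN org o ON e.managerId = o.id
    )
    SELECT * FROM org
""")
result.show()
```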

7 More Replies
Nidhig
by New Contributor III
  • 345 Views
  • 1 reply
  • 1 kudos

Partner Academy Login Issue

Hi Team, I am facing an issue with the Partner Academy login.

Latest Reply
Advika
Databricks Employee
  • 1 kudos

Hello @Nidhig! From the screenshot, it appears you're not currently authorized to access the Partner Academy. In this case please raise a ticket with the Databricks Support Team. They’ll be able to investigate further and assist you with gaining acce...

andr3s
by New Contributor II
  • 38137 Views
  • 7 replies
  • 2 kudos

SSL_connect: certificate verify failed with Power BI

Hi, I'm getting this error with Power BI. Any ideas? Thanks in advance, Andres

Latest Reply
benjaminpieplow
New Contributor II
  • 2 kudos

We had a very similar issue. The full (redacted) error from Power BI: ```Unable to update connection credentials. Unable to connect to the data source. Either the data source is inaccessible, a connection timeout occurred, or the data source credentia...

6 More Replies
Pratikmsbsvm
by Contributor
  • 762 Views
  • 3 replies
  • 1 kudos

Resolved! Error Logging and Orchestration in Databricks

Hello, could someone please help me design the error logging, and how to do orchestration, for the pipeline below? I am pulling data from the Bronze layer and pushing it to the Silver layer after transformation. 1. How to do error logging and where to store it 2. How to...
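
One common shape for this, as a sketch rather than a definitive design: wrap each Bronze-to-Silver step in try/except, append failures to a Delta audit table, and re-raise so the orchestrator sees the failure. All table, pipeline, and step names below are assumptions:

```python
import traceback
from datetime import datetime, timezone

def log_error(pipeline, step, err):
    # Append one error record to a Delta audit table (name is assumed).
    row = [(pipeline, step, str(err), traceback.format_exc(),
            datetime.now(timezone.utc))]
    cols = ["pipeline", "step", "error", "stacktrace", "logged_at"]
    (spark.createDataFrame(row, cols)
          .write.mode("append")
          .saveAsTable("ops.pipeline_errors"))

try:
    bronze = spark.read.table("bronze.orders")      # assumed source table
    silver = bronze.dropDuplicates(["order_id"])    # placeholder transform
    silver.write.mode("overwrite").saveAsTable("silver.orders")
except Exception as e:
    log_error("orders_pipeline", "bronze_to_silver", e)
    raise  # let the Job / orchestrator register the failure
```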

Latest Reply
Pratikmsbsvm
Contributor
  • 1 kudos

@Brahmareddy: Thanks a lot. Do you have any page that shows a real implementation, if handy? Kindly share.

2 More Replies
pooja_bhumandla
by New Contributor II
  • 419 Views
  • 2 replies
  • 0 kudos

Will Unsetting delta.targetFileSize During Data Load Cause Any Issues?

Hi, if I unset the table property delta.targetFileSize (e.g., using ALTER TABLE) while a data load is in progress (batch or streaming), will it cause any issues? Will the load fail or behave inconsistently due to the property being changed mid-process? Thanks!
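
For reference, delta.targetFileSize is a table property rather than a session config, so the unset in question looks like this (the table name is an assumption):

```python
# Remove the per-table file-size target; Delta falls back to its defaults.
spark.sql("ALTER TABLE my_table UNSET TBLPROPERTIES ('delta.targetFileSize')")
```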

Latest Reply
Brahmareddy
Esteemed Contributor
  • 0 kudos

Hi pooja_bhumandla, how are you doing today? In general, changing the delta.targetFileSize config while a batch or streaming load is in progress won’t crash your job, but it may lead to inconsistent behavior during that specific run. Spark jobs usuall...

1 More Replies
jeanptello
by New Contributor
  • 436 Views
  • 1 reply
  • 0 kudos

Read Snowflake Iceberg tables from Databricks UC

Hi folks! I'm trying to read Iceberg tables that I created in Snowflake from Databricks using catalog federation. I set up a connection to Snowflake, configured an external location pointing to the S3 folder that contains the Iceberg files, and used t...

Data Engineering
Catalog Federation
Iceberg
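
For anyone reproducing this, the federation setup described reads roughly as below. The option names follow the Lakehouse Federation pattern for Snowflake, but the host, warehouse, secret reference, and database names are all assumptions, and the Iceberg-specific catalog federation path may require a different connection type, so check the docs for your workspace:

```python
# Sketch of a Snowflake connection plus a foreign catalog (values are made up).
spark.sql("""
    CREATE CONNECTION IF NOT EXISTS snowflake_conn TYPE snowflake
    OPTIONS (
        host 'myaccount.snowflakecomputing.com',  -- assumed account host
        port '443',
        sfWarehouse 'MY_WH',                      -- assumed warehouse
        user 'svc_user',                          -- assumed service user
        password secret('my_scope', 'sf_password')  -- assumed secret reference
    )
""")

spark.sql("""
    CREATE FOREIGN CATALOG IF NOT EXISTS sf_catalog
    USING CONNECTION snowflake_conn
    OPTIONS (database 'MY_DB')                    -- assumed Snowflake database
""")
```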
Latest Reply
Isi
Honored Contributor II
  • 0 kudos

Hey @jeanptello, it’s possible that there’s a mismatch between what Snowflake has written and what Databricks is trying to read. This often happens if Snowflake has performed an operation that rewrites the table files (like compaction or a bulk update...

VaderKB
by New Contributor II
  • 538 Views
  • 7 replies
  • 0 kudos

Does too many parquet files in delta table impact writes for the streaming job

Hello, I am running a Spark streaming job that reads data from AWS Kinesis and writes data to external Delta tables stored in S3. But I have noticed that over time, the latency has been increasing. I also noticed that for each batch, the...

Latest Reply
Khaja_Zaffer
Contributor
  • 0 kudos

Hello @VaderKB You're right that OPTIMIZE makes reads faster by reducing the number of files. For writes using append mode, it doesn't directly speed up the operation itself. However, having fewer, larger files from a previous OPTIMIZE run can improv...
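
A sketch of the combination being suggested, with an assumed table name: compact the existing small files once, then enable optimized writes and auto compaction so streaming appends produce fewer small files going forward:

```python
# One-off compaction of existing small files (table name is assumed).
spark.sql("OPTIMIZE my_stream_table")

# Keep future writes from re-fragmenting the table.
spark.sql("""
    ALTER TABLE my_stream_table SET TBLPROPERTIES (
        'delta.autoOptimize.optimizeWrite' = 'true',
        'delta.autoOptimize.autoCompact'   = 'true'
    )
""")
```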

6 More Replies
ChrisLawford_n1
by Contributor
  • 374 Views
  • 2 replies
  • 0 kudos

Autoloader with file notifications on a queue that is on a different storage account to the blobs

Hello, I am trying to set up Auto Loader using file notifications, but as the storage account we are reading from is a premium storage account, we have set up event subscriptions to pump the blob events to queues that exist on a standard Gen2 storage acc...
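
For readers with the same topology, Auto Loader can be pointed at a queue you manage yourself via its documented Azure options; the queue name, secret scope, and paths below are assumptions:

```python
# Sketch: read from the premium account while consuming events from a queue
# on the standard Gen2 account (all names and paths are made up).
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "parquet")
      .option("cloudFiles.useNotifications", "true")
      # Queue on the standard account that receives the premium account's
      # Event Grid events:
      .option("cloudFiles.queueName", "blob-events-queue")              # assumed
      .option("cloudFiles.connectionString",
              dbutils.secrets.get("my_scope", "std-account-conn"))      # assumed secret
      .option("cloudFiles.schemaLocation", "/mnt/schemas/src")          # assumed
      .load("abfss://container@premiumaccount.dfs.core.windows.net/path"))  # assumed
```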

Latest Reply
lingareddy_Alva
Honored Contributor III
  • 0 kudos

Hi @ChrisLawford_n1, so in your case, it's not able to resolve the file paths from the event notifications because they're pointing to a different storage account (Storage Account 1), which is not associated with the queue. Use a StorageV2 Account for B...

1 More Replies
Ritasree
by New Contributor II
  • 275 Views
  • 2 replies
  • 0 kudos

Unable to fetch a 3rd-level array using dot notation from JSON data in local Spark via VS Code

I am able to fetch a 3rd-level array using dot notation from JSON in Databricks, but the same code is not working in local Spark via VS Code. Example: df.select(F.col('x.y.z')), where x is an array, y is an array, and z is also an array. In local Spark I am gettin...
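
A version-portable workaround, since dot notation through multiple array levels behaves differently across Spark versions, is to spell out the nesting with transform(). The column layout follows the post; the sample data is made up:

```python
from pyspark.sql import SparkSession, Row
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# Three levels of arrays, matching the x.y.z shape from the post.
data = [Row(x=[Row(y=[Row(z=[1, 2])])])]
df = spark.createDataFrame(
    data, "x ARRAY<STRUCT<y: ARRAY<STRUCT<z: ARRAY<INT>>>>>")

out = df.select(
    # Keep the nesting explicit: array<array<array<int>>>.
    F.expr("transform(x, a -> transform(a.y, b -> b.z))").alias("z_nested"),
    # Or flatten down to a single array<int>.
    F.expr("flatten(flatten(transform(x, a -> transform(a.y, b -> b.z))))").alias("z_flat"))
out.show(truncate=False)
```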

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @Ritasree, when you say local Spark, do you mean you've configured it locally, or are you using Databricks Connect? If you've configured Spark locally, check which version of Spark you're using.

1 More Replies
rcostanza
by New Contributor III
  • 410 Views
  • 1 reply
  • 0 kudos

Lakeflow pipeline (formerly DLT pipeline) performance progressively degrades on a persistent cluster

I have a small (under 20 tables, all streaming) DLT pipeline running in triggered mode, scheduled every 15 min during the workday. For development I've set `pipelines.clusterShutdown.delay` to avoid having to start a cluster for every update. I've noticed...
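
For reference, that setting lives in the pipeline's configuration map in the pipeline settings JSON; the value below is illustrative, not from the post:

```json
{
  "configuration": {
    "pipelines.clusterShutdown.delay": "60m"
  }
}
```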

Latest Reply
jerrygen78
New Contributor III
  • 0 kudos

You're right to be concerned: this sounds like a classic case of memory or resource leakage over time, which can affect long-running jobs even if metrics look okay on the surface. In triggered DLT (now Lakeflow) pipelines, tasks and state can accumul...

