Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

carlos_tasayco
by Contributor
  • 1522 Views
  • 3 replies
  • 0 kudos

Resolved! Flattening JSON in a DLT pipeline

Hi, I have JSON files in my bronze schema. I am flattening them into a dataframe, and after that I am creating materialized views in a DLT pipeline. However, in production it is taking a lot of time (over 3 hours), and it is not even a lot of data; the biggest materiali...

Latest Reply
Khaja_Zaffer
Esteemed Contributor
  • 0 kudos

Hello @carlos_tasayco As I mentioned, I asked whether you used any join. Just wanted to check: was this cross join joining two large tables?

2 More Replies
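The flattening step discussed in the thread above is often where the time goes. A minimal plain-Python sketch of recursive JSON flattening — the same shape of logic a per-struct flatten applies before building a materialized view; this is illustrative only, not DLT API code, and all names are made up:

```python
def flatten_json(obj, prefix=""):
    """Recursively flatten a nested dict into dot-separated keys.

    Mirrors the struct flattening typically done before creating a
    DLT materialized view (illustrative only, not DLT API code).
    """
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Descend into nested structs, carrying the dotted prefix.
            flat.update(flatten_json(value, name))
        else:
            flat[name] = value
    return flat

record = {"id": 1, "payload": {"user": {"name": "a"}, "count": 2}}
print(flatten_json(record))
# {'id': 1, 'payload.user.name': 'a', 'payload.count': 2}
```

If a flatten like this is cheap but the pipeline still takes hours, the cost is usually elsewhere (joins, shuffles, or full recomputation of the materialized view), which is where the reply's cross-join question points.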
Pratikmsbsvm
by Contributor
  • 1515 Views
  • 3 replies
  • 0 kudos

Databricks Workflow Orchestration for Pipeline

Hello, I am using Databricks for the first time. Could someone please help me with how to do orchestration for the pipeline shown below? Kindly share the steps for implementing orchestration and what we have to consider. Thanks a lot.

[Attachment: Pratikmsbsvm_0-1753936635561.png]
Latest Reply
junaid-databrix
New Contributor III
  • 0 kudos

The diagram you have shared is a bit confusing: from Azure there is a data pull to the Bronze layer, and from the same data source data is being pulled into the Silver layer. However, following the Medallion architecture, typically the raw data is ingested into...

2 More Replies
ashokpola1
by New Contributor III
  • 4538 Views
  • 7 replies
  • 5 kudos

Resolved! Are there any student discounts or coupons for the Databricks Data Engineer Associate certification

I’m planning to take the Databricks Data Engineer Associate certification exam, and I wanted to ask if there are any official discounts, coupons, or student offers available to help reduce the exam fee. I’m a student right now, so any discount or prom...

Latest Reply
ashokpola1
New Contributor III
  • 5 kudos

thank you

6 More Replies
my_super_name
by New Contributor III
  • 3722 Views
  • 3 replies
  • 4 kudos

Auto Loader Schema Hint Behavior: Addressing Nested Field Errors

Hello, I'm using Auto Loader to stream a table of data and have added schema hints to specify field values. I've observed that when my initial data file is missing fields specified in the schema hint, the auto loader correctly identifies this and ad...

Latest Reply
Mathias_Peters
Contributor II
  • 4 kudos

Hi, we are having similar issues with schema hints formulated in fully qualified DDL, e.g. "a STRUCT<b INT>" etc. Did you find a solution? Also, did you specify the schema hint using the dot-notation, e.g. "a.b INT" before ingesting any data or after...

2 More Replies
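For readers following the schema-hint discussion above, here is a hedged sketch of how hints are typically passed to Auto Loader, assuming PySpark on Databricks with an existing `spark` session; the paths and field names are hypothetical, and it shows both the DDL struct form and the dot-notation form mentioned in the reply:

```python
# Illustrative Auto Loader stream with schema hints.
# Paths, field names, and the `spark` session are assumptions.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    # Pin only the fields you care about; the rest are inferred.
    # Equivalent dot-notation for the nested field: "a.b INT"
    .option("cloudFiles.schemaHints", "a STRUCT<b: INT>, c STRING")
    .option("cloudFiles.schemaLocation", "/tmp/schema")  # hypothetical path
    .load("/tmp/input")  # hypothetical path
)
```

Note that hints only constrain inference for fields that appear in the data; whether a hinted-but-absent nested field is added as null or raises an error is exactly the behavior being debated in the thread.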
pooja_bhumandla
by Databricks Partner
  • 946 Views
  • 1 reply
  • 0 kudos

Performance Behavior of MERGE with Partitioned Table: Impact of ZORDER and Deletion Vectors

Hi Databricks Community, I’m analyzing the performance of Delta Lake MERGE operations on a partitioned table, and I observed unexpected behavior across 3 test cases. I wanted to share my findings to better understand why ZORDER or Deletion Vectors help...

Latest Reply
radothede
Valued Contributor II
  • 0 kudos

Hi @pooja_bhumandla, thanks for such a nice and detailed description of your case; that really helps in understanding the scenario. Regarding your questions: 1) Overall, the operation could become more complex due to: a) deletion vector creation and maintenance, b...

noorbasha534
by Valued Contributor II
  • 1913 Views
  • 1 reply
  • 2 kudos

Delta Lake File Sizes - optimize maxfilesize, tunefilesizesforrewrites

Hello all, while reading content that provides guidance on Delta Lake file size, I realized that tuneFileSizesForRewrites behind the scenes targets a 256 MB file size, and optimize.maxFileSize will target a 1 GB file (reference: https://docs.databricks...

  • 1913 Views
  • 1 replies
  • 2 kudos
Latest Reply
radothede
Valued Contributor II
  • 2 kudos

Hello @noorbasha534, that's a very interesting topic regarding fine-tuning file sizes under a Delta table. Answering your questions: 1) I use spark.databricks.delta.optimize.maxFileSize and set the maximum file size for the OPTIMIZE command. It's working for me just ...

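The two knobs discussed in this thread can be set roughly like this — a sketch, assuming a Databricks environment with an existing `spark` session; the table name is hypothetical:

```python
# Session-level cap on files written by OPTIMIZE (defaults to ~1 GB).
# Here set to 256 MB; the value is in bytes.
spark.conf.set(
    "spark.databricks.delta.optimize.maxFileSize", str(256 * 1024 * 1024)
)

# Table-level property: let Delta tune file sizes down (~256 MB target)
# for tables frequently rewritten by MERGE/UPDATE/DELETE.
spark.sql("""
    ALTER TABLE my_catalog.my_schema.my_table  -- hypothetical name
    SET TBLPROPERTIES (delta.tuneFileSizesForRewrites = true)
""")
```

The session config affects only OPTIMIZE runs in that session, while the table property persists with the table, which is why the two defaults (1 GB vs. 256 MB) can both be in play at once.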
Bank_Kirati
by New Contributor III
  • 1107 Views
  • 2 replies
  • 3 kudos

Resolved! Bug Report: SQL Editor, Run Now button run query from different query file/tab

When I click the Run Now button or press (Command + Enter), the editor somehow executes a query from another query file/tab. I can only use Run Selected for now. Clearing the cache and re-logging in didn't solve the problem. Please see the screen recording here: https://...

Latest Reply
Advika
Community Manager
  • 3 kudos

Hello @Bank_Kirati! Thanks for sharing the screen recording. I’m unable to reproduce the issue on my end. Could you try toggling the New SQL Editor off and on, and see if that makes a difference? Also, please check if this happens only with specific ...

1 More Replies
Nidhig
by Databricks Partner
  • 783 Views
  • 1 reply
  • 1 kudos

Partner Academy Login Issue

Hi Team, I am facing an issue with the Partner Academy login.

Latest Reply
Advika
Community Manager
  • 1 kudos

Hello @Nidhig! From the screenshot, it appears you're not currently authorized to access the Partner Academy. In this case, please raise a ticket with the Databricks Support Team. They’ll be able to investigate further and assist you with gaining acce...

Pratikmsbsvm
by Contributor
  • 1747 Views
  • 3 replies
  • 1 kudos

Resolved! Error Logging and Orchestration in Databricks

Hello, could someone please help me design the error logging and orchestration for the pipeline below? I am pulling data from the Bronze layer and pushing it to the Silver layer after transformation. 1. How to do error logging, and where to store it. 2. How to...

[Attachment: Pratikmsbsvm_0-1753867667783.png]
Latest Reply
Pratikmsbsvm
Contributor
  • 1 kudos

@Brahmareddy: Thanks a lot. Do you have any page that shows a real implementation, if handy? Kindly share.

2 More Replies
pooja_bhumandla
by Databricks Partner
  • 704 Views
  • 2 replies
  • 0 kudos

Will Unsetting delta.targetFileSize During Data Load Cause Any Issues?

Hi, if I unset the Spark config delta.targetFileSize (e.g., using ALTER) while a data load is in progress (batch or streaming), will it cause any issues? Will the load fail or behave inconsistently due to the config being changed mid-process? Thanks!

Latest Reply
Brahmareddy
Esteemed Contributor
  • 0 kudos

Hi pooja_bhumandla, how are you doing today? In general, changing the delta.targetFileSize config while a batch or streaming load is in progress won’t crash your job, but it may lead to inconsistent behavior during that specific run. Spark jobs usuall...

1 More Replies
jeanptello
by New Contributor
  • 1435 Views
  • 1 reply
  • 0 kudos

Read Snowflake Iceberg tables from Databricks UC

Hi folks! I'm trying to read Iceberg tables that I created in Snowflake from Databricks using catalog federation. I set up a connection to Snowflake, configured an external location pointing to the S3 folder that contains the Iceberg files, and used t...

Data Engineering
Catalog Federation
Iceberg
Latest Reply
Isi
Honored Contributor III
  • 0 kudos

Hey @jeanptello It’s possible that there’s a mismatch between what Snowflake has written and what Databricks is trying to read. This often happens if Snowflake has performed an operation that rewrites the table files (like compaction or a bulk update...

VaderKB
by New Contributor II
  • 1842 Views
  • 7 replies
  • 0 kudos

Do too many Parquet files in a Delta table impact writes for a streaming job?

Hello, I am running a Spark streaming job that reads data from AWS Kinesis and writes data to external Delta tables which are stored in S3. But I have noticed that over time, the latency has been increasing. I also noticed that for each batch, the...

Latest Reply
Khaja_Zaffer
Esteemed Contributor
  • 0 kudos

Hello @VaderKB You're right that OPTIMIZE makes reads faster by reducing the number of files. For writes using append mode, it doesn't directly speed up the operation itself. However, having fewer, larger files from a previous OPTIMIZE run can improv...

6 More Replies
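The effect described in this thread compounds: an append-only stream adds files every microbatch, and any per-batch work proportional to the table's file/metadata state grows with the file count. A toy back-of-envelope model, with all numbers purely illustrative:

```python
# Toy model: each microbatch appends `files_per_batch` new Parquet files.
# Without compaction, total file count (and listing/metadata work that
# scales with it) grows linearly with the number of batches.
def total_files(batches, files_per_batch, optimize_every=None,
                files_after_optimize=10):
    total = 0
    for b in range(1, batches + 1):
        total += files_per_batch
        if optimize_every and b % optimize_every == 0:
            # OPTIMIZE-style compaction collapses small files.
            total = files_after_optimize
    return total

print(total_files(1000, 8))                      # unbounded growth: 8000
print(total_files(1000, 8, optimize_every=100))  # stays bounded
```

This is consistent with the reply above: OPTIMIZE does not speed up the append itself, but keeping the file count bounded keeps the per-batch metadata overhead from creeping up over time.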
ChrisLawford_n1
by Contributor II
  • 1319 Views
  • 2 replies
  • 0 kudos

Autoloader with file notifications on a queue that is on a different storage account to the blobs

Hello, I am trying to set up Auto Loader using file notifications, but as the storage account we are reading from is a premium storage account, we have set up event subscriptions to pump the blob events to queues that exist on a standard Gen2 storage acc...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 0 kudos

Hi @ChrisLawford_n1, so in your case, it's not able to resolve the file paths from the event notifications because they're pointing to a different storage account (Storage Account 1), which is not associated with the queue. Use a StorageV2 Account for B...

1 More Replies
Ritasree
by Databricks Partner
  • 573 Views
  • 2 replies
  • 0 kudos

Unable to fetch 3rd-level array using dot notation from JSON data in local Spark via VS Code

I am able to fetch a 3rd-level array using dot notation from JSON in Databricks, but the same code is not working in local Spark via VS Code. Example: df.select(F.col('x.y.z')) where x is an array, y is an array, and z is also an array. In local Spark I am gettin...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @Ritasree, when you say local Spark, do you mean that you've configured it locally, or are you using Databricks Connect? If you've configured Spark locally, then check what version of Spark you're using.

1 More Replies
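To see why `x.y.z` is sensitive to the engine version, here is a plain-Python sketch of what that dot-notation access has to do over three nested array-of-struct levels: map the field access across each array it encounters. The field names come from the post; the data is illustrative:

```python
def extract_path(rows, path):
    """Walk a dot-separated path, mapping over lists at each level —
    roughly what Spark does when it pushes a field access through
    nested arrays (illustrative sketch, not Spark internals)."""
    def step(node, key):
        if isinstance(node, list):
            # An array level: apply the field access to every element.
            return [step(item, key) for item in node]
        return node[key]

    for key in path.split("."):
        rows = step(rows, key)
    return rows

data = {"x": [{"y": [{"z": [1, 2]}, {"z": [3]}]}]}
print(extract_path(data, "x.y.z"))
# [[[1, 2], [3]]]
```

Each array level adds one layer of nesting to the result, which is why engines differ: older Spark versions were stricter about resolving a field access through more than one array level, so checking the local Spark version (as the reply suggests) is the right first step.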
thbeh_com
by New Contributor III
  • 1136 Views
  • 2 replies
  • 0 kudos

Resolved! Legacy hive_metastore corruption

I am seeing some legacy hive_metastore corruption (especially tables created as Parquet instead of Delta) lately at my client's place, who is in the midst of migrating to UC. We were provided with Scala code to remove the erroneous Parquet files ph...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 0 kudos

Hi @thbeh_com, yes, this is a fairly common issue during UC migrations, especially with legacy Hive metastore tables. The corruption typically happens because of: - Metadata-data misalignment - the Hive metastore references files that no longer exist or have ...

1 More Replies