Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

yit
by Contributor III
  • 631 Views
  • 2 replies
  • 0 kudos

Autoloader: Unexpected UnknownFieldException after streaming query termination

I am using Autoloader to ingest source data into Bronze layer Delta tables. The source files are JSON, and I rely on schema inference along with schema evolution (using mode: addNewColumns). To handle errors triggered by schema updates in the stream,...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @yit, this is the expected behaviour of Auto Loader with schema evolution enabled. The default mode is addNewColumns, which causes the stream to fail. As the documentation says: "Auto Loader detects the addition of new columns as it processes your data. When Auto Load...

1 More Replies
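The failure in the thread above is by design: with addNewColumns, Auto Loader records the evolved schema in its schema location and then stops the stream, expecting the orchestrator to restart it so the new columns are picked up. A common pattern is to wrap the stream start in a restart loop. Below is a minimal, generic sketch of that loop in plain Python; the exception class and `start_stream` callable are hypothetical stand-ins for Databricks' `UnknownFieldException` and your actual streaming query, not a real API.

```python
class UnknownFieldException(Exception):
    """Hypothetical stand-in for Auto Loader's schema-change error."""


def run_with_schema_retries(start_stream, max_restarts=3):
    """Start a stream, restarting it when a schema change terminates it.

    `start_stream` is any callable that raises UnknownFieldException when
    new columns are detected. Auto Loader persists the evolved schema in
    its schema location, so each restart picks up the updated schema.
    """
    for attempt in range(max_restarts + 1):
        try:
            return start_stream()
        except UnknownFieldException:
            if attempt == max_restarts:
                raise  # give up after too many schema changes in a row
            # Schema was evolved and recorded; loop around and restart.
```

A bounded retry count avoids looping forever if something other than a one-off schema update keeps killing the stream.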
ChristianRRL
by Valued Contributor III
  • 395 Views
  • 1 replies
  • 1 kudos

Resolved! Thoughts on AutoLoader schema inference into raw table (+data flattening)

I am curious to get the community's thoughts on this. Is it generally preferable to load raw data based on its inferred columns or not? And is it preferred to keep the raw data in its original structure or to flatten it into a more tabular structure...

Latest Reply
SP_6721
Honored Contributor
  • 1 kudos

Hi @ChristianRRL, when loading raw data into bronze tables with Auto Loader, it's usually best to keep the original structure rather than flattening it right away. You can use schema inference for convenience, but to avoid mistakes, add schema hints ...

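The trade-off discussed above (keep the nested raw structure vs. flatten at ingestion) often comes down to how cheap the flattening step is downstream. As an illustration only, here is a small pure-Python sketch of the kind of flattening a silver-layer transform would do; the function name and separator are my own choices, not an Auto Loader or Spark API.

```python
def flatten(record, parent_key="", sep="_"):
    """Recursively flatten nested dicts into a single-level dict.

    {"a": {"b": 1}} -> {"a_b": 1}. Lists are left as-is, mirroring how
    arrays are usually kept intact (or exploded separately) downstream.
    """
    out = {}
    for key, value in record.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Descend into nested objects, prefixing with the parent path.
            out.update(flatten(value, new_key, sep))
        else:
            out[new_key] = value
    return out
```

Keeping the raw table nested and applying a transform like this in the next layer means a source schema change never forces a rewrite of bronze.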
MarkV
by New Contributor III
  • 2261 Views
  • 8 replies
  • 0 kudos

DLT, Automatic Schema Evolution and Type Widening

I'm attempting to run a DLT pipeline that uses automatic schema evolution against tables that have type widening enabled. I have code in this notebook that is a list of tables to create/update along with the schema for those tables. This list and spar...

Latest Reply
abhic21
New Contributor II
  • 0 kudos

Is there any solution for type widening in a DLT pipeline? writeStream is not possible in DLT, right? @Sidhant07 @MarkV

7 More Replies
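For context on the thread above: on plain Delta tables, type widening is enabled with the `delta.enableTypeWidening` table property, and only specific source-to-target changes qualify as widening. The sketch below encodes a small illustrative subset of the documented widening matrix in plain Python; it is not the full list and not DLT-specific.

```python
# Illustrative subset of the type changes Delta's type widening feature
# documents as supported; not exhaustive.
_WIDENINGS = {
    "byte": {"short", "int", "long"},
    "short": {"int", "long"},
    "int": {"long"},
    "float": {"double"},
}


def is_widening(src, dst):
    """Return True if changing a column from src to dst is a pure widening."""
    return src == dst or dst in _WIDENINGS.get(src, set())
```

A check like this explains why, for example, an int column can evolve to long in place, while a long to int change still requires a full rewrite.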
lukasz_wybieral
by New Contributor II
  • 702 Views
  • 2 replies
  • 0 kudos

Specifying a serverless cluster for the dev environment in databricks.yml

Hey, I'm trying to find a way to specify a serverless cluster for the dev environment and job clusters for the test and prod environments in databricks.yml. The problem is that it seems impossible: I've tried many approaches, but the only outcomes I...

Latest Reply
Nivethan_Venkat
Contributor III
  • 0 kudos

Hi @lukasz_wybieral, it is not necessary to specify the cluster config if you would like to use serverless. By default, Databricks picks serverless compute if you don't specify a cluster configuration. Attaching below a databricks.yml for your r...

1 More Replies
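The reply above can be made concrete. In a bundle, a job task with no cluster fields (no job_cluster_key, new_cluster, or existing_cluster_id) runs on serverless compute, assuming serverless jobs are enabled in the workspace; per-target overrides can then attach job clusters only for prod. A hedged sketch, where bundle, job, and cluster names are illustrative:

```yaml
# Sketch only: names and node types are examples, not a tested bundle.
bundle:
  name: my_bundle

resources:
  jobs:
    my_job:
      name: my_job
      tasks:
        - task_key: main
          notebook_task:
            notebook_path: ./src/main_notebook.ipynb
          # No cluster fields here: the task runs on serverless compute.

targets:
  dev:
    mode: development
    default: true      # inherits the base job, so dev stays serverless
  prod:
    mode: production
    resources:
      jobs:
        my_job:
          job_clusters:
            - job_cluster_key: prod_cluster
              new_cluster:
                spark_version: 15.4.x-scala2.12
                node_type_id: Standard_DS3_v2
                num_workers: 2
          tasks:
            - task_key: main          # merged with the base task by task_key
              job_cluster_key: prod_cluster
```

The key idea is that target overrides merge with the base resource definition, so the dev target simply never introduces a cluster.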
minhhung0507
by Valued Contributor
  • 601 Views
  • 2 replies
  • 0 kudos

Slow batch processing in Databricks job due to high deletion vector and unified cache overhead

We have a Databricks pipeline where the layer reads from several Silver tables to detect PK/FK changes and trigger updates to Gold tables. Normally, this near real-time job has ~3 minutes latency per micro-batch. Recently, we noticed that each batch i...

Latest Reply
noorbasha534
Valued Contributor II
  • 0 kudos

@minhhung0507 as per the documentation: 'The actual physical removal of deleted rows (the "hard delete") is deferred until the table is optimized with OPTIMIZE or when a VACUUM operation is run, cleaning up old files.' So, based on this, try to optimize t...

1 More Replies
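Following up on the reply above: deletion vectors are physically purged by routine table maintenance, typically OPTIMIZE followed by VACUUM. A hypothetical helper that renders those maintenance statements for a list of tables is sketched below; the table names are placeholders, and in a notebook you would pass each statement to `spark.sql`.

```python
def maintenance_statements(tables, vacuum_retain_hours=168):
    """Build OPTIMIZE + VACUUM statements for each table.

    168 hours (7 days) is the default VACUUM retention; lowering it
    requires extra care because it can break readers of old snapshots.
    """
    stmts = []
    for table in tables:
        stmts.append(f"OPTIMIZE {table}")
        stmts.append(f"VACUUM {table} RETAIN {vacuum_retain_hours} HOURS")
    return stmts
```

Scheduling these as a periodic job keeps deletion-vector overhead from accumulating between micro-batches.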
MaximeGendre
by New Contributor III
  • 530 Views
  • 3 replies
  • 3 kudos

Resolved! Structured Streaming: difference between Unity Catalog and legacy

Hello :), I have noticed a regression in one of my jobs and I don't understand why.

%python
print("Hello 1")

def toto(df, _):
    print("Hello 2")

spark.readStream\
    .format("delta")\
    .load("/databricks-datasets/nyctaxi/tables/nyctaxi_yellow...

Latest Reply
MaximeGendre
New Contributor III
  • 3 kudos

Hi @szymon_dybczak, thanks a lot for the quick and accurate answer. I forgot that there was this limitation.

2 More Replies
matmad
by New Contributor III
  • 815 Views
  • 3 replies
  • 1 kudos

Resolved! Job fails on clusters only with library dependency

Hello! I have the following problem: all my job runs fail when the job uses a library. Even the most basic job (print a string) with the most basic library package (no secondary dependencies; the script does not even import/use the library) fails with `Fai...

Latest Reply
matmad
New Contributor III
  • 1 kudos

I think I found a (the?) solution. The cluster tried to connect to the legacy Hive catalog, so I:
  • set the default catalog for the workspace to the proper catalog
  • disabled "Legacy access"
These steps solved my `DriverError`. This log4j error message ga...

2 More Replies
RIDBX
by Contributor
  • 604 Views
  • 5 replies
  • 0 kudos

Lake Bridge ETL rehousing into AWS Databricks: options?

Hi Community experts, thanks for the replies to my threads. We reviewed the Lake Bridge thread opened here. The functionality claimed is that it can convert on-prem ET...

Latest Reply
RIDBX
Contributor
  • 0 kudos

Thanks for weighing in. Answers to the same question on another data engineering discussion board don't give a comfortable feeling about this; they project nightmare scenarios.

4 More Replies
dimsh
by Contributor
  • 21338 Views
  • 14 replies
  • 10 kudos

How to overcome missing query parameters in Databricks SQL?

Hi there! I'm trying to build my first dashboard based on Databricks SQL. As far as I can see, if you define a query parameter you can't skip it later. I'm looking for any option where I can make my parameter optional. For instance, I have a ta...

Latest Reply
theslowturtle
New Contributor II
  • 10 kudos

Hello guys, I'm not sure if you could solve this issue, but here is how I've handled it:

SELECT *
FROM my_table
WHERE (CASE WHEN LEN(:my_parameter) > 0 THEN my_column = :my_parameter ELSE my_column = my_column END)

I hope this can help!

13 More Replies
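An equivalent, arguably simpler predicate than the CASE trick above is `(:p = '' OR col = :p)`: an empty parameter disables the filter, anything else applies it. The pattern is portable across SQL dialects; here it is demonstrated with SQLite from plain Python, where the table and column names are made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (my_column TEXT)")
conn.executemany("INSERT INTO my_table VALUES (?)", [("a",), ("b",)])

QUERY = """
SELECT my_column FROM my_table
WHERE (:p = '' OR my_column = :p)
ORDER BY my_column
"""

# Empty parameter: the OR short-circuits, so every row passes the filter.
all_rows = [r[0] for r in conn.execute(QUERY, {"p": ""})]
# Non-empty parameter: behaves as a normal equality filter.
one_row = [r[0] for r in conn.execute(QUERY, {"p": "a"})]
```

The same shape works with Databricks SQL named parameter markers, with a sentinel of your choice for "no filter".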
SugathithyanM
by New Contributor
  • 503 Views
  • 1 replies
  • 2 kudos

Resolved! Reg. virtual learning festival coupon

Hi team, I attended the DAIS 2025 Virtual Learning Festival (11 June - 2 July) and received a coupon.
1. Is the coupon applicable to the 'Databricks Certified Associate Developer for Apache Spark' exam as well?
2. I'm preparing to take the exam for the Spark certificat...

Latest Reply
Jim_Anderson
Databricks Employee
  • 2 kudos

Hey @SugathithyanM, thanks for the additional mention here; please see our conversation for reference. For any others also interested: 1. Yes, the certification voucher code is applicable to any Databricks Certification exam, including the Apache Spar...

Manjula_Ganesap
by Contributor
  • 853 Views
  • 1 replies
  • 0 kudos

Autoloader on ADLS blobs with archival enabled

Hi all, I'm trying to change our ingestion process to use Autoloader to identify new files landing in a directory on ADLS. The ADLS directory has an access-tier policy enabled to archive files older than a certain time period. When I try to set up Autoloa...

Latest Reply
Steffen
New Contributor III
  • 0 kudos

I'm facing the same issue when trying to use Autoloader with useNotifications. Did you ever find a workaround?

seefoods
by Valued Contributor
  • 834 Views
  • 1 replies
  • 1 kudos

asset bundle

Hello guys, is it possible to use Databricks Asset Bundles to set tblproperties of a table? Cordially,

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @seefoods, there is no out-of-the-box way to do this in DABs. Look at the supported resources that you can configure with DABs: Databricks Asset Bundles resources | Databricks Documentation. As a workaround, you can just create a notebook that will set tblprope...

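To make the workaround above concrete: a notebook task in the bundle can apply table properties with ALTER TABLE ... SET TBLPROPERTIES. A small sketch that renders that statement from a dict follows; the table and property names are examples, and in the notebook you would execute the result with `spark.sql`.

```python
def set_tblproperties_sql(table, props):
    """Render an ALTER TABLE ... SET TBLPROPERTIES statement from a dict."""
    kv = ", ".join(f"'{k}' = '{v}'" for k, v in props.items())
    return f"ALTER TABLE {table} SET TBLPROPERTIES ({kv})"
```

Driving this from a dict kept in the bundle's repo gives you version-controlled table properties even though DABs have no native resource for them.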
anujsen18
by New Contributor III
  • 1419 Views
  • 9 replies
  • 1 kudos

Resolved! View failing due to field not recognised

Hi, I am facing a problem where my existing view has stopped working due to an unrecognised field, which is an alias field. I am using the same definition and Spark configuration. DBR: 14.3 LTS, Spark 3.5.0. Has anyone faced a similar problem recently?

Latest Reply
anujsen18
New Contributor III
  • 1 kudos

Hi @Khaja_Zaffer, yes, it was actually an issue with the Privacera version. After investigation with the Privacera team, they suggested upgrading Privacera to version 9.0.35.1.

8 More Replies
Malthe
by Contributor II
  • 677 Views
  • 1 replies
  • 2 kudos

Driver terminated abnormally due to FORCE_KILL

We have a job running on a job cluster where sometimes the driver dies:
> The spark driver has stopped unexpectedly and is restarting. Your notebook will be automatically reattached.
But the metrics don't suggest an explanation for this situation. In th...

Latest Reply
cgrant
Databricks Employee
  • 2 kudos

That error is usually related to driver load. Try upsizing the driver one size and see if it still happens. Otherwise, for troubleshooting, driver problems are surfaced to the cluster's event log, like DRIVER_NOT_RESPONDING and DRIVER_UNAVAILABLE. Yo...

nopal1
by New Contributor II
  • 531 Views
  • 2 replies
  • 2 kudos

Resolved! Python os.listdir() behavior difference between 15.4LTS and 16.4LTS DBRs

We found that when using os.listdir() in Databricks notebooks to list files stored in the Workspace (i.e., alongside the notebook, not in DBFS), file extensions were missing in Databricks Runtime 14.3 LTS and 15.4 LTS, but appeared correctly in 16.4 ...

Latest Reply
cgrant
Databricks Employee
  • 2 kudos

This is expected; the behaviour changed in DBR 16.2: in Databricks Runtime 16.2 and above, notebooks are supported as workspace files.

1 More Replies
