Data Engineering

Forum Posts


by jeremy98 • Honored Contributor

08-12-2025 2:20:37 AM

613 Views
1 reply
0 kudos

how to manage a dynamic scheduled job if an INTERNAL_ERROR occurs?

Hi community, My team and I have been occasionally experiencing INTERNAL_ERROR events in Databricks. We have a job that runs on a schedule, but the start times vary. Sometimes, when the job is triggered, the underlying cluster fails to start for some ...


Latest Reply

SP_6721
Honored Contributor II

08-13-2025 5:20:23 AM

0 kudos

Hi @jeremy98, to investigate, check the Jobs UI for failed runs and review both the error messages and the cluster logs. Monitor failure trends over time and adjust cluster settings or quotas if needed. https://docs.databricks.com/gcp/en/jobs/repair-job-failu...
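The repair workflow linked in the reply can also be scripted. The sketch below is a minimal illustration, assuming run objects shaped like the Jobs API 2.1 `runs/list` response (`run_id`, `state.life_cycle_state`). It picks out runs that ended in `INTERNAL_ERROR` and builds a request body for `POST /api/2.1/jobs/runs/repair` using `rerun_all_failed_tasks`. The helper names and sample data are hypothetical, and whether a particular internal error can be repaired (rather than rerun from scratch) should be checked against the Databricks documentation.

```python
from typing import Any, Dict, List


def internal_error_run_ids(runs: List[Dict[str, Any]]) -> List[int]:
    """Return the IDs of runs whose life cycle state is INTERNAL_ERROR."""
    return [
        run["run_id"]
        for run in runs
        if run.get("state", {}).get("life_cycle_state") == "INTERNAL_ERROR"
    ]


def repair_request(run_id: int) -> Dict[str, Any]:
    """Build a body for POST /api/2.1/jobs/runs/repair that reruns failed tasks."""
    return {"run_id": run_id, "rerun_all_failed_tasks": True}


# Hypothetical input shaped like part of a runs/list response.
sample_runs = [
    {"run_id": 101, "state": {"life_cycle_state": "TERMINATED", "result_state": "SUCCESS"}},
    {"run_id": 102, "state": {"life_cycle_state": "INTERNAL_ERROR"}},
]

for rid in internal_error_run_ids(sample_runs):
    print(repair_request(rid))
```

Each body would then be sent with an authenticated HTTP client or the Databricks SDK. For purely transient cluster start failures, setting `max_retries` on the task so the scheduler retries on its own may be simpler than repairing afterwards.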


by ManojkMohan • Honored Contributor II

08-12-2025 2:14:47 PM

1696 Views
1 reply
1 kudos

Resolved! Notebook not found: Error

Last execution failed: Notebook not found: Users/manojdatabricks73@gmail.com/includes/CreateRawData. Notebooks can be specified via a relative path (./Notebook or ../folder/Notebook) or via an absolute path (/Abs/Path/to/Notebook). Make sure you are s...


Latest Reply

lingareddy_Alva
Esteemed Contributor

08-12-2025 3:58:00 PM

1 kudos

Hi @ManojkMohan, 1. Verify the exact notebook location in the workspace: In Databricks, open the Workspace browser. Navigate manually to where you think CreateRawData lives. Right-click on the notebook and select Copy Path, which gives you the exact absol...
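The error message names two accepted forms: a relative path starting with `./` or `../`, or an absolute path starting with `/`. The path in the error starts with `Users/` and has no leading slash, so it matches neither form. The hypothetical helper below is a minimal sketch that flags such references before a run; it is an illustration only, not a Databricks API, and the path returned by Copy Path in the workspace browser remains the authoritative one.

```python
def classify_notebook_path(path: str) -> str:
    """Classify a notebook reference as 'relative', 'absolute', or 'invalid'.

    Follows the two forms named in the 'Notebook not found' message:
    ./Notebook or ../folder/Notebook (relative) and /Abs/Path/to/Notebook (absolute).
    """
    if path.startswith(("./", "../")):
        return "relative"
    if path.startswith("/"):
        return "absolute"
    return "invalid"


def suggest_fix(path: str) -> str:
    """For an 'invalid' reference, assume it was meant to be absolute and prepend '/'."""
    if classify_notebook_path(path) == "invalid":
        return "/" + path
    return path
```

For example, with a hypothetical user folder, `suggest_fix("Users/someone@example.com/includes/CreateRawData")` returns `"/Users/someone@example.com/includes/CreateRawData"`.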


by databricks_use2 • New Contributor II

07-21-2025 5:23:56 PM

2279 Views
2 replies
2 kudos

Resolved! Autoloader Checkpoint Issue

I was pulling data from an S3 source using a Databricks Autoloader pipeline. Some files in the source contained bad characters, which caused the Autoloader to fail to load the data. These problematic files have now been removed from the source, but D...


Latest Reply

lingareddy_Alva
Esteemed Contributor

08-12-2025 3:20:48 PM

2 kudos

Hello @databricks_use2, if you are okay with this, please mark it as the solution so that it can help others.


by grazie • Contributor

02-13-2023 6:07:29 AM

4036 Views
4 replies
3 kudos

Do you need to be workspace admin to create jobs?

We're using a setup where we use GitLab CI to deploy workflows with a service principal, using the Jobs API (2.1) https://docs.databricks.com/dev-tools/api/latest/jobs.html#operation/JobsCreate. When we wanted to reduce the permissions of the CI to the minimu...


Latest Reply

Anonymous
Not applicable

04-10-2023 3:12:13 AM

3 kudos

Hi @Geir Iversen, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers ...


by susmitsircar • New Contributor III

08-12-2025 6:55:29 AM

1448 Views
3 replies
0 kudos

Spark streaming failing intermittently with FileAlreadyExistsException RocksDB checkpointing

We are encountering an issue in our Spark streaming pipeline when attempting to write checkpoint data to S3. The error we are seeing is as follows:
25/08/12 13:35:40 ERROR RocksDBFileManager : Error zipping to s3://xxx-datalake-binary/event-types/chec...


Latest Reply

lingareddy_Alva
Esteemed Contributor

08-12-2025 9:00:09 AM

0 kudos

Hi @susmitsircar, Best practices / Fixes:
1. Clean up the checkpoint directory before restart. If you know the stream can safely start from scratch or reprocess data, delete the S3 checkpoint path before restarting. This ensures no stale 0.zip files remain....
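Step 1 of the reply, deleting the checkpoint before a restart, is destructive: the stream loses its progress and reprocesses from scratch. A minimal sketch of a guarded reset follows, assuming it runs in a Databricks notebook where `dbutils.fs.rm(path, recurse)` is available and is passed in as `rm`. The function name `reset_checkpoint` and the bucket path in the usage line are hypothetical.

```python
from typing import Callable


def reset_checkpoint(rm: Callable[[str, bool], bool], checkpoint_path: str, confirm: bool = False) -> bool:
    """Recursively delete a streaming checkpoint directory, only when explicitly confirmed.

    `rm` is expected to behave like dbutils.fs.rm(path, recurse).
    Returns True if a deletion was requested, False for a dry run.
    """
    if not checkpoint_path.startswith("s3://"):
        raise ValueError(f"expected an s3:// checkpoint path, got {checkpoint_path!r}")
    if not confirm:
        return False  # dry run: nothing is deleted
    rm(checkpoint_path, True)  # recurse=True removes every file under the prefix
    return True
```

Usage, after stopping the stream: `reset_checkpoint(dbutils.fs.rm, "s3://my-bucket/my-stream/checkpoint", confirm=True)`. Requiring `confirm=True` keeps an accidental call from wiping state that cannot be recovered.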


by susmitsircar • New Contributor III

07-28-2025 6:45:31 AM

2361 Views
7 replies
3 kudos

Resolved! Spark streaming failing intermittently with IllegalStateException: Found no SST files

I'm encountering the following error while trying to upload a RocksDB checkpoint in Databricks:
java.lang.IllegalStateException: Found no SST files during uploading RocksDB checkpoint version 498 with 2332 key(s). at com.databricks.sql.streaming.s...


Latest Reply

susmitsircar
New Contributor III

08-12-2025 9:47:16 AM

3 kudos

@mani_22 Do you see any risk in disabling this flag in our pipeline? As far as I understand, we would be bypassing some heuristic checks while uploading the state files: spark.databricks.rocksDB.verifyBeforeUpload false


by absan • Contributor

08-08-2025 8:23:57 AM

872 Views
1 reply
1 kudos

Resolved! Lakeflow Designer, DAB & Git

Hi, I'm trying to understand the process and configuration needed to get the new Lakeflow Designer, DAB and Git folders to play together. What I've done: Created an empty GitHub repository and created a Git folder for it in Databricks. In the Git folder I cr...


Latest Reply

SP_6721
Honored Contributor II

08-12-2025 7:57:44 AM

1 kudos

Hi @absan, it’s recommended to create and deploy your DAB templates from within the Git folder, as this ensures the pipeline’s root is set correctly to that folder.


by noorbasha534 • Valued Contributor II

08-11-2025 2:49:09 PM

435 Views
1 reply
0 kudos

Column access patterns

Hello all, Floating this question again separately (a few weeks ago I bundled it into my question about predictive optimization): has anyone cracked how to get the list of columns being used in joins and filters, especially in the context that access to end users is g...


Latest Reply

SP_6721
Honored Contributor II

08-12-2025 5:21:00 AM

0 kudos

Hi @noorbasha534, I’m not aware of a direct way to do this, but one approach is to parse each view’s SQL definition to identify the columns used in join and filter conditions, then use lineage tools to trace them through nested views back to the unde...
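The first step suggested in the reply, parsing each view's SQL definition for join and filter columns, can be roughed out with regular expressions. This is a deliberately naive sketch: it only collects qualified `alias.column` references after `ON` and `WHERE`, does not resolve aliases to tables, and is not aware of subqueries, CTEs, or keywords inside string literals. A real SQL parser (for example the open-source sqlglot library) would be more robust. The function and pattern names are hypothetical.

```python
import re
from typing import Set, Tuple

# Text after ON or WHERE, up to the next clause keyword or the end of the statement.
_CLAUSE = re.compile(
    r"\b(?:ON|WHERE)\b(.*?)"
    r"(?=\b(?:JOIN|LEFT|RIGHT|INNER|FULL|CROSS|WHERE|GROUP|ORDER|HAVING|LIMIT|UNION)\b|$)",
    re.IGNORECASE | re.DOTALL,
)
# Qualified column references such as alias.column.
_COLUMN_REF = re.compile(r"\b([A-Za-z_]\w*)\.([A-Za-z_]\w*)\b")


def join_and_filter_columns(sql: str) -> Set[Tuple[str, str]]:
    """Return (qualifier, column) pairs referenced in ON and WHERE clauses."""
    found: Set[Tuple[str, str]] = set()
    for clause in _CLAUSE.finditer(sql):
        found.update(_COLUMN_REF.findall(clause.group(1)))
    return found


view_sql = (
    "SELECT o.id FROM orders o "
    "JOIN customers c ON o.customer_id = c.id "
    "WHERE c.region = 'EU'"
)
print(sorted(join_and_filter_columns(view_sql)))
```

The output lists `("c", "id")`, `("c", "region")` and `("o", "customer_id")`; `o.id` is excluded because it only appears in the SELECT list. Mapping aliases back to tables, and following nested views, would be the lineage step the reply mentions.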


by yit • Databricks Partner

08-12-2025 4:39:15 AM

1214 Views
2 replies
0 kudos

Autoloader: Unexpected UnknownFieldException after streaming query termination

I am using Autoloader to ingest source data into Bronze layer Delta tables. The source files are JSON, and I rely on schema inference along with schema evolution (using mode: addNewColumns). To handle errors triggered by schema updates in the stream,...


Latest Reply

szymon_dybczak
Esteemed Contributor III

08-12-2025 5:00:46 AM

0 kudos

Hi @yit, This is expected behaviour of Auto Loader with schema evolution enabled. The default mode is addNewColumns, which causes the stream to fail. As the documentation says: "Auto Loader detects the addition of new columns as it processes your data. When Auto Load...
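Because the stream is expected to stop with UnknownFieldException once Auto Loader has recorded the new columns, the usual remedy is simply to restart it; the Auto Loader documentation suggests running such streams as jobs that restart automatically. The loop below is a minimal sketch of doing the same in code. It assumes a hypothetical `start_and_await` callable that starts the query and blocks on `awaitTermination()`, and it assumes the raised exception's text contains the class name `UnknownFieldException`; both are assumptions to verify against your runtime.

```python
from typing import Callable


def run_with_schema_restarts(start_and_await: Callable[[], None], max_restarts: int = 3) -> int:
    """Run a streaming query, restarting it after schema-evolution failures.

    Returns the number of restarts that were needed. Any other error, or a
    schema-evolution error beyond `max_restarts`, is re-raised.
    """
    restarts = 0
    while True:
        try:
            start_and_await()
            return restarts
        except Exception as exc:  # e.g. StreamingQueryException in PySpark
            if "UnknownFieldException" not in str(exc) or restarts >= max_restarts:
                raise
            # Auto Loader should already have stored the evolved schema in the
            # schema location, so the next start picks up the new columns.
            restarts += 1
```

Capping restarts keeps a stream that fails for some other schema reason from looping forever; unrelated errors surface immediately.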


by ChristianRRL • Honored Contributor

08-11-2025 8:02:10 AM

547 Views
1 reply
1 kudos

Resolved! Thoughts on AutoLoader schema inference into raw table (+data flattening)

I am curious to get the community's thoughts on this. Is it generally preferable to load raw data based on its inferred columns or not? And is it preferable to keep the raw data in its original structure or to flatten it into a more tabular structure...


Latest Reply

SP_6721
Honored Contributor II

08-12-2025 4:10:38 AM

1 kudos

Hi @ChristianRRL, when loading raw data into bronze tables with Auto Loader, it’s usually best to keep the original structure rather than flattening it right away. You can use schema inference for convenience, but to avoid mistakes, add schema hints ...
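The reply's advice (keep the original structure, rely on schema inference, and pin the risky columns with schema hints) maps onto a few Auto Loader options. A minimal sketch, assuming JSON input; the helper name, column names, types and schema location are hypothetical.

```python
from typing import Dict


def autoloader_options(schema_location: str, hints: Dict[str, str], file_format: str = "json") -> Dict[str, str]:
    """Build Auto Loader options that infer the schema but pin selected column types."""
    return {
        "cloudFiles.format": file_format,
        "cloudFiles.schemaLocation": schema_location,
        # Infer real types instead of reading every JSON field as a string.
        "cloudFiles.inferColumnTypes": "true",
        # DDL-style hints, e.g. "id BIGINT, event_ts TIMESTAMP"; other columns are still inferred.
        "cloudFiles.schemaHints": ", ".join(f"{col} {dtype}" for col, dtype in hints.items()),
    }


opts = autoloader_options("/tmp/schemas/events", {"id": "BIGINT", "event_ts": "TIMESTAMP"})
```

In a notebook these would be applied as `spark.readStream.format("cloudFiles").options(**opts).load(source_path)`, writing the nested records to bronze unchanged and leaving flattening to a later layer.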


by MarkV • New Contributor III

01-29-2025 12:16:29 PM

3095 Views
8 replies
0 kudos

DLT, Automatic Schema Evolution and Type Widening

I'm attempting to run a DLT pipeline that uses automatic schema evolution against tables that have type widening enabled. I have code in this notebook that is a list of tables to create/update along with the schema for those tables. This list and spar...


Latest Reply

abhic21
Databricks Partner

08-11-2025 10:21:13 PM

0 kudos

Is there any solution for type widening in a DLT pipeline? writeStream is not possible in DLT, right? @Sidhant07 @MarkV


by lukasz_wybieral • Databricks Partner

08-12-2025 1:45:20 AM

1171 Views
2 replies
0 kudos

Specifying a serverless cluster for the dev environment in databricks.yml

Hey, I'm trying to find a way to specify a serverless cluster for the dev environment and job clusters for the test and prod environments in databricks.yml. The problem is that it seems impossible - I’ve tried many approaches, but the only outcomes I...


Latest Reply

Nivethan_Venkat
Databricks MVP

08-12-2025 2:27:56 AM

0 kudos

Hi @lukasz_wybieral, it is not necessary to specify the cluster_config if you would like to use serverless. By default, Databricks picks serverless compute if you don't specify a cluster configuration. Attaching below a databricks.yml for your r...


by minhhung0507 • Valued Contributor

08-11-2025 9:21:16 PM

793 Views
2 replies
0 kudos

Slow batch processing in Databricks job due to high deletion vector and unified cache overhead

We have a Databricks pipeline where the layer reads from several Silver tables to detect PK/FK changes and trigger updates to Gold tables. Normally, this near real-time job has ~3 minutes latency per micro-batch. Recently, we noticed that each batch i...


Latest Reply

noorbasha534
Valued Contributor II

08-12-2025 2:43:48 AM

0 kudos

@minhhung0507 As per the documentation: 'The actual physical removal of deleted rows (the "hard delete") is deferred until the table is optimized with OPTIMIZE or when a VACUUM operation is run, cleaning up old files.' So, based on this, try to optimize t...
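Following the reply's suggestion, a periodic maintenance step could run OPTIMIZE and then VACUUM on the Silver tables the job reads. The sketch below only builds the SQL statements; the helper name and table names are hypothetical, and any retention shorter than Delta's default of 7 days should be checked against the VACUUM safety guidance before use, since concurrent readers may still need the old files.

```python
from typing import List, Optional


def maintenance_statements(tables: List[str], retain_hours: Optional[int] = None) -> List[str]:
    """Build OPTIMIZE then VACUUM statements for each Delta table."""
    statements: List[str] = []
    for table in tables:
        statements.append(f"OPTIMIZE {table}")
        if retain_hours is None:
            statements.append(f"VACUUM {table}")  # uses the table's default retention
        else:
            statements.append(f"VACUUM {table} RETAIN {retain_hours} HOURS")
    return statements
```

In a notebook or scheduled job, each statement would be run with `spark.sql(stmt)`, ideally outside the near real-time job's critical path so the rewrite does not add to micro-batch latency.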


by MaximeGendre • New Contributor III

08-11-2025 5:26:42 AM

805 Views
3 replies
3 kudos

Resolved! Structured Streaming: difference between Unity Catalog and Legacy

Hello :), I have noticed a regression in one of my jobs and I don't understand why.
%python
print("Hello 1")
def toto(df, _):
    print("Hello 2")
spark.readStream\
    .format("delta")\
    .load("/databricks-datasets/nyctaxi/tables/nyctaxi_yellow...


Latest Reply

MaximeGendre
New Contributor III

08-12-2025 12:53:34 AM

3 kudos

Hi @szymon_dybczak, thanks a lot for the quick and accurate answer. I forgot that there was this limitation.


by RIDBX • Contributor

08-08-2025 10:01:18 AM

837 Views
5 replies
0 kudos

Lake Bridge ETL Rehouse into AWS Databricks options?

Hi Community experts, Thanks for the replies to my threads. We reviewed the Lake Bridge thread opened here. The claimed functionality is that it can convert on-prem ET...


Latest Reply

RIDBX
Contributor

08-08-2025 2:08:40 PM

0 kudos

Thanks for weighing in. The answers to the same question on another data engineering discussion board do not give a comforting feeling about this. They project nightmare scenarios.


Databricks Community

Forum Posts


File Arrival Trigger - Multiple tables

Issue while handling Deletes and Inserts in Struct...

DLT with CDC and schema changes in streaming pipel...

how to update not tracked column only in new row v...

Databricks Cost Estimation Template