Databricks Community

thibault · ‎05-13-2025

Hi, I've been running a job on Azure Databricks serverless that does some batch data processing every 4 hours. This job, deployed with bundles, has been running fine for weeks, and all of a sudden, yesterday, it started failing with an error that doesn't point to any real line in the source files. The file it points to also changes: one run blames one file, another run blames a different one.

Running on job compute doesn't generate this error.

Any idea what could cause this? It makes many jobs fail in one workspace, whereas the same jobs with the same config run fine in another workspace.

Shua42 · ‎05-14-2025

Hey @thibault ,

One possibility is that this could be due to an update of the underlying Databricks runtime for serverless compute, which could have affected a dependency and is now causing differing behavior.

It's hard to say without knowing what the data and code look like, but I think it would also be good to double check that there isn't a data issue that could have caused this.

My recommendation for now would be to run it with job compute as it's a fixed runtime, and try to debug each task to get a better sense of what specific logic is causing the failure. If there is a strict dependency issue, job compute may be a better option for you.
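
If a runtime update is the suspect, one way to check is to record the interpreter and package versions on each run and diff a passing run against a failing one. The following is a minimal Python sketch under that assumption, not something from the thread: `environment_snapshot` and `diff_snapshots` are hypothetical helper names, and the package list is a placeholder for the job's real dependencies. It uses only the standard library.

```python
import json
import platform
import sys
from importlib import metadata


def environment_snapshot(packages):
    """Return the Python version and the installed version of each package."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "executable": sys.executable,
        "packages": versions,
    }


def diff_snapshots(old, new):
    """Return {package: (old_version, new_version)} for versions that differ."""
    changed = {}
    for name in sorted(set(old["packages"]) | set(new["packages"])):
        before = old["packages"].get(name)
        after = new["packages"].get(name)
        if before != after:
            changed[name] = (before, after)
    return changed


if __name__ == "__main__":
    # Placeholder package list (assumption); use the job's real dependencies.
    print(json.dumps(environment_snapshot(["pandas", "pyarrow"]), indent=2))
```

Logging the snapshot at the start of each task, on both serverless and job compute, would show whether the two environments actually diverged.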

thibault · ‎05-15-2025

Hi @Shua42 , I am switching back to job compute for now in prod.

The exact same jobs read the same data from UC in 2 different workspaces, and the one in the dev workspace runs just fine, whereas the one in prod is failing. The error is also inconsistent: one job complains about a nonexistent line of code in an empty __init__.py file, and the reported line looks like a log timestamp, while another job fails on what seems to be a circular import. This all started overnight, and the latest code changes were made weeks ago.

I'll file a bug as this seems unrelated to our setup.
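
A traceback that quotes a line that is not in the file suggests the interpreter may be reading something other than the file visible in the workspace. That is only a hypothesis, but it can be checked by asking Python which file a module resolves to and what that file contains. A rough sketch, where `describe_module` and `source_line` are hypothetical helpers and `my_package` stands in for the real package name:

```python
import importlib.util
import linecache
import os


def describe_module(name):
    """Report where a module resolves from and how large its source file is."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return {"name": name, "found": False}
    origin = spec.origin
    has_file = bool(origin) and os.path.isfile(origin)
    return {
        "name": name,
        "found": True,
        "origin": origin,        # path the import system would load
        "cached": spec.cached,   # bytecode cache location, if any
        "size_bytes": os.path.getsize(origin) if has_file else None,
    }


def source_line(path, lineno):
    """Return the line a traceback would quote, after refreshing the cache."""
    linecache.checkcache(path)  # discard cached lines if the file changed
    return linecache.getline(path, lineno)


if __name__ == "__main__":
    # "my_package" is a placeholder (assumption); use the failing package.
    print(describe_module("my_package"))
```

For an empty __init__.py, `size_bytes` should be 0 and `source_line` should return an empty string for any line number; anything else would mean a different file is being loaded.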

thibault · ‎05-15-2025

@Shua42 , I was able to reproduce the error running a notebook from the bundle file structure.

The interesting thing is that if I clone the whole content of the folder under .bundle, and run the notebook from that new structure, it no longer fails.

Deleting the bundle and redeploying does not help, and renaming the clone re-triggers the error. Not sure if that helps, but I'll keep testing things out.
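
To rule out a content difference between the deployed bundle folder and its clone, the two trees can be hashed file by file. This is a sketch only: `tree_digest` and `compare_trees` are hypothetical helpers, and the folder paths passed to them are whatever the two locations are in your workspace.

```python
import hashlib
from pathlib import Path


def tree_digest(root):
    """Map each file's path (relative to root) to the SHA-256 of its bytes."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            key = path.relative_to(root).as_posix()
            digests[key] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_trees(left, right):
    """Return files present on one side only and files whose content differs."""
    a, b = tree_digest(left), tree_digest(right)
    return {
        "only_left": sorted(set(a) - set(b)),
        "only_right": sorted(set(b) - set(a)),
        "different": sorted(k for k in set(a) & set(b) if a[k] != b[k]),
    }
```

If all three lists come back empty, the files are byte-identical and the difference in behavior would have to come from something outside the file contents, such as the path itself or cached state.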

thibault · ‎05-16-2025

@Shua42 , strange thing: all serverless tests started passing again today, so I redeployed all bundles as serverless jobs, and they are working again. Does this sound related to a bug Databricks found and fixed this week?

Shua42 · ‎05-16-2025

Hey @thibault ,

Glad to hear it is working again. I don't see any specific mention of a bug internally that would be related to this, but it is likely that it was due to a change in the underlying runtime for serverless compute.

This may be one of the tradeoffs you should consider with serverless vs. standard jobs compute. Not having to manage the environment with serverless does reduce the maintenance overhead, but it can lead to intermittent dependency issues in your code because you have less control over the environment.

Databricks Serverless Job : sudden random failure
