Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Danish11052000
by Contributor
  • 1344 Views
  • 2 replies
  • 0 kudos

Looking for Advice: Robust Backup Strategy for Databricks System Tables

Hi, I'm planning to build a backup system for all Databricks system tables (audit, usage, price, history, etc.) to preserve data beyond retention limits. Currently, I'm using Spark Streaming with readStream + writeStream and checkpointing in LakeFlow ...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Greetings @Danish11052000, here's a pragmatic way to choose, based on the nature of Databricks system tables and the guarantees you want. Bottom line: for ongoing replication to preserve data beyond free retention, a Lakeflow Declarative Pipeline w...

1 More Replies
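The readStream + writeStream pattern discussed in this thread can be sketched as below. This is a minimal sketch, assuming the system table supports streaming reads as the original poster describes; the target catalog, schema, and checkpoint paths are hypothetical, and `spark` is the session Databricks provides in a notebook or job.

```python
def checkpoint_path_for(table_name):
    """Derive a per-table checkpoint location (hypothetical volume path)."""
    return "/Volumes/main/backup/checkpoints/" + table_name.replace(".", "_")

def backup_system_table(spark, source_table, target_table):
    """Incrementally copy new rows from a system table into a
    long-retention Delta table, tracking progress via a checkpoint."""
    return (
        spark.readStream.table(source_table)
        .writeStream
        .option("checkpointLocation", checkpoint_path_for(source_table))
        .trigger(availableNow=True)  # process the backlog, then stop
        .toTable(target_table)
    )

# Usage on a cluster (names hypothetical):
# query = backup_system_table(spark, "system.billing.usage",
#                             "main.backup.billing_usage")
# query.awaitTermination()
```

Scheduling this as a periodic job with `availableNow=True` gives incremental-batch semantics: each run picks up only what arrived since the last checkpoint.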
aiohi
by Databricks Partner
  • 316 Views
  • 1 reply
  • 0 kudos

Resolved! Claude Access to Workspace and Catalog

I have a question: if we have a Claude corporate account, are we able to link it directly to the Databricks Playground? That way we would not have to separately add files that are already available in our workspace or catalog.

Latest Reply
MuthuLakshmi
Databricks Employee
  • 0 kudos

@aiohi Yes, you should be able to access the available files.
https://www.databricks.com/blog/anthropic-claude-37-sonnet-now-natively-available-databricks
https://support.claude.com/en/articles/12430928-using-databricks-for-data-analysis
Docs for your...

Mathias_Peters
by Contributor II
  • 386 Views
  • 1 reply
  • 1 kudos

Resolved! Reading MongoDB collections into an RDD

Hi, for a Spark job that does some custom computation, I need to access data from a MongoDB collection and work with the elements as type Document. The reason is that I want to apply some custom type serialization which is already implemen...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Greetings @Mathias_Peters, here are some suggestions for your consideration. Analysis: you're encountering a common challenge when migrating to newer versions of the MongoDB Spark Connector. The architecture changed significantly between versions 2.x ...

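For the newer (10.x) connector, where the old RDD-returning helpers are gone, one possible workaround is to read through the DataFrame API and drop to an RDD afterwards, converting rows back into plain dictionaries that custom serialization code can map to `Document` objects. A sketch under assumptions: the connection string, database, and collection names are placeholders, and `spark` is the Databricks-provided session.

```python
def mongo_read_options(uri, database, collection):
    """Options for the MongoDB Spark Connector 10.x 'mongodb' source."""
    return {
        "connection.uri": uri,
        "database": database,
        "collection": collection,
    }

def read_collection_as_rdd(spark, uri, database, collection):
    """Read via the DataFrame API, then convert each Row to a nested dict,
    which downstream code can turn into a bson Document."""
    df = (spark.read.format("mongodb")
          .options(**mongo_read_options(uri, database, collection))
          .load())
    return df.rdd.map(lambda row: row.asDict(recursive=True))

# Usage (hypothetical connection details):
# rdd = read_collection_as_rdd(spark, "mongodb://host:27017", "shop", "orders")
```

Note this round-trips through Spark's schema inference, so documents with heterogeneous shapes may need an explicit schema on the read.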
pooja_bhumandla
by Databricks Partner
  • 1132 Views
  • 1 reply
  • 1 kudos

Resolved! Broadcast Join Failure in Streaming: Failed to store executor broadcast in BlockManager

Hi Databricks Community, I'm running a Structured Streaming job in Databricks with foreachBatch writing to a Delta table. The job fails with: Failed to store executor broadcast spark_join_relation_1622863 (size = Some(67141632)) in BlockManager with storageLevel=StorageLev...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Greetings @pooja_bhumandla, here are some helpful hints and tips. Diagnosis: your error indicates that a broadcast join operation is attempting to send ~64 MB of data to executors, but the BlockManager cannot store it due to memory constraints. This c...

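One common mitigation for this class of failure is to disable automatic broadcasting inside the foreachBatch function so the planner falls back to a sort-merge join. The sketch below is illustrative only: the dimension DataFrame, join key, and target table name are made up.

```python
def mb_to_bytes(n_mb):
    """Spark size configs are expressed in bytes; the failing relation
    here (67141632 bytes) is just over 64 MB."""
    return n_mb * 1024 * 1024

def make_batch_writer(dim_df, target_table):
    """Build a foreachBatch function that avoids broadcast joins."""
    def write_batch(batch_df, batch_id):
        spark = batch_df.sparkSession
        # -1 disables auto-broadcast entirely for this session.
        spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")
        (batch_df
         .join(dim_df.hint("merge"), "id")  # request a sort-merge join
         .write.format("delta").mode("append").saveAsTable(target_table))
    return write_batch

# Usage inside the stream (names hypothetical):
# stream.writeStream.foreachBatch(
#     make_batch_writer(dim_df, "bronze.events")).start()
```

Alternatively, raising executor memory or keeping the threshold but shrinking the broadcast side are options when the join genuinely benefits from broadcasting.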
pabloratache
by New Contributor III
  • 826 Views
  • 4 replies
  • 5 kudos

Resolved! [FREE TRIAL] Missing All-Purpose Clusters Access - New Account

Issue Description: I created a new Databricks Free Trial account ("For Work" plan with $400 credits) but I don't have access to All-Purpose Clusters or PySpark compute. My workspace only shows SQL-only features. Current Setup: - Account Email: ronel.ra...

Latest Reply
Louis_Frolio
Databricks Employee
  • 5 kudos

Ah, got it @pabloratache, I did some digging and here is what I found (learned a few things myself). Thanks for the detailed context; this behavior is expected for the current Databricks 14-day Free Trial ("For Work" plan). What's happening with ...

3 More Replies
SahiSammu
by New Contributor II
  • 1366 Views
  • 2 replies
  • 0 kudos

Resolved! Auto Loader vs Batch for Large File Loads

Hi everyone, I'm seeing a dramatic difference in processing times between batch and streaming (Auto Loader) approaches for reading about 250,000 files from S3 in Databricks. My goal is to read metadata from these files and register it as a table (even...

Data Engineering
autoloader
Directory Listing
ingestion
Latest Reply
SahiSammu
New Contributor II
  • 0 kudos

Thank you, Anudeep. I plan to tune Auto Loader by increasing the maxFilesPerTrigger parameter to optimize performance. My decision to use Auto Loader is primarily driven by its built-in backup functionality via cloudFiles.cleanSource.moveDestination, ...

1 More Replies
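The two options discussed in this thread can be combined in one Auto Loader stream, roughly as below. A sketch, not a definitive setup: the schema location, backup destination, and table names are placeholders, and `spark` is the Databricks-provided session.

```python
def autoloader_options(schema_path, backup_path, max_files=10000):
    """Auto Loader options discussed in the thread; paths are examples."""
    return {
        "cloudFiles.format": "json",
        "cloudFiles.schemaLocation": schema_path,
        # Larger micro-batches: admit more files per trigger.
        "cloudFiles.maxFilesPerTrigger": str(max_files),
        # Built-in backup: move processed source files aside.
        "cloudFiles.cleanSource": "MOVE",
        "cloudFiles.cleanSource.moveDestination": backup_path,
    }

def start_ingest(spark, source_path, checkpoint_path, target_table):
    opts = autoloader_options("/Volumes/main/meta/schema",
                              "s3://bucket/processed/")
    return (spark.readStream.format("cloudFiles")
            .options(**opts)
            .load(source_path)
            .writeStream
            .option("checkpointLocation", checkpoint_path)
            .toTable(target_table))
```

For a one-off backfill of 250k existing files, a large `max_files` plus an `availableNow` trigger keeps the directory-listing overhead amortized across few, large micro-batches.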
noorbasha534
by Valued Contributor II
  • 4011 Views
  • 1 reply
  • 0 kudos

Databricks Jobs Failure Notification to Azure DevOps as incident

Dear all, has anyone tried sending Databricks Jobs failure notifications to Azure DevOps as incidents? I see webhooks as an OOTB destination for jobs and am thinking of leveraging that, but I'd like to hear any success stories or other smart approaches....

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

Yes, there are successful approaches and best practices for sending Databricks Job Failure notifications to Azure DevOps as incidents, primarily by leveraging the webhook feature as an out-of-the-box (OOTB) destination in Databricks Jobs. The workflo...

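The webhook-based workflow can be sketched as a small relay (for example, an Azure Function or logic app) that receives the Databricks job-failure webhook and creates a work item through the Azure DevOps REST API. The organization, project, and work item type below are placeholders; only the endpoint shape and the JSON Patch body follow the documented API.

```python
import base64
import json
import urllib.request

def build_patch(title, description):
    """JSON Patch body expected by the work item create endpoint."""
    return [
        {"op": "add", "path": "/fields/System.Title", "value": title},
        {"op": "add", "path": "/fields/System.Description", "value": description},
    ]

def create_incident(org, project, pat, title, description):
    """POST a new 'Issue' work item, authenticating with a personal access token."""
    url = (f"https://dev.azure.com/{org}/{project}"
           "/_apis/wit/workitems/$Issue?api-version=7.0")
    auth = base64.b64encode(f":{pat}".encode()).decode()
    req = urllib.request.Request(
        url,
        data=json.dumps(build_patch(title, description)).encode(),
        headers={
            "Content-Type": "application/json-patch+json",
            "Authorization": f"Basic {auth}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Usage from the webhook handler (values hypothetical):
# create_incident("myorg", "ops", PAT, "Job 123 failed", "Run URL: ...")
```

The relay pattern keeps the Databricks side simple: the job only needs the webhook destination configured, and the relay owns the DevOps credentials.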
aonurdemir
by Contributor
  • 740 Views
  • 3 replies
  • 5 kudos

Resolved! Broken s3 file paths in File Notifications for auto loader

Suddenly, at 2025-10-23T14:12:48.409+00:00, file paths coming from the file notification queue started arriving URL-encoded, so our pipeline gets a file-not-found exception. I think something changed suddenly and broke the notification system. Here are th...

Latest Reply
K_Anudeep
Databricks Employee
  • 5 kudos

Hello @aonurdemir, could you please re-run your pipeline and check? This issue should now be mitigated; it was due to a recent internal bug that led to unexpected handling of file paths with special characters. You should set ignoreMissingFile...

2 More Replies
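The symptom described here is ordinary percent-encoding: a path containing a space arrives with `%20` in it and no longer resolves. The bucket and file names below are made up; standard-library URL decoding is enough to confirm what the notification queue was emitting.

```python
from urllib.parse import unquote

def decode_notification_path(raw_path):
    """Undo the percent-encoding seen in the broken notifications."""
    return unquote(raw_path)

encoded = "s3://bucket/landing/report%202025-10-23.json"
# A space was encoded as %20; decoding restores the original key.
assert decode_notification_path(encoded) == \
    "s3://bucket/landing/report 2025-10-23.json"
```

This is diagnostic only; per the reply above, the underlying bug was fixed service-side, so a permanent decode step in the pipeline should not be needed.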
somedeveloper
by New Contributor III
  • 1299 Views
  • 2 replies
  • 1 kudos

Modifying size of /var/lib/lxc

Good morning, when running a library (Sparkling Water) for a very large dataset, I've noticed that during an export procedure the /var/lib/lxc storage is being used. Since that storage seems to be fixed at 130 GB, this is a problem because ...

Latest Reply
Walter_C
Databricks Employee
  • 1 kudos

Unfortunately, this is a setting that cannot be increased on the customer side.

1 More Replies
VikasSinha
by New Contributor
  • 8183 Views
  • 5 replies
  • 0 kudos

Which is better - Azure Databricks or GCP Databricks?

Which cloud hosting environment is best to use for Databricks? My question comes down to whether there is some difference in latency, throughput, result consistency, and reproducibility between the different cloud hosting environments of ...

Latest Reply
bidek56
Contributor
  • 0 kudos

@VikasSinha Databricks is not stable regardless of the cloud; jobs and clusters keep crashing. Use Polars or DuckDB instead.

4 More Replies
Danish11052000
by Contributor
  • 392 Views
  • 1 reply
  • 1 kudos

Resolved! Missing warehouse id/metadata for the system compute warehouse table

I ran the following queries for a specific warehouse_id = '54a93d2138433216':
SELECT * FROM system.billing.usage WHERE usage_metadata.warehouse_id = '54a93d2138433216';
SELECT * FROM system.compute.warehouse_events WHERE warehouse_id = '54a93d213843321...

Latest Reply
MuthuLakshmi
Databricks Employee
  • 1 kudos

@Danish11052000 This can happen when the warehouse is very old (say, created last year); in that case you may not see its details in the system table. If the warehouse was deleted before the table was created, you'll not see it.

bianca_unifeye
by Databricks MVP
  • 944 Views
  • 1 reply
  • 2 kudos

Resolved! Webinars

Hi! My colleagues and I at Unifeye are hosting a series of regular webinars focused on Databricks content. In November, we’re running four sessions covering Geospatial, Governance, AI, and Delta Sharing, featuring Databricks architects as guest speak...

Latest Reply
Advika
Community Manager
  • 2 kudos

Hello @bianca_unifeye! Thanks for sharing! This looks like a great initiative. For better visibility and engagement, please go ahead and post about the webinar series directly in the Community as a post. Members can then register and join through your...

bianca_unifeye
by Databricks MVP
  • 371 Views
  • 0 replies
  • 1 kudos

Webinar: Geospatial Data Ingestion and Manipulation on Databricks

Geospatial Data Meets Databricks + Felt: Turning Coordinates into Business Insight. Most organisations capture huge volumes of spatial data (addresses, coordinates, routes, catchments) but struggle to operationalise it at scale. Traditional GIS tool...

Data Engineering
geospatial
webinar
Abdul_Alikhan
by New Contributor II
  • 2138 Views
  • 5 replies
  • 3 kudos

Resolved! In Databricks Free Edition, serverless compute is not working

I recently logged into the Databricks Free Edition, but serverless compute is not working. I'm receiving the error: 'An error occurred while trying to attach serverless compute. Please try again or contact support.'

Latest Reply
LonaOsmani
New Contributor III
  • 3 kudos

Hi @Abdul_Alikhan, I experienced the same yesterday when I imported some of my notebooks. I noticed that this error only appeared for imported notebooks because the environment version was 1 by default. Changing the environment version to 2 solved th...

4 More Replies
FarhanM
by New Contributor II
  • 987 Views
  • 1 reply
  • 1 kudos

Resolved! Databricks Streaming: Recommended Cluster Types and Best Practices

Hi Community, I recently built some streaming pipelines (Autoloader-based) that extract JSON data from the Data Lake and, after parsing and logging, dump it into the Delta Lake bronze layer. Since these are streaming pipelines, they are supposed to r...

Latest Reply
bianca_unifeye
Databricks MVP
  • 1 kudos

When running streaming pipelines, the key is to design for stability and isolation, not to rely on restart jobs. The first thing to do is run your streams on Jobs Compute, not All-Purpose clusters. If available, use Serverless Jobs. Each pipeline shou...

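The isolation advice above can be sketched as one stream per job, each with its own checkpoint and query name. All paths and names below are illustrative, and `spark` is the session Databricks provides on the cluster.

```python
def checkpoint_for(pipeline_name):
    """One checkpoint directory per pipeline keeps failures isolated."""
    return f"/Volumes/main/ops/checkpoints/{pipeline_name}"

def start_bronze_stream(spark, pipeline_name, source_path, target_table):
    """Autoloader-based bronze ingest, one named query per pipeline."""
    return (spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .option("cloudFiles.schemaLocation",
                    checkpoint_for(pipeline_name) + "/schema")
            .load(source_path)
            .writeStream
            .queryName(pipeline_name)  # visible in the Spark UI
            .option("checkpointLocation", checkpoint_for(pipeline_name))
            .toTable(target_table))

# Each pipeline then runs as its own Jobs Compute task, e.g.:
# start_bronze_stream(spark, "orders_bronze",
#                     "s3://bucket/orders/", "bronze.orders")
```

With separate jobs and checkpoints, one misbehaving source cannot stall or crash the others, and each stream restarts independently from its own progress.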