Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

jordanpinder
by New Contributor
  • 4838 Views
  • 1 reply
  • 0 kudos

Native geometry Parquet support

Hi there! With the recent GeoParquet 2.0 announcements, I'm curious to understand how this impacts storing geospatial data in Databricks and Delta. For reference: the Parquet specification officially adopting geospatial guidance allowing native storage...
Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

GeoParquet 2.0’s formalization within the Apache Parquet specification is a significant step for native geospatial data storage across the modern data ecosystem, particularly for platforms like Databricks and Delta Lake. In summary, Delta Lake's reli...

Dave_Nithio
by Contributor II
  • 3799 Views
  • 1 reply
  • 0 kudos

Preset Partner Connect Schema Changes

When using Partner Connect to connect Serverless Databricks to my BI tool Preset, you must manually define the schema that Preset has access to. In my case, I individually selected all databases currently in my hive_metastore. The problem is, once cre...
Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

No, there is currently no simple, direct way to add new schema access to an existing Serverless Databricks SQL warehouse connection through Partner Connect for Preset: neither through the Databricks UI, BI tool configuration, nor the Databricks service pr...

fscaravelli
by New Contributor
  • 4346 Views
  • 1 reply
  • 0 kudos

Ingest files from GCS with Auto Loader in DLT pipeline running on AWS

I have some DLT pipelines working fine ingesting files from S3. Now I'm trying to build a pipeline to ingest files from GCS using Auto Loader. I'm running Databricks on AWS. The code I have: import dlt import json from pyspark.sql.functions import col ...
Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

Your error is due to how Databricks on AWS is trying to access GCS: it's defaulting to using the GCP metadata server (which only exists on Google Cloud VMs), not the service account key you provided. This is a common issue when connecting GCS from no...
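The fix the reply points at is supplying the service-account key explicitly so the GCS connector never falls back to the metadata server. As a hedged sketch (property names come from the open-source GCS Hadoop connector; the project ID, email, and secret scope/key names are placeholders, not from this thread), a cluster Spark config for key-based auth might look like:

```
spark.hadoop.google.cloud.auth.service.account.enable true
spark.hadoop.fs.gs.project.id <gcp-project-id>
spark.hadoop.fs.gs.auth.service.account.email <client_email from the key JSON>
spark.hadoop.fs.gs.auth.service.account.private.key.id {{secrets/<scope>/gcs-private-key-id}}
spark.hadoop.fs.gs.auth.service.account.private.key {{secrets/<scope>/gcs-private-key}}
```

With these set, the connector authenticates with the supplied key from any cloud, instead of querying the GCP metadata endpoint that only exists on Google Cloud VMs.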

kulasangar
by New Contributor II
  • 3999 Views
  • 1 reply
  • 0 kudos

Permission Denied while trying to update a YAML file within a Python project in Databricks

I have a Python project, and within it a YAML file. Currently I'm building the project using Poetry and creating an asset bundle to deploy it in Databricks as a workflow job. So when the workflow runs, I do have an __init__.py within my ent...
Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

The main issue is that Databricks jobs typically run in environments where the file system may be read-only or restricted, especially for files packaged within the asset bundle or inside locations like /databricks/driver, /databricks/conda, or other s...
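A common workaround for a read-only bundle location like the one described above is to copy the packaged YAML into a writable path (e.g. a temp directory or a volume) and edit the copy. A minimal sketch, where the appended `runtime_override` key is purely illustrative:

```python
import shutil
import tempfile
from pathlib import Path

def edit_writable_copy(packaged_yaml: str) -> Path:
    """Copy a (possibly read-only) bundled file into a writable
    directory and apply edits to the copy, never to the original."""
    writable_dir = Path(tempfile.mkdtemp())
    target = writable_dir / Path(packaged_yaml).name
    # copyfile copies contents only, so the copy gets default
    # (writable) permissions even if the source is read-only
    shutil.copyfile(packaged_yaml, target)
    with open(target, "a") as f:
        f.write("runtime_override: true\n")
    return target
```

The same pattern applies whether the writable destination is /tmp on the driver or a Unity Catalog volume path.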

antoniomf
by New Contributor
  • 4147 Views
  • 1 reply
  • 0 kudos

Bug Delta Live Tables - Checkpoint

Hello, I've encountered an issue with Delta Live Tables in both my Development and Production workspaces. The data is arriving correctly in my Azure Storage Account; however, the checkpoint is being stored in the path dbfs:/. I haven't modified the St...
Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

There appears to be a recurring issue with Delta Live Table (DLT) pipelines in Databricks where the checkpoint is unexpectedly stored in the dbfs:/ path, rather than in the intended external storage location (such as Azure Blob Storage or ADLS). This...
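For classic (non-Unity-Catalog) DLT pipelines, checkpoint and table artifacts are placed under the pipeline's storage setting, and they default to dbfs:/ when it is unset. A hedged sketch of the relevant pipeline-settings fragment (pipeline name and paths are placeholders, not from this thread):

```json
{
  "name": "my_pipeline",
  "storage": "abfss://<container>@<storage-account>.dfs.core.windows.net/dlt"
}
```

Note that the storage location generally cannot be changed on an existing pipeline, so checking this setting before the first run is worthwhile.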

max_eg
by New Contributor II
  • 1309 Views
  • 1 reply
  • 1 kudos

Resolved! Bug in Asset Bundle Sync

I think I found a bug in the way asset bundles sync/deploy, or at least I have a question about whether I understood it correctly. My setup: I have an asset bundle consisting of a notebook nb1.py and a utils module utils.py. nb1.py imports functions from utils....
Latest Reply
bianca_unifeye
Databricks MVP
  • 1 kudos

Hi @max_eg, what you're seeing is expected with Asset Bundles. databricks bundle deploy computes what changed locally and only uploads those files. If you edited nb1.py in the workspace (not locally), the deploy won't "see" a local delta for that file, ...

sergecom
by New Contributor III
  • 1099 Views
  • 2 replies
  • 1 kudos

Resolved! Migrating from on-premises HDFS to Unity Catalog - Looking for advice on on-prem options

Hi, we're currently running a Databricks installation with an on-premises HDFS file system. As we're looking to adopt Unity Catalog, we've realized that our current HDFS setup has limited support and compatibility with Unity Catalog. Our requirement: W...
Latest Reply
sergecom
New Contributor III
  • 1 kudos

Thanks very much for your detailed response; this is really helpful. You mentioned client cases where organizations have migrated from on-premises HDFS into Databricks Unity Catalog; I'd love to learn more about those. If possible, could you share...

1 More Replies
Danish11052000
by Contributor
  • 1361 Views
  • 2 replies
  • 0 kudos

Looking for Advice: Robust Backup Strategy for Databricks System Tables

Hi, I'm planning to build a backup system for all Databricks system tables (audit, usage, price, history, etc.) to preserve data beyond retention limits. Currently, I'm using Spark Streaming with readStream + writeStream and checkpointing in LakeFlow ...
Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Greetings @Danish11052000, here's a pragmatic way to choose, based on the nature of Databricks system tables and the guarantees you want. Bottom line: for ongoing replication to preserve data beyond free retention, a Lakeflow Declarative Pipeline w...

1 More Replies
aiohi
by Databricks Partner
  • 319 Views
  • 1 reply
  • 0 kudos

Resolved! Claude Access to Workspace and Catalog

I have a question: if we have a Claude corporate account, are we able to link that directly to the Databricks Playground, so that we would not have to separately add files that are already available in our workspace or catalog?
Latest Reply
MuthuLakshmi
Databricks Employee
  • 0 kudos

@aiohi Yes, you should be able to access the files available. See https://www.databricks.com/blog/anthropic-claude-37-sonnet-now-natively-available-databricks and https://support.claude.com/en/articles/12430928-using-databricks-for-data-analysis. Docs for your...

Mathias_Peters
by Contributor II
  • 393 Views
  • 1 reply
  • 1 kudos

Resolved! Reading MongoDB collections into an RDD

Hi, for a Spark job that does some custom computation, I need to access data from a MongoDB collection and access the elements as type Document. The reason for this is that I want to apply some custom type serialization which is already implemen...
Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Greetings @Mathias_Peters, here are some suggestions for your consideration. Analysis: you're encountering a common challenge when migrating to newer versions of the MongoDB Spark Connector. The architecture changed significantly between versions 2.x ...

pooja_bhumandla
by Databricks Partner
  • 1144 Views
  • 1 reply
  • 1 kudos

Resolved! Broadcast Join Failure in Streaming: Failed to store executor broadcast in BlockManager

Hi Databricks Community, I'm running a Structured Streaming job in Databricks with foreachBatch writing to a Delta table. Failed to store executor broadcast spark_join_relation_1622863(size = Some(67141632)) in BlockManager with storageLevel=StorageLev...
Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Greetings @pooja_bhumandla, here are some helpful hints and tips. Diagnosis: your error indicates that a broadcast join operation is attempting to send ~64 MB of data to executors, but the BlockManager cannot store it due to memory constraints. This c...
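A common mitigation for this failure mode is to stop Spark from automatically broadcasting the oversized relation so the join falls back to a sort-merge join. A sketch of the standard Spark SQL setting (note that -1 disables auto-broadcast for all joins in the session, which may slow joins that were benefiting from it):

```
spark.sql.autoBroadcastJoinThreshold -1
```

Alternatively, a smaller positive byte threshold keeps broadcasting for genuinely small tables while excluding the ~64 MB relation seen in the error.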

pabloratache
by New Contributor III
  • 833 Views
  • 4 replies
  • 5 kudos

Resolved! [FREE TRIAL] Missing All-Purpose Clusters Access - New Account

Issue Description: I created a new Databricks Free Trial account ("For Work" plan with $400 credits), but I don't have access to All-Purpose Clusters or PySpark compute. My workspace only shows SQL-only features. Current Setup: - Account Email: ronel.ra...
Latest Reply
Louis_Frolio
Databricks Employee
  • 5 kudos

Ah, got it @pabloratache. I did some digging and here is what I found (learned a few things myself). Thanks for the detailed context; this behavior is expected for the current Databricks 14-day Free Trial ("For Work" plan). What's happening with ...

3 More Replies
SahiSammu
by New Contributor II
  • 1396 Views
  • 2 replies
  • 0 kudos

Resolved! Auto Loader vs Batch for Large File Loads

Hi everyone, I'm seeing a dramatic difference in processing times between batch and streaming (Auto Loader) approaches for reading about 250,000 files from S3 in Databricks. My goal is to read metadata from these files and register it as a table (even...

Data Engineering
autoloader
Directory Listing
ingestion
Latest Reply
SahiSammu
New Contributor II
  • 0 kudos

Thank you, Anudeep. I plan to tune Auto Loader by increasing the maxFilesPerTrigger parameter to optimize performance. My decision to use Auto Loader is primarily driven by its built-in backup functionality via cloudFiles.cleanSource.moveDestination, ...
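The two knobs mentioned above can be sketched as Auto Loader reader options; the batch size and destination path below are illustrative values, not from this thread:

```
cloudFiles.maxFilesPerTrigger           10000
cloudFiles.cleanSource                  MOVE
cloudFiles.cleanSource.moveDestination  s3://<backup-bucket>/processed/
```

Raising maxFilesPerTrigger lets each micro-batch consume more of the 250,000-file backlog, while cleanSource moves successfully processed files aside as a built-in backup.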

1 More Replies
noorbasha534
by Valued Contributor II
  • 4017 Views
  • 1 reply
  • 0 kudos

Databricks Jobs Failure Notification to Azure DevOps as incident

Dear all, has anyone tried sending Databricks Jobs failure notifications to Azure DevOps as incidents? I see webhook as an OOTB destination for jobs and am thinking of leveraging it, but I'd like to hear any success stories or other smart approaches....
Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

Yes, there are successful approaches and best practices for sending Databricks Job Failure notifications to Azure DevOps as incidents, primarily by leveraging the webhook feature as an out-of-the-box (OOTB) destination in Databricks Jobs. The workflo...
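One way to complete the loop is a small relay that receives the Databricks webhook and creates an Azure DevOps work item via its REST API (POST to .../_apis/wit/workitems/$Bug with a JSON-Patch body and Content-Type application/json-patch+json). A minimal sketch of the payload construction only; the job-name and run-URL inputs are assumptions about what the relay extracts from the webhook body, and the HTTP/auth plumbing is omitted:

```python
def build_bug_payload(job_name: str, run_url: str) -> list[dict]:
    """JSON-Patch body for the Azure DevOps 'create work item' API."""
    return [
        {"op": "add", "path": "/fields/System.Title",
         "value": f"Databricks job failed: {job_name}"},
        {"op": "add", "path": "/fields/System.Description",
         "value": f"Failure reported by a Databricks job webhook. Run: {run_url}"},
    ]
```

The work-item type ($Bug vs. a custom incident type) and any additional fields such as severity or area path depend on the DevOps project's process template.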

aonurdemir
by Contributor
  • 749 Views
  • 3 replies
  • 5 kudos

Resolved! Broken s3 file paths in File Notifications for auto loader

Suddenly, at "2025-10-23T14:12:48.409+00:00", file paths coming from the file notification queue started to arrive URL-encoded, so our pipeline gets a file-not-found exception. I think something changed suddenly and broke the notification system. Here are th...
Latest Reply
K_Anudeep
Databricks Employee
  • 5 kudos

Hello @aonurdemir, could you please re-run your pipeline and check? This issue should be mitigated now; it was due to a recent internal bug that led to unexpected handling of file paths with special characters. You should set ignoreMissingFile...
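As a stopgap while such a fix rolls out, the URL-encoded keys can be decoded before lookup with the standard library. A minimal sketch (the helper name is our own, not a Databricks API):

```python
from urllib.parse import unquote

def decode_notification_path(raw_path: str) -> str:
    """Decode a URL-encoded object key from a file-notification event.
    e.g. "date%3D2025-10-23/part-00.json" -> "date=2025-10-23/part-00.json"
    """
    return unquote(raw_path)
```

This only helps in custom notification-processing code; for Auto Loader itself, the ignoreMissingFiles route in the reply is the relevant lever.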

2 More Replies