Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

MingOnCloud
by New Contributor II
  • 2675 Views
  • 1 reply
  • 0 kudos

Schema Evolution with "schemaTrackingLocation" fails anyway

Hi, I'm trying to understand the usage of "schemaTrackingLocation" with schema evolution. I use these articles as references: https://docs.delta.io/latest/delta-streaming.html#tracking-non-additive-schema-changes and https://docs.databricks.com/aws/en/error-me...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

Here are answers to your detailed questions about using schemaTrackingLocation for dropping columns in Delta Lake streaming, based on your references and operational experience. Question 1: schemaTrackingLocation path requirements. Yes, it is normal...
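The reply above is truncated; as a rough illustration of the first point, this is how schemaTrackingLocation is typically wired up. Paths and table locations are hypothetical, and per the Delta streaming docs the tracking location must sit under the stream's checkpoint directory:

```python
# Sketch of wiring schemaTrackingLocation into a Delta stream read.
# All paths here are hypothetical placeholders.
checkpoint_dir = "/tmp/checkpoints/orders_stream"

reader_options = {
    # Each stream needs its own schema tracking location, and it must
    # be a directory under the stream's checkpointLocation.
    "schemaTrackingLocation": f"{checkpoint_dir}/_schema_log",
}

# With a live SparkSession this would be applied roughly as:
# (spark.readStream.format("delta")
#      .options(**reader_options)
#      .load("/tmp/delta/orders")
#      .writeStream.option("checkpointLocation", checkpoint_dir)
#      .start("/tmp/delta/orders_out"))
```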

soumiknow
by Contributor II
  • 2493 Views
  • 1 reply
  • 0 kudos

data not inserting in 'overwrite' mode - Value has type STRUCT which cannot be inserted into column

We have the following code which we used to load data into a BigQuery table after reading the parquet files from Azure Data Lake Storage: df.write.format("bigquery").option("parentProject", gcp_project_id).option("table", f"{bq_table_name}").option("te...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

The issue you are facing arises when using mode("overwrite") with Spark to load data into BigQuery—the error indicates BigQuery expects a STRING type for the source column, but it is being supplied a STRUCT type during overwrite operations. Strangely...
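The reply is cut off, but a workaround often suggested for this class of type mismatch (an assumption here, not confirmed by the truncated answer) is to serialize the STRUCT column to a JSON string so it matches BigQuery's STRING type. A stdlib-only sketch of the idea, with the Spark equivalent in a comment:

```python
import json

# The BigQuery column is STRING, but the DataFrame row carries a nested
# struct. Serializing the struct to JSON makes it a plain string value.
record = {"id": 1, "source": {"system": "adls", "path": "/raw/file.parquet"}}
record["source"] = json.dumps(record["source"], sort_keys=True)

# In Spark the same idea is typically expressed as:
#   from pyspark.sql.functions import to_json, col
#   df = df.withColumn("source", to_json(col("source")))
```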

jeremy98
by Honored Contributor
  • 2122 Views
  • 1 reply
  • 0 kudos

How to Initialize Sentry in All Notebooks Used in Jobs using __init__.py?

Hi Community, I'm looking to initialize Sentry in all notebooks that are used across multiple jobs. My goal is to capture exceptions using Sentry whenever a job runs a notebook. What’s the recommended approach for initializing Sentry packages in this c...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

To consistently initialize Sentry in all notebooks for reliable exception tracking, experts recommend using a shared initialization approach that minimizes duplication and ensures setup for every job execution. Here’s a structured approach: Recommend...
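The structured approach is truncated above; one common shape for it (function name and DSN are hypothetical) is an idempotent helper in a shared module that every notebook runs first, e.g. via %run or a workspace-files import:

```python
# Idempotent Sentry bootstrap, meant to live in one shared module.
_SENTRY_READY = False

def init_monitoring(dsn: str) -> bool:
    """Initialize Sentry once per Python process; safe to call repeatedly."""
    global _SENTRY_READY
    if _SENTRY_READY:
        return True
    try:
        import sentry_sdk  # provided by the sentry-sdk package
    except ImportError:
        return False  # monitoring silently disabled if the package is absent
    sentry_sdk.init(dsn=dsn)
    _SENTRY_READY = True
    return True
```

Because the guard lives at module level, repeated calls from different notebooks in the same job run initialize Sentry only once.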

DataP1
by New Contributor
  • 2635 Views
  • 3 replies
  • 0 kudos

Excel File from Databricks Not Auto-Adjusting Columns in Power Automate Email Attachment

Hi community, I've built an automation workflow using Databricks and Power Automate. The process runs a query in Databricks, exports the result to Excel, auto-adjusts the columns based on the header/content, and then Power Automate picks up the file a...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

Yes, this is a common challenge when automating Excel file generation—the default export (especially from pandas or Databricks) does not auto-fit column widths, resulting in cramped columns when viewed or emailed. Auto-fitting columns typically requi...
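The reply is truncated where it describes the extra step; the core of "auto-fit" is just sizing each column to its longest rendered value. A minimal sketch of that calculation (the openpyxl application step in the comment is an assumption about the poster's export stack):

```python
# Size each column to the longest rendered cell value, plus padding.
def autofit_widths(rows, padding=2):
    widths = {}
    for row in rows:
        for idx, cell in enumerate(row):
            widths[idx] = max(widths.get(idx, 0), len(str(cell)) + padding)
    return widths

# With openpyxl (not run here), the widths would be applied roughly as:
#   from openpyxl.utils import get_column_letter
#   for idx, w in autofit_widths(rows).items():
#       ws.column_dimensions[get_column_letter(idx + 1)].width = w
```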

2 More Replies
tt_921
by Visitor
  • 3 Views
  • 0 replies
  • 0 kudos

Databricks CLI binding storage credential to a workspace

In the documentation from Databricks it says to run the below for binding a storage credential to a workspace (after already completing step 1 to update the `isolation-mode` to be `ISOLATED`): databricks workspace-bindings update-bindings storage-cre...

sanutopia
by New Contributor
  • 2099 Views
  • 1 reply
  • 0 kudos

How to ingest data from SAP Data Services (ECC, IP, MDG, FLP, MRP) to Databricks Lakehouse on GCP ?

Hi Friends, my customer is using Databricks (as a GCP partner product). The ask is to ingest data from sources into the Databricks Lakehouse. Currently the customer has 3 types of sources: SAP (ECC, HANA), Oracle and Kafka Stream. What are the Databricks native...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

Databricks on GCP offers several native ETL services and integration options to ingest data from SAP (ECC, HANA), Oracle, and Kafka Streams into the Lakehouse. Comparing Databricks-native solutions with GCP-native ETL like Data Fusion or Dataflow rev...

LeoGriffM
by New Contributor II
  • 2355 Views
  • 2 replies
  • 0 kudos

Zip archive with PowerShell "Error: The zip file may not be valid or may be an unsupported version."

Zip archive "Error: The zip file may not be valid or may be an unsupported version." We are trying to upload a ZIP archive to a Databricks workspace for faster and atomic uploads of artifacts. The expected behaviour is that we can run the following co...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

The error message "Error: The zip file may not be valid or may be an unsupported version" when importing a zip archive via the Databricks CLI is a known issue, especially with zip files created using PowerShell's Compress-Archive or [System.IO.Compre...
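The reply is truncated before the fix; a commonly reported culprit with PowerShell-produced archives (an assumption here, consistent with the reply's framing) is non-spec entry formatting such as backslash path separators. Rebuilding the archive with a spec-compliant writer, e.g. Python's stdlib zipfile, sidesteps that:

```python
import io
import zipfile

# zipfile writes spec-compliant forward-slash entry names
# ("src/notebook.py", never "src\\notebook.py").
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("src/notebook.py", "print('hello')\n")

with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()
```

The same archive could then be imported with the Databricks CLI as before; only the tool producing the zip changes.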

1 More Reply
sandy311
by New Contributor III
  • 2291 Views
  • 3 replies
  • 1 kudos

Install python packages on serverless compute in DLT pipelines (using asset bundles)

Has anyone figured out how to install packages on serverless compute using asset bundles, similar to how we handle it for jobs or job tasks? I didn’t see any direct option for this, apart from installing packages manually within a notebook. I tried ins...

Data Engineering
DLT Serverless
Latest Reply
mark_ott
Databricks Employee
  • 1 kudos

Installing Python packages on Databricks serverless compute via asset bundles is possible, but there are some unique limitations and required configuration adjustments compared to traditional jobs or job tasks. The core methods to install packages fo...
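The core methods are cut off above; for serverless job tasks in a bundle, one commonly cited route is an environment spec rather than cluster libraries. A hedged sketch (resource names and the package are hypothetical; DLT pipelines expose a similar environment/dependencies field, so the keys may need adjusting for a pipeline resource):

```yaml
# Hypothetical databricks.yml fragment for a serverless job task.
resources:
  jobs:
    my_job:
      tasks:
        - task_key: main
          notebook_task:
            notebook_path: ../src/main_nb.ipynb
          environment_key: default
      environments:
        - environment_key: default
          spec:
            client: "1"
            dependencies:
              - my-internal-package==0.1.0
```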

2 More Replies
saicharandeepb
by New Contributor III
  • 1898 Views
  • 1 reply
  • 0 kudos

Implementing ADB Autoloader with Managed File Notification Mode for UC Ext Location (public preview)

Hi everyone, I'm planning to implement Azure Databricks Auto Loader using the Databricks-managed file notification mode for an external location registered in Unity Catalog. I understand this feature is currently in public preview, and I’d love to hea...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

Yes, Azure Databricks Auto Loader with Databricks-managed file notification mode for external locations in Unity Catalog has been successfully implemented by users, especially since it entered public preview in 2025, and it's designed to make file di...
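As a sketch of how the mode is usually enabled (option names as understood from the public preview docs; verify against the current documentation, and the path is hypothetical):

```python
# Auto Loader options for Databricks-managed file events on a
# Unity Catalog external location with file events enabled.
autoloader_options = {
    "cloudFiles.format": "json",
    # Managed file notification mode (Databricks-managed file events):
    "cloudFiles.useManagedFileEvents": "true",
}

# Typical use with a live SparkSession:
# (spark.readStream.format("cloudFiles")
#      .options(**autoloader_options)
#      .load("abfss://landing@acct.dfs.core.windows.net/events/"))
```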

tbailey
by New Contributor II
  • 2164 Views
  • 3 replies
  • 1 kudos

DABs, policies and cluster pools

My scenario: a policy called 'Job Pool', which has the following overrides: "instance_pool_id": {"type": "unlimited", "hidden": true}, "driver_instance_pool_id": {"type": "unlimited", "hidden": true}. I have an asset bundle that sets a new cluster as...

Latest Reply
mark_ott
Databricks Employee
  • 1 kudos

You are experiencing validation errors assigning a driver to an on-demand pool and workers to a spot pool in your Databricks Asset Bundle (DAB) configuration because the 'spot_bid_max_price' attribute is being forced by policies—even when the pools a...
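If that diagnosis holds, one direction (an assumption, not a confirmed fix from the truncated reply) is to relax the policy rule that forces the attribute, since pool-backed clusters inherit spot settings from the pool. The attribute path below is for Azure; AWS policies use aws_attributes.*:

```json
{
  "azure_attributes.spot_bid_max_price": {
    "type": "unlimited",
    "hidden": false
  }
}
```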

2 More Replies
pvalcheva
by New Contributor
  • 1684 Views
  • 1 reply
  • 0 kudos

Simba Spark Driver fails for big datasets in Excel

Hello, I am getting the following error when I want to extract data from Databricks via VBA code. The code for the connection is:
Option Explicit
Const adStateClosed = 0
Public CnAdo As New ADODB.Connection
Dim DSN_name As String
Dim WB As Workbook
Dim das...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

The code you provided for connecting to Databricks via VBA appears structurally sound, but the cause of the error you are experiencing could stem from several typical issues encountered when using ADODB with Databricks ODBC connections from Excel VBA...

Gustavo_Az
by Contributor
  • 1745 Views
  • 2 replies
  • 0 kudos

Doubt with range_join hints optimization, using INSERT INTO REPLACE WHERE

Hello, I'm optimizing a big notebook and have encountered many times the tip from Databricks that says "Unused range join hints". Reading the documentation for reference, I have been able to suppress that warning in almost all cells, but some of them rema...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

There is no official documentation covering the use of range_join hints directly with the INSERT INTO ... REPLACE WHERE operation in Databricks—existing documentation around range joins focuses only on explicit joining operations, not on conditional ...
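Given that gap in the docs, one hedged workaround is to attach the hint to the SELECT that feeds the INSERT, since the documented form of the hint applies to join queries. Table and column names below are hypothetical:

```python
# The RANGE_JOIN hint goes on the source SELECT, not on the
# INSERT ... REPLACE WHERE clause itself.
query = """
INSERT INTO events_gold
REPLACE WHERE event_date = '2024-01-01'
SELECT /*+ RANGE_JOIN(s, 3600) */ e.*, s.session_id
FROM events_silver e
JOIN sessions s
  ON e.ts >= s.start_ts AND e.ts < s.end_ts
"""
```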

1 More Reply
ChrisLawford_n1
by Contributor
  • 1898 Views
  • 1 reply
  • 1 kudos

Update for databricks-dlt pip package

Hello, with the recent changes to Delta Live Tables, I was wondering when the Python stub will be updated to reflect the new methods that are available? Link to the PyPI repo: databricks-dlt · PyPI

Latest Reply
mark_ott
Databricks Employee
  • 1 kudos

The Python stub for Delta Live Tables (DLT), which helps with local development by providing API specs, docstring references, and type hints, is available as the databricks-dlt package on PyPI. However, this library only provides interfaces to the DL...

1GauravS
by New Contributor III
  • 111 Views
  • 1 reply
  • 0 kudos

Ingesting Data from Event Hubs via Kafka API with Serverless Compute

Hi! I'm currently working on ingesting log data from Azure Event Hubs into Databricks. Initially, I was using a managed Databricks workspace, which couldn't access Event Hubs over a private endpoint. To resolve this, our DevOps team provisioned a VNet...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

Serverless compute in Azure Databricks does not support accessing resources over private endpoints, such as Azure Event Hubs configured with a private endpoint. This is a known and frequently cited limitation in the Databricks documentation and commu...

ChrisLawford_n1
by Contributor
  • 48 Views
  • 1 reply
  • 0 kudos

Network error on subsequent runs using serverless compute in DLT

Hello, when running on a serverless cluster in DLT our notebook first tries to install some Python wheels onto the cluster. We have noticed that when in development and running a pipeline many times over in a short space of time between runs that the pi...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

The error you’re seeing (“Network is unreachable” repeated during pip installs) on a DLT (Delta Live Table) serverless cluster, especially after the first successful run, is a common issue that appears to affect Databricks pipelines run repeatedly on...

