Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by Saf4Databricks (New Contributor III)
  • 57 Views
  • 1 reply
  • 0 kudos

Cannot import pyspark.pipelines module

Question: What could be causing the following error in my code in a Databricks notebook, and how can we fix it? I'm using the latest Free Edition of Databricks, which has runtime version 17.2 and PySpark version 4.0.0. Error: ImportError: cannot im...

Latest Reply
dkushari
Databricks Employee
  • 0 kudos

Hi @Saf4Databricks - Are you trying to use it from a standalone Databricks notebook? You should only use it from within a Lakeflow Declarative Pipeline (LDP). The link you shared is about LDP. Here is an example where I used it.

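For context, a minimal sketch of a table definition that runs as part of a pipeline rather than a standalone notebook; this uses the long-established dlt module, and the table name and logic are illustrative only:

```python
# Runs only when executed by a Lakeflow Declarative Pipeline run, not in a
# standalone notebook (the `dlt` import resolves inside the pipeline runtime).
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Illustrative declarative table")
def doubled_ids():
    # `spark` is provided by the pipeline runtime
    return spark.range(10).withColumn("doubled", F.col("id") * 2)
```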
by TalessRocha (New Contributor II)
  • 1199 Views
  • 10 replies
  • 8 kudos

Resolved! Connect to Azure Data Lake Storage using Databricks Free Edition

Hello guys, I'm using Databricks Free Edition (serverless) and I am trying to connect to an Azure Data Lake Storage account. The problem I'm having is that in the Free Edition we can't configure the cluster, so I tried to make the connection via notebook using ...

Latest Reply
BS_THE_ANALYST
Esteemed Contributor II
  • 8 kudos

@TalessRocha thanks for getting back to us! Glad to hear you got it working, that's awesome. Best of luck with your projects. All the best, BS

9 More Replies
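For anyone landing here later, a sketch of the session-scoped (notebook-level) approach this thread converged on, assuming a service principal; the storage account, container, secret scope, and key names are placeholders:

```python
# Session-scoped ADLS Gen2 auth via a service principal; all names below
# (storage account, container, secret scope/keys) are placeholders.
storage = "mystorageaccount"
client_id = dbutils.secrets.get("my_scope", "sp-client-id")
client_secret = dbutils.secrets.get("my_scope", "sp-client-secret")
tenant_id = dbutils.secrets.get("my_scope", "sp-tenant-id")

suffix = f"{storage}.dfs.core.windows.net"
spark.conf.set(f"fs.azure.account.auth.type.{suffix}", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{suffix}",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{suffix}", client_id)
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{suffix}", client_secret)
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{suffix}",
               f"https://login.microsoftonline.com/{tenant_id}/oauth2/token")

df = spark.read.parquet(f"abfss://mycontainer@{suffix}/path/to/data")
```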
by maninegi05 (New Contributor)
  • 59 Views
  • 2 replies
  • 0 kudos

DLT Pipeline Stopped working

Hello, suddenly our DLT pipelines are getting failures saying that: LookupError: Traceback (most recent call last): result_df = result_df.withColumn("input_file_path", col("_metadata.file_path")).withColumn( ...

Latest Reply
Khaja_Zaffer
Contributor III
  • 0 kudos

Maybe there are some internal updates from Databricks. You can check and switch your pipeline channel: in the DLT pipeline settings (under Advanced > Channel), confirm whether it's set to "Preview". Switch to "Current" for a more stable engine version, then...

1 More Reply
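For reference, the channel switch described above corresponds to a single field in the pipeline's JSON settings; a sketch with all other settings omitted:

```json
{
  "channel": "CURRENT"
}
```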
by Malthe (Contributor II)
  • 196 Views
  • 4 replies
  • 0 kudos

Resolved! Can't enable "variantType-preview" using DLTs

Using create_streaming_table and passing table properties as follows, I get an error running the pipeline for the first time: > Your table schema requires manually enablement of the following table feature(s): variantType-preview. I'm using this code: c...

Latest Reply
Malthe
Contributor II
  • 0 kudos

There's a workaround available in most situations which is to first create the table without the VARIANT column, run the pipeline at least once, and then add the column in a subsequent refresh.

3 More Replies
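An alternative sketch, under the assumption that the generic Delta convention of enabling a table feature via delta.feature.&lt;name&gt; = supported applies to the preview variant feature; the table name is hypothetical:

```python
import dlt

# Assumption: the preview feature follows the generic Delta convention
# delta.feature.<name> = "supported" for explicit enablement.
dlt.create_streaming_table(
    name="events",  # hypothetical table name
    table_properties={"delta.feature.variantType-preview": "supported"},
)
```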
by raghvendrarm1 (New Contributor)
  • 83 Views
  • 0 replies
  • 0 kudos

Results from the Spark application to the driver

I tried to read many articles but am still not clear on this: the executors complete the execution of tasks and have the results with them. 1. Are the results (output data) from all executors transported to the driver in all cases, or do executors persist them if tha...

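A sketch of the two paths the question distinguishes: actions such as collect() ship results to the driver, while writes let each executor persist its own partitions directly to storage (the output path is a placeholder):

```python
# Action path: partition results are serialized and shipped to the driver;
# large collects can overwhelm driver memory.
rows = spark.range(1_000_000).limit(10).collect()

# Write path: each executor writes its own partitions to storage directly;
# only small status and metrics information returns to the driver.
spark.range(1_000_000).write.format("delta").mode("overwrite").save("/tmp/demo_out")
```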
by Upendra_Dwivedi (Contributor)
  • 2450 Views
  • 1 replies
  • 1 kudos

Databricks App OBO User Authorization

Hi All, we are using the on-behalf-of user authorization method for our app, and the x-forwarded-access-token is expiring after some time, so we have to redeploy our app to rectify the issue. I am not sure what the issue is or how we can keep the token aliv...

[Attachment: Upendra_Dwivedi_0-1747911721728.png]
Latest Reply
jamesl
Databricks Employee
  • 1 kudos

Hi @Upendra_Dwivedi, are you still facing this issue? The x-forwarded-access-token your app receives is the current user's access token that Databricks forwards in HTTP headers for on-behalf-of-user access. You should read it from the request on eac...

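A sketch of the per-request pattern the reply recommends, assuming a Flask-based app (the route is illustrative); the point is to read the header on every request instead of caching the token at deploy time:

```python
from flask import Flask, request

app = Flask(__name__)

@app.route("/whoami")
def whoami():
    # Databricks forwards a fresh user token with each request; read it here
    # rather than caching it at startup, since cached tokens expire.
    user_token = request.headers.get("x-forwarded-access-token")
    if user_token is None:
        return "No forwarded token on this request", 401
    return "Fresh on-behalf-of token received"
```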
by Mous92i (New Contributor)
  • 174 Views
  • 3 replies
  • 2 kudos

Resolved! Liquid Clustering With Merge

Hello, I'm facing severe performance issues with a MERGE INTO on Databricks. merge_condition = """ source.data_hierarchy = target.data_hierarchy AND source.sensor_id = target.sensor_id AND source.timestamp = target.timestamp """ The target Delt...

Latest Reply
Mous92i
New Contributor
  • 2 kudos

Thanks for your response

2 More Replies
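A common remedy in cases like this, sketched under the assumption that the target is liquid-clustered: align the clustering keys with the MERGE join keys so file pruning can kick in. The table name is hypothetical; the column names come from the snippet above:

```python
# Align liquid clustering keys with the MERGE join keys so data skipping
# can prune files during the merge; "target_table" is a placeholder.
spark.sql("""
    ALTER TABLE target_table
    CLUSTER BY (data_hierarchy, sensor_id, timestamp)
""")
spark.sql("OPTIMIZE target_table")  # recluster existing data
```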
by databricksero (New Contributor)
  • 322 Views
  • 8 replies
  • 3 kudos

DLT pipeline fails with “can not infer schema from empty dataset” — works fine when run manually

Hi everyone, I'm running into an issue with a Delta Live Tables (DLT) pipeline that processes a few transformation layers (raw → intermediate → primary → feature). When I trigger the entire pipeline, it fails with the following error: can not infer sche...

Latest Reply
ManojkMohan
Honored Contributor
  • 3 kudos

@databricksero Explicit schema definition: when calling spark.createDataFrame(pdf_cleaned), explicitly provide the schema even if the DataFrame is empty. This gives Spark the column types up front and prevents the “cannot infer schema from empty dataset” erro...

7 More Replies
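A sketch of the explicit-schema suggestion above; the column names and types are hypothetical stand-ins for whatever pdf_cleaned actually contains:

```python
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

# Hypothetical schema; nothing needs inferring even when pdf_cleaned is empty.
schema = StructType([
    StructField("id", StringType(), True),
    StructField("value", DoubleType(), True),
])

df = spark.createDataFrame(pdf_cleaned, schema=schema)
```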
by deng_dev (New Contributor III)
  • 11255 Views
  • 1 reply
  • 0 kudos

py4j.protocol.Py4JJavaError: An error occurred while calling o359.sql. : java.util.NoSuchElementExce

Hi! We are creating a table in a streaming job every micro-batch using a spark.sql('create or replace table ... using delta as ...') command. This query combines data from multiple tables. Sometimes our job fails with the error: py4j.Py4JException: An e...

Latest Reply
sahilchavan
New Contributor II
  • 0 kudos

Hi @deng_dev, did you discover any way to raise this error gracefully? I'm facing the same error when running the Kinesis stream. I'm aware of what the error is, but my intent is to raise and log the error gracefully.

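A sketch of one way to surface the failure gracefully, assuming the table is created inside a foreachBatch function; stream_df and the table name are placeholders:

```python
import logging

logger = logging.getLogger("stream")

def create_target(batch_df, batch_id):
    try:
        batch_df.createOrReplaceTempView("batch_src")
        batch_df.sparkSession.sql(
            "CREATE OR REPLACE TABLE tgt USING DELTA AS SELECT * FROM batch_src"
        )
    except Exception:
        # Log the full traceback, then re-raise so the stream fails visibly
        # instead of surfacing only a bare Py4J error.
        logger.exception("Micro-batch %s failed", batch_id)
        raise

query = stream_df.writeStream.foreachBatch(create_target).start()
```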
by Brahmareddy (Esteemed Contributor)
  • 137 Views
  • 1 reply
  • 4 kudos

I Tried Teaching Databricks About Itself — Here’s What Happened

Hi All, How are you doing today? I wanted to share something interesting from my recent Databricks work — I’ve been playing around with an idea I call “Real-Time Metadata Intelligence.” Most of us focus on optimizing data pipelines, query performance,...

Latest Reply
ruicarvalho_de
New Contributor II
  • 4 kudos

I like the core idea. You are mining signals the platform already emits. I would start rules-first: track small-files ratio and average file size trend, watch skew per partition and shuffle bytes per input gigabyte. Compare job time to input size to c...

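One of the rules-first signals from the reply, sketched against DESCRIBE DETAIL; the table name and alert threshold are arbitrary examples:

```python
# Average file size per Delta table from DESCRIBE DETAIL output.
detail = spark.sql("DESCRIBE DETAIL my_catalog.my_schema.my_table").collect()[0]
avg_file_mb = detail["sizeInBytes"] / max(detail["numFiles"], 1) / 1024 / 1024

if avg_file_mb < 16:  # hypothetical threshold for "too many small files"
    print(f"Small-file alert: {avg_file_mb:.1f} MB average over {detail['numFiles']} files")
```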
by Bhavana_Y (New Contributor)
  • 72 Views
  • 1 reply
  • 1 kudos

Learning Path for Spark Developer Associate

Hello everyone, happy to be part of the Virtual Journey! I enrolled in the Associate Spark Developer path and completed the learning path in Databricks Academy. Can anyone please confirm whether completing the learning path is enough to obtain the 50% off voucher for certifi...

[Attachment: Screenshot (15).png]
Latest Reply
Advika
Databricks Employee
  • 1 kudos

Hello @Bhavana_Y! To be eligible for the incentives, you’ll need to complete one of the pathways mentioned in the Learning Festival post. Based on your screenshot, it looks like you’ve completed all four modules of LEARNING PATHWAY 7: APACHE SPARK DE...

by donlxz (New Contributor III)
  • 176 Views
  • 4 replies
  • 3 kudos

Resolved! Deadlock occurs with USE statement

When issuing a query from Informatica using a Delta connection, the statement use catalog_name.schema_name is executed first. At that time, the following error appeared in the query history: Query could not be scheduled: (conn=5073499) Deadlock found w...

Latest Reply
donlxz
New Contributor III
  • 3 kudos

I’ll try making adjustments on the Informatica side. Thank you for your help.

3 More Replies
by Jonathan_ (New Contributor II)
  • 192 Views
  • 4 replies
  • 6 kudos

Slow PySpark operations after long DAG that contains many joins and transformations

We are using PySpark and notice that when we perform many transformations/aggregations/joins on the data, at some point the execution time of a simple task (count, display, union of 2 tables, ...) becomes very slow even with small data (ex...

Latest Reply
tarunnagar
New Contributor
  • 6 kudos

This is a pretty common issue with PySpark when working on large DAGs with lots of joins and transformations. As the DAG grows, Spark has to maintain a huge execution plan, and performance can drop due to shuffling, serialization, and memory overhead...

3 More Replies
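A sketch of the usual mitigation for a long lineage, under the assumption that the slowdown comes from an ever-growing plan: materialize an intermediate result so Spark replans from a compact base (names and paths are placeholders):

```python
# Option 1: checkpoint cuts the lineage at this point (eager by default).
spark.sparkContext.setCheckpointDir("/tmp/checkpoints")  # placeholder path
df = long_dag_df.checkpoint()

# Option 2: a write/read round trip through Delta has the same effect.
long_dag_df.write.format("delta").mode("overwrite").saveAsTable("tmp_db.intermediate")
df = spark.read.table("tmp_db.intermediate")
```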
by mikvaar (New Contributor III)
  • 646 Views
  • 8 replies
  • 6 kudos

Resolved! DAB + DLT destroy fails due to ownership/permissions mismatch

Hi all, we are running into an issue with Databricks Asset Bundles (DAB) when trying to destroy a DLT pipeline. The setup is as follows: two separate service principals. Deployment SP: used by Azure DevOps for deploying bundles. Run_as SP: used for running t...

Data Engineering
Databricks
Databricks Asset Bundles
DevOps
Latest Reply
denis-dbx
Databricks Employee
  • 6 kudos

We just released https://github.com/databricks/cli/releases/tag/v0.273.0 with a mitigation for this; the error should disappear if you upgrade. Please try it and let us know how it goes. The Terraform fix is in https://github.com/databricks/terraform-provid...

7 More Replies
