Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

r0nald
by New Contributor II
  • 11863 Views
  • 5 replies
  • 1 kudos

UDF not working inside transform() & lambda (SQL)

Below is a toy example of what I'm trying to achieve, but I don't understand why it fails. Can anyone explain why, and suggest a fix or a not overly bloated workaround?

%sql
create or replace function status_map(status int)
returns string
return map(10, "STATU...

Latest Reply
marcogrcr
New Contributor II
  • 1 kudos

Scoped variables in a transform() are not accessible to UDFs. However, you can work around this using explode():

# equivalent of: select transform(arr, e -> status_map(e.v1)) from s1
select collect_list(status_map(status_id)) from explode((select trans...

4 More Replies
seefoods
by Valued Contributor
  • 459 Views
  • 3 replies
  • 4 kudos

Resolved! Write logging errors for both PySpark and Python exceptions

Hello guys, Happy New Year and best wishes to all of us. I am catching both PySpark and Python exceptions, but I want to write these logged errors into a Delta table. Does anyone know the best practice for this? Thanks. Cordially,
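One common pattern (a sketch, not the thread's confirmed answer): catch the exception, flatten it into a structured record, and append that record to a Delta table. The function and table names here are hypothetical, and the Delta write itself is left as a comment because it needs a live Spark session.

```python
import traceback
from datetime import datetime, timezone

def build_error_record(exc: Exception, job_name: str) -> dict:
    """Turn a caught exception (PySpark or plain Python) into a flat record."""
    return {
        "job_name": job_name,
        "error_type": type(exc).__name__,
        "error_message": str(exc),
        "stack_trace": traceback.format_exc(),
        "logged_at": datetime.now(timezone.utc).isoformat(),
    }

# Usage inside a job; the Delta write is commented out since it needs `spark`:
try:
    raise ValueError("bad input row")   # stand-in for the real workload
except Exception as exc:                # also catches PySpark exception subclasses
    record = build_error_record(exc, job_name="nightly_etl")
    # spark.createDataFrame([record]).write.mode("append").saveAsTable("logs.etl_errors")
```

Keeping the record flat (strings and one timestamp) makes the Delta table schema trivial and queryable.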

Latest Reply
seefoods
Valued Contributor
  • 4 kudos

Thanks a lot @szymon_dybczak 

2 More Replies
Digvijay_11
by Databricks Partner
  • 544 Views
  • 1 replies
  • 0 kudos

Resolved! A few queries on Auto Loader

How do we retrieve the filename and file path from the trigger and consume them in a Databricks notebook dynamically? If the same file is modified with no change in name but a change in data, will the trigger still work? If not, what is the workaround? In landing we are g...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @Digvijay_11,
1. You can use the metadata column for that purpose: File metadata column - Azure Databricks | Microsoft Learn
2. With the default setting (cloudFiles.allowOverwrites = false), files are processed exactly once. When a file is appended to o...

ChristianRRL
by Honored Contributor
  • 449 Views
  • 2 replies
  • 3 kudos

Resolved! Serverless Compute Spark Version Flexibility?

Hi there, I'm wondering what determines the Serverless Compute Spark version. Is it based on the current DBR LTS? And is there a way to modify the Spark version for serverless compute? For example, when I check the Spark version for our serverless com...

Latest Reply
Databricks77
New Contributor III
  • 3 kudos

Serverless compute always runs on the latest runtime version. You cannot choose it as you can with standard compute.

1 More Replies
ChristianRRL
by Honored Contributor
  • 1046 Views
  • 4 replies
  • 10 kudos

Resolved! Testing Spark Declarative Pipeline in Docker Container > PySparkRuntimeError

Hi there, I saw via an announcement last year that Spark Declarative Pipelines (previously DLT) was being open sourced into Apache Spark, and I see this is now true as of Apache Spark 4.1: Spark Declarative Pipelines Programming Guide. I'm trying ...

Latest Reply
aleksandra_ch
Databricks Employee
  • 10 kudos

Hi @ChristianRRL, In addition to @osingh's answers, check out this old but good blog post about how to structure the pipeline's code to enable a dev and test cycle: https://www.databricks.com/blog/applying-software-development-devops-best-practices-d...

3 More Replies
Anish_2
by New Contributor III
  • 503 Views
  • 2 replies
  • 0 kudos

Resolved! Databricks workflow design

Hello Team, I have a use case in which I want to trigger another DLT pipeline when one table succeeds in my parent DLT pipeline. I don't want to create a pipeline-to-pipeline dependency. Is there any way to create a table-to-pipeline dependency? Thank you, Anis...

Data Engineering
deltalivetable
workflowdesign
Latest Reply
Raman_Unifeye
Honored Contributor III
  • 0 kudos

@Anish_2 - TUT is the solution. In TUT, instead of the parent pipeline "pushing" a notification, the child job is "pulled" into action by a metadata change. Set it up as below: create a Databricks Job and add a Pipeline task pointing to your Secondary ...
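The setup above maps to the Jobs API's table-update trigger. Below is a minimal sketch of such a job payload built as a Python dict; the field names follow the Jobs API 2.1 trigger settings as I understand them, and the pipeline ID, job name, and table name are placeholders, so verify the exact schema against the current API reference.

```python
import json

# Sketch of a Jobs API payload: a child job with a Pipeline task, fired by a
# table update trigger on a table the parent pipeline writes.
job_settings = {
    "name": "child-pipeline-on-table-update",          # hypothetical job name
    "tasks": [
        {
            "task_key": "run_secondary_pipeline",
            "pipeline_task": {"pipeline_id": "<secondary-pipeline-id>"},
        }
    ],
    "trigger": {
        "table_update": {
            # Fire when this table (written by the parent pipeline) is updated.
            "table_names": ["main.sales.orders_silver"],  # placeholder table
        }
    },
}

payload = json.dumps(job_settings)  # body for the jobs/create REST call
```

This keeps the dependency on the table, not on the parent pipeline, which is exactly the "pulled into action by a metadata change" behavior described above.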

1 More Replies
NotCuriosAtAll
by New Contributor III
  • 577 Views
  • 2 replies
  • 3 kudos

Resolved! Cluster crashes occasionally but not all of the time

We have a small cluster (Standard D2ads v6) with 8 GB of RAM and 2 cores. This is an all-purpose cluster and, for some reason, the client demands we use this one for our ETL process. The ETL process is simple: the client drops parquet files in the b...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 3 kudos

Hi @NotCuriosAtAll, your cluster is undersized for this workload. This error is typical on the driver node with such high CPU consumption. You can check the article (and related solution) below: Job run fails with error message "Could not reach driver of clu...

1 More Replies
bsr
by New Contributor II
  • 1778 Views
  • 4 replies
  • 5 kudos

Resolved! DBR 17.3.3 introduced unexpected DEBUG logs from ThreadMonitor – how to disable?

After upgrading from DBR 17.3.2 to DBR 17.3.3, we started seeing a flood of DEBUG logs like this in job outputs:

```
DEBUG:ThreadMonitor:Logging python thread stack frames for MainThread and py4j threads:
DEBUG:ThreadMonitor:Logging Thread-8 (run) stac...
```
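Until a runtime fix lands, one possible workaround from user code is to raise the threshold of the noisy logger. This is a sketch: the logger name "ThreadMonitor" is taken from the message prefix above, and whether DBR's internal emitter actually routes through a Python logger of that name is an assumption to verify in your own jobs.

```python
import logging

# Raise the threshold so DEBUG records from the "ThreadMonitor" logger are
# dropped while WARNING and above still get through.
logging.getLogger("ThreadMonitor").setLevel(logging.WARNING)

# isEnabledFor(DEBUG) is now False for this logger.
suppressed = not logging.getLogger("ThreadMonitor").isEnabledFor(logging.DEBUG)
```

Running this once at the top of the job (before the workload starts) is enough, since `logging.getLogger` returns the same singleton logger everywhere in the process.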

Latest Reply
WAHID
New Contributor II
  • 5 kudos

@iyashk-DB We are currently using DBR 17.3 LTS, and the issue is still occurring. Do you know when the fix is expected to be applied? We need this information to decide whether we should wait for the fix or proceed with the workaround you propo...

3 More Replies
rijin-thomas
by New Contributor II
  • 644 Views
  • 4 replies
  • 3 kudos

MongoDB connector - connection timeout when trying to connect to AWS DocumentDB

I am on Databricks Runtime 14.3 LTS, Spark 3.5.0, Scala 2.12, and mongodb-spark-connector_2.12:10.2.0. Trying to connect to DocumentDB using the connector, all I get is a connection timeout. I tried using PyMongo, which works as expected, and I can ...

Latest Reply
Sanjeeb2024
Valued Contributor
  • 3 kudos

Hi @rijin-thomas - Can you please allow the CIDR block of the Databricks account VPC in the AWS DocumentDB security group (the executor connectivity issue stated by @bianca_unifeye)?

3 More Replies
tvdh
by New Contributor II
  • 251 Views
  • 1 replies
  • 1 kudos

Resolved! Tab navigation between fields in dashboards is random

Tab navigation between fields in published dashboards seems very random. I have a dashboard with multiple text input fields (mapped to query parameters / filters). I expect to move logically between them when pressing Tab (keyboard navigation), but I mo...

Latest Reply
Advika
Community Manager
  • 1 kudos

Hello @tvdh! You can share this as product feedback so it’s visible to the Databricks product team and can be tracked and prioritized.

Upendra_Dwivedi
by Databricks Partner
  • 3583 Views
  • 2 replies
  • 1 kudos

Resolved! Databricks App OBO User Authorization

Hi All, We are using the on-behalf-of user authorization method for our app, and the x-forwarded-access-token is expiring after some time, so we have to redeploy our app to rectify the issue. I am not sure what the issue is or how we can keep the token aliv...

Latest Reply
jpt
New Contributor II
  • 1 kudos

I am confronted with a similar error. I am also using OBO user auth and have implemented accessing the token via st.context.headers.get('x-forwarded-access-token') for every query, and I do not save it in a cache. Still, after 1 hour, I am hit with the...

1 More Replies
Ved88
by Databricks Partner
  • 526 Views
  • 5 replies
  • 1 kudos

Databricks all-purpose cluster

Getting the below error while executing a notebook, and I can see the cluster has all libraries installed: "Failure starting repl. Try detaching and re-attaching the notebook."

Latest Reply
Ved88
Databricks Partner
  • 1 kudos

Hi, we are not using the Hive metastore anywhere, so I'm not sure why that host ((host=consolidated-westeuropec2-prod-metastore-0.mysql.database.azure.com)(port=3306)) is appearing in the driver log. Will I need to whitelist it? We have other use cases simi...

4 More Replies
csondergaardp
by New Contributor II
  • 532 Views
  • 2 replies
  • 2 kudos

Resolved! [PATH_NOT_FOUND] Structured Streaming uses wrong checkpoint location

I'm trying to run a simple example using Structured Streaming on a directory created as a Volume. The use case is purely educational; I am investigating various forms of triggers. Basic info: Catalog: "dev_catalog", Schema: "stream", Volume: "streamin...

Latest Reply
cgrant
Databricks Employee
  • 2 kudos

Your checkpoint code looks correct. What is the source of `df`? Is it `/Volumes/dev_catalog/default/streaming_basics/`? That path looks incorrect - add `stream` to it.
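The bug here is a silently wrong path segment (`default` where the schema `stream` belongs). A small sketch of building Volume paths from the catalog/schema/volume parts, so a segment can't be dropped or mistyped; the catalog, schema, and volume names come from the thread, while the `input` and `_checkpoint` subdirectories are hypothetical.

```python
from posixpath import join  # Volume paths are POSIX-style regardless of OS

# Build /Volumes/<catalog>/<schema>/<volume>/... from named parts so the
# schema segment ("stream" here, per the thread) can't be silently omitted.
catalog, schema, volume = "dev_catalog", "stream", "streaming_basics"
volume_root = join("/Volumes", catalog, schema, volume)

source_path = join(volume_root, "input")            # hypothetical input dir
checkpoint_path = join(volume_root, "_checkpoint")  # hypothetical checkpoint dir
```

Passing `source_path` to `readStream` and `checkpoint_path` to the `checkpointLocation` option then keeps both anchored to the same, correctly assembled volume root.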

1 More Replies
HarishKumarM
by New Contributor
  • 542 Views
  • 1 replies
  • 0 kudos

Resolved! Zerobus Connector Issue

I was trying to implement the example posted at the link below for the Zerobus connector, to test its functionality on my Free Edition workspace, but unfortunately I am getting the below error. Reference code: https://learn.microsoft.com/en-us/azure/databricks/...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Hey @HarishKumarM, I did some digging and found some information to help you troubleshoot. What the error means: your workspace isn't currently enrolled in the Zerobus Ingest preview. Even though Zerobus is labeled a Public Preview, it's st...

RevanthV
by Contributor
  • 420 Views
  • 3 replies
  • 3 kudos

Resolved! Data validation with df writes using append mode

Hi Team, recently I came across a situation where I had to write a huge amount of data, and it took 6 hrs to complete. Later, when I checked the target data, I saw 20% of the total records were written incorrectly or corrupted, because the source data itself was corr...

Latest Reply
RevanthV
Contributor
  • 3 kudos

Hey @K_Anudeep, thanks a lot for tagging me in the GitHub issue. This is exactly what I want, a "validate and commit" feature, and I see you have already raised a PR for the same with a new option called . I will try this out and check if it satisfie...

2 More Replies