Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

chris_y_1e
by New Contributor II
  • 4900 Views
  • 5 replies
  • 0 kudos

Self-joins are blocked on remote tables

In our production Databricks workflow, we have been getting this error since yesterday in one of the steps:
org.apache.spark.SparkException: Self-joins are blocked on remote tables
We haven't changed our workflow or made any configurations for the data...

Latest Reply
chris_y_1e
New Contributor II
  • 0 kudos

@TomRenish Yeah, we fixed it by changing it to use a shared compute. It is called "USER_ISOLATION" in the `job.yaml` file:
data_security_mode: USER_ISOLATION
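For reference, the reply's one-line fix sits inside a job cluster definition; below is a minimal sketch in the Jobs YAML style, where spark_version, node_type_id, and the worker count are illustrative placeholders rather than values from this thread:

```yaml
# Sketch of a job cluster with shared-compute security mode.
# spark_version / node_type_id / num_workers are placeholders.
job_clusters:
  - job_cluster_key: main
    new_cluster:
      spark_version: 15.4.x-scala2.12
      node_type_id: Standard_DS3_v2
      num_workers: 2
      data_security_mode: USER_ISOLATION   # shared compute, per the reply
```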

4 More Replies
Upendra_Dwivedi
by Databricks Partner
  • 895 Views
  • 1 reply
  • 0 kudos

Databricks-Sql-Connector

Hi, I am connecting with Databricks sql_warehouse using VS Code and I am running the following command:
import os
from databricks import sql
host = 'adb-xxxxxxxxxxx.xx.azuredatabricks.net'
http_path = '/sql/1.0/warehouses/xxxxxxxxxxxxxx'
access_token = 'dapib...

Latest Reply
User16502773013
Databricks Employee
  • 0 kudos

Hello @Upendra_Dwivedi, this is potentially a missing package in your local Python setup. Could you check the troubleshooting steps here and let me know? Alternatively, if that didn't work, please share the output of the following commands: python ...
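As a quick way to test the missing-package theory locally, the sketch below (helper name is illustrative; the connector's import name is `databricks`, from the `databricks-sql-connector` package) reports which imports are unavailable in the current environment:

```python
import importlib.util

def missing_packages(names):
    """Return the package names that cannot be imported in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# "databricks" is the import name of the databricks-sql-connector package;
# if it appears in the output, the connector is not installed in this venv.
print(missing_packages(["databricks", "pandas"]))
```

Running this in the same VS Code interpreter that executes the script shows whether the failure is environmental rather than a problem with the connection code itself.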

Abser786
by New Contributor II
  • 1592 Views
  • 1 reply
  • 0 kudos

enable dynamic resource allocation on job cluster

I have a Databricks job with two tasks that run either alone or both in parallel (controlled by an If condition task). When they run in parallel, one task runs for a long time, but the same task finishes quickly when it runs alone. Particularly ...

Latest Reply
User16502773013
Databricks Employee
  • 0 kudos

Hello @Abser786, there is a difference between Dynamic Resource Allocation and the scheduler policy. Dynamic Resource Allocation means getting more compute as needed when the current compute is fully consumed; this can be achieved by the autoscaling feature/c...
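The autoscaling the reply points to can be expressed on a job cluster by swapping a fixed num_workers for an autoscale range; a hedged sketch following the Jobs API cluster spec (version, node type, and worker counts are placeholders):

```yaml
# Sketch: autoscaling job cluster instead of a fixed worker count.
job_clusters:
  - job_cluster_key: main
    new_cluster:
      spark_version: 15.4.x-scala2.12   # placeholder
      node_type_id: Standard_DS3_v2     # placeholder
      autoscale:
        min_workers: 2
        max_workers: 8
```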

filipniziol
by Esteemed Contributor
  • 5570 Views
  • 3 replies
  • 0 kudos

Any known issue with interactive Shared Cluster Driver Memory Cleanup

I am experiencing memory leaks on a Standard (formerly shared) interactive cluster:
1. We run jobs regularly on the cluster
2. After each job completes, driver memory usage continues to increase, suggesting resources aren't fully released
3. Eventually...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hello team, I'll check internally if any known issue has been reported.

2 More Replies
Vetrivel
by Databricks Partner
  • 1073 Views
  • 1 reply
  • 1 kudos

UC upgrade in Spark Streaming jobs

Kindly share the recommended approach for upgrading from HMS to UC for structured streaming jobs, ensuring seamless execution without any failures or data duplication. I would also appreciate insights into any best practices you have followed during ...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 1 kudos

Hi Vetrivel, how are you doing today? As per my understanding, upgrading from Hive Metastore (HMS) to Unity Catalog (UC) for structured streaming jobs needs a careful approach to avoid failures or data duplication. The best way is to first pause all ...

Yutaro
by New Contributor III
  • 813 Views
  • 1 reply
  • 1 kudos

Resolved! How can I efficiently remove backslashes during a COPY INTO load in Databricks?

I’m using Databricks’ COPY INTO to load data from a CSV file into a Delta table. My input CSV looks like this (CSV file):
column1(string),column2(string)
"[\,\,111\,222\,]","012\"34"
After running COPY INTO, my Delta table currently contains: column1(str...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 1 kudos

Hi Yutaro, you're doing great, and your question is very clear! In your case, the most efficient way to remove backslashes during the COPY INTO operation is to first load the raw CSV data into a temporary or staging Delta table, and then insert the cl...
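The cleanup step the reply describes ultimately strips every backslash; in Spark SQL that would typically be something like `regexp_replace(column1, '\\\\', '')`, and the plain-Python sketch below illustrates the same transformation on the sample values from the question:

```python
# Raw values as COPY INTO would land them (backslashes kept literally).
raw_rows = [(r"[\,\,111\,222\,]", '012\\"34')]

# Cleaning step: drop every backslash, mirroring the staging-table insert.
cleaned = [tuple(v.replace("\\", "") for v in row) for row in raw_rows]
print(cleaned)  # [('[,,111,222,]', '012"34')]
```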

TimB
by New Contributor III
  • 1204 Views
  • 3 replies
  • 3 kudos

Adding dependencies to Serverless compute with concurrency slows processing right down

I am trying to run a job using the For Each command with many concurrent processes using serverless compute. To add dependencies to serverless jobs, it seems you have to add them to the notebook, rather than configure them on the tasks screen like you...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 3 kudos

Yeah, TimB. Keep going.

2 More Replies
glevin
by New Contributor II
  • 3309 Views
  • 7 replies
  • 1 kudos

JDBC Connection query row limit

Anyone know how to increase the number of rows returned in a JDBC query? Currently we're receiving 1000 rows per query. Have tried adding a LIMIT 5000 to the end of the query, but no luck.

Latest Reply
glevin
New Contributor II
  • 1 kudos

Thanks all for your help. Looks like the bottleneck is the tool I'm using to make the connection (Appian). It limits JDBC responses to 1000 rows.

6 More Replies
SaeedAsh
by New Contributor
  • 2557 Views
  • 3 replies
  • 0 kudos

How to Permanently Disable Serverless Compute in Azure Databricks?

Hi, I was wondering how to completely disable serverless compute in Azure Databricks. I am certain that it was disabled in my workspace before, but now it seems to be constantly available at the notebook level. Did Databricks release any recent updates...

Latest Reply
ashraf1395
Honored Contributor
  • 0 kudos

Hey @noorbasha534, I guess we don't have any feature to enable/disable Databricks serverless compute at the workspace level. You can confirm this once with your Databricks account executive team. They might have a solution for this.

2 More Replies
Yutaro
by New Contributor III
  • 3910 Views
  • 5 replies
  • 5 kudos

Resolved! Partitioning vs. Clustering for a 50 TiB Delta Lake Table on Databricks

Hello everyone, I’m planning to create a Delta Lake table on Databricks with an estimated size of ~50 TiB. The table includes three date columns — year, month, and day — and most of my queries will filter on these fields. I’m trying to decide whether t...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 5 kudos

Hey Yutaro, thank you so much for the kind words—it honestly means a lot! I'm really glad the guidance helped and that you're feeling more confident moving forward. You're doing all the right things by asking the right questions and planning ahead. If...
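For readers weighing the same choice, the two options compared in this thread look roughly like the following Databricks SQL sketches (catalog, schema, table, and column names are illustrative, not from the thread):

```sql
-- Option A: liquid clustering on the date columns. Flexible data layout;
-- avoids creating one directory per day on a very large table.
CREATE TABLE main.demo.events (
  year INT, month INT, day INT, payload STRING
)
CLUSTER BY (year, month, day);

-- Option B: hive-style partitioning. Rigid; day-level granularity on a
-- ~50 TiB table can produce a very large number of small partitions.
CREATE TABLE main.demo.events_partitioned (
  year INT, month INT, day INT, payload STRING
)
PARTITIONED BY (year, month, day);
```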

4 More Replies
Rik
by New Contributor III
  • 11138 Views
  • 13 replies
  • 9 kudos

Resolved! File information is not passed to trigger job on file arrival

We are using the UC mechanism for triggering jobs on file arrival, as described here: https://learn.microsoft.com/en-us/azure/databricks/workflows/jobs/file-arrival-triggers. Unfortunately, the trigger doesn't actually pass the file-path that is gener...

Data Engineering
file arrival
trigger file
Unity Catalog
Latest Reply
Panda
Valued Contributor
  • 9 kudos

@007  - Review the link https://community.databricks.com/t5/data-engineering/file-arrival-trigger/m-p/94069/highlight/true#M38808 

12 More Replies
jano
by New Contributor III
  • 2295 Views
  • 1 reply
  • 0 kudos

Resolved! Run failed with termination code: RunExecutionError

I'm getting an error of RunExecutionError with no tasks having run in a notebook. The clusters spin up and then 5 mins later I am getting this error and all cells in the task notebook say cancelled. I don't see any issues with the clusters as they h...

Latest Reply
jano
New Contributor III
  • 0 kudos

This was due to a %run notebook command where the cluster could not locate the notebook. I was using a root-relative path from the GitHub repo, which worked when running the notebook on a cluster but did not work when I put it into a job. Ho...

rriley2
by New Contributor II
  • 5951 Views
  • 3 replies
  • 0 kudos

Resolved! Asset Bundles Email/Notifications Prod ONly

Howdy, I've got a job 'job1' and my dev/stg/prod targets in my databricks.yaml. Currently, I have this configuration for my job:
email_notifications:
  on_success:
    - me@myorg.com
  on_failure:
    - me@myorg.com
webhook_notifications:
  on_failure:
    - id: ${var.w...

Latest Reply
rriley2
New Contributor II
  • 0 kudos

Hmmm so something like this:
targets:
  dev:
    resources:
      jobs:
        Workflow1:
          email_notifications: {}
          webhook_notifications: {}
  stage:
    resources:
      jobs:
        Workflow1:
          email_notifications:
            on_s...

2 More Replies
LorenRD
by Contributor
  • 14612 Views
  • 15 replies
  • 11 kudos
Latest Reply
miranda_luna_db
Databricks Employee
  • 11 kudos

Hi folks - if you're unsure who your account team is and you're interested in the app delegated auth preview, please contact us via aibi-previews [at] databricks [dot] com

14 More Replies
GFrost
by New Contributor
  • 1481 Views
  • 1 reply
  • 0 kudos

Passing values from a CTE (Common Table Expression) to user-defined functions (UDF) in Spark SQL

Hello everyone, I'm trying to pass a value from a CTE to my function (UDF). Unfortunately, it's not working. Here is the first variant:
WITH fx_date_new AS (
  SELECT CASE
    WHEN '2025-01-01' > current_date()
      THEN CAST(date_format...

Latest Reply
ggsmith
Contributor
  • 0 kudos

I think the issue is in your subquery. You shouldn't have the entire CTE query in parentheses, only the column from your CTE. Your FROM clause is inside your UDF arguments. See if you can use the example below to fix the issue.
CREATE OR REPLACE FUNC...
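To illustrate the pattern the reply describes, here is a hedged sketch with hypothetical function and column names: the UDF is called with a scalar subquery that selects only the column from the CTE, keeping the FROM clause out of the UDF's argument list.

```sql
-- Hypothetical SQL UDF; the body just echoes the date it receives.
CREATE OR REPLACE FUNCTION fx_convert(fx_date DATE)
RETURNS STRING
RETURN CONCAT('rate as of ', CAST(fx_date AS STRING));

WITH fx_date_new AS (
  SELECT CASE
           WHEN DATE'2025-01-01' > current_date() THEN current_date()
           ELSE DATE'2025-01-01'
         END AS fx_date
)
-- Scalar subquery over the column only, not the whole CTE query.
SELECT fx_convert((SELECT fx_date FROM fx_date_new)) AS converted;
```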
