Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

chris_y_1e
by New Contributor II
  • 4900 Views
  • 5 replies
  • 0 kudos

Self-joins are blocked on remote tables

In our production Databricks workflow, we have been getting this error since yesterday in one of the steps:
org.apache.spark.SparkException: Self-joins are blocked on remote tables
We haven't changed our workflow or made any configurations for the data...

Latest Reply
chris_y_1e
New Contributor II
  • 0 kudos

@TomRenish Yeah, we fixed it by changing it to use a shared compute. It is called "USER_ISOLATION" in the `job.yaml` file:
data_security_mode: USER_ISOLATION
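For reference, the reply's one-line fix sits inside a job cluster definition; below is a minimal sketch in the Jobs YAML style, where spark_version, node_type_id, and the worker count are illustrative placeholders rather than values from this thread:

```yaml
# Sketch of a job cluster with shared-compute security mode.
# spark_version / node_type_id / num_workers are placeholders.
job_clusters:
  - job_cluster_key: main
    new_cluster:
      spark_version: 15.4.x-scala2.12
      node_type_id: Standard_DS3_v2
      num_workers: 2
      data_security_mode: USER_ISOLATION   # shared compute, per the reply
```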

4 More Replies
Upendra_Dwivedi
by Databricks Partner
  • 895 Views
  • 1 reply
  • 0 kudos

Databricks-Sql-Connector

Hi, I am connecting with Databricks sql_warehouse using VS Code and I am running the following command:
import os
from databricks import sql
host = 'adb-xxxxxxxxxxx.xx.azuredatabricks.net'
http_path = '/sql/1.0/warehouses/xxxxxxxxxxxxxx'
access_token = 'dapib...

Latest Reply
User16502773013
Databricks Employee
  • 0 kudos

Hello @Upendra_Dwivedi, this is potentially a missing package in your local Python setup. Could you check the troubleshooting steps here and let me know? Alternatively, if that didn't work, please share the output of the following commands: python ...
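As a quick way to test the missing-package theory locally, the sketch below (helper name is illustrative; the connector's import name is `databricks`, from the `databricks-sql-connector` package) reports which imports are unavailable in the current environment:

```python
import importlib.util

def missing_packages(names):
    """Return the package names that cannot be imported in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# "databricks" is the import name of the databricks-sql-connector package;
# if it appears in the output, the connector is not installed in this venv.
print(missing_packages(["databricks", "pandas"]))
```

Running this in the same VS Code interpreter that executes the script shows whether the failure is environmental rather than a problem with the connection code itself.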

Abser786
by New Contributor II
  • 1592 Views
  • 1 reply
  • 0 kudos

enable dynamic resource allocation on job cluster

I have a Databricks job with two tasks that run either alone or both in parallel (controlled by an If condition task). When they run in parallel, one task runs for a long time, but the same task finishes quickly when it runs alone. Particularly ...

Latest Reply
User16502773013
Databricks Employee
  • 0 kudos

Hello @Abser786, there is a difference between Dynamic Resource Allocation and the scheduler policy. Dynamic Resource Allocation means getting more compute as needed when the current compute is fully consumed; this can be achieved by the autoscaling feature/c...
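The autoscaling the reply points to can be expressed on a job cluster by swapping a fixed num_workers for an autoscale range; a hedged sketch following the Jobs API cluster spec (version, node type, and worker counts are placeholders):

```yaml
# Sketch: autoscaling job cluster instead of a fixed worker count.
job_clusters:
  - job_cluster_key: main
    new_cluster:
      spark_version: 15.4.x-scala2.12   # placeholder
      node_type_id: Standard_DS3_v2     # placeholder
      autoscale:
        min_workers: 2
        max_workers: 8
```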

filipniziol
by Esteemed Contributor
  • 5570 Views
  • 3 replies
  • 0 kudos

Any known issue with interactive Shared Cluster Driver Memory Cleanup

I am experiencing memory leaks on a Standard (formerly shared) interactive cluster:
1. We run jobs regularly on the cluster
2. After each job completes, driver memory usage continues to increase, suggesting resources aren't fully released
3. Eventually...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hello team, I'll check internally if any known issue has been reported.

2 More Replies
Vetrivel
by Databricks Partner
  • 1073 Views
  • 1 reply
  • 1 kudos

UC upgrade in Spark Streaming jobs

Kindly share the recommended approach for upgrading from HMS to UC for structured streaming jobs, ensuring seamless execution without any failures or data duplication. I would also appreciate insights into any best practices you have followed during ...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 1 kudos

Hi Vetrivel, how are you doing today? As per my understanding, upgrading from Hive Metastore (HMS) to Unity Catalog (UC) for structured streaming jobs needs a careful approach to avoid failures or data duplication. The best way is to first pause all ...

Yutaro
by New Contributor III
  • 813 Views
  • 1 reply
  • 1 kudos

Resolved! How can I efficiently remove backslashes during a COPY INTO load in Databricks?

I’m using Databricks’ COPY INTO to load data from a CSV file into a Delta table. My input CSV looks like this (CSV file):
column1(string),column2(string)
"[\,\,111\,222\,]","012\"34"
After running COPY INTO, my Delta table currently contains: column1(str...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 1 kudos

Hi Yutaro, you're doing great, and your question is very clear! In your case, the most efficient way to remove backslashes during the COPY INTO operation is to first load the raw CSV data into a temporary or staging Delta table, and then insert the cl...
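The cleanup step the reply describes ultimately strips every backslash; in Spark SQL that would typically be something like `regexp_replace(column1, '\\\\', '')`, and the plain-Python sketch below illustrates the same transformation on the sample values from the question:

```python
# Raw values as COPY INTO would land them (backslashes kept literally).
raw_rows = [(r"[\,\,111\,222\,]", '012\\"34')]

# Cleaning step: drop every backslash, mirroring the staging-table insert.
cleaned = [tuple(v.replace("\\", "") for v in row) for row in raw_rows]
print(cleaned)  # [('[,,111,222,]', '012"34')]
```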

TimB
by New Contributor III
  • 1204 Views
  • 3 replies
  • 3 kudos

Adding dependencies to Serverless compute with concurrency slows processing right down

I am trying to run a job using the For Each command with many concurrent processes using serverless compute. To add dependencies to serverless jobs, it seems you have to add them to the notebook, rather than configure them on the tasks screen like you...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 3 kudos

Yeah, TimB. Keep going.

2 More Replies
glevin
by New Contributor II
  • 3309 Views
  • 7 replies
  • 1 kudos

JDBC Connection query row limit

Anyone know how to increase the number of rows returned in a JDBC query? Currently we're receiving 1000 rows per query. Have tried adding a LIMIT 5000 to the end of the query, but no luck.

Latest Reply
glevin
New Contributor II
  • 1 kudos

Thanks all for your help. Looks like the bottleneck is the tool I'm using to make the connection (Appian). It limits JDBC responses to 1000 rows.

6 More Replies
SaeedAsh
by New Contributor
  • 2557 Views
  • 3 replies
  • 0 kudos

How to Permanently Disable Serverless Compute in Azure Databricks?

Hi, I was wondering how to completely disable serverless compute in Azure Databricks. I am certain that it was disabled in my workspace before, but now it seems to be constantly available at the notebook level. Did Databricks release any recent updates...

Latest Reply
ashraf1395
Honored Contributor
  • 0 kudos

Hey @noorbasha534, I guess we don't have any feature to enable/disable Databricks serverless compute at the workspace level. You can confirm this once with your Databricks account executive team. They might have a solution for this.

2 More Replies
Yutaro
by New Contributor III
  • 3910 Views
  • 5 replies
  • 5 kudos

Resolved! Partitioning vs. Clustering for a 50 TiB Delta Lake Table on Databricks

Hello everyone, I’m planning to create a Delta Lake table on Databricks with an estimated size of ~50 TiB. The table includes three date columns — year, month, and day — and most of my queries will filter on these fields. I’m trying to decide whether t...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 5 kudos

Hey Yutaro, thank you so much for the kind words—it honestly means a lot! I'm really glad the guidance helped and that you're feeling more confident moving forward. You're doing all the right things by asking the right questions and planning ahead. If...
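For readers weighing the same choice, the two options compared in this thread look roughly like the following Databricks SQL sketches (catalog, schema, table, and column names are illustrative, not from the thread):

```sql
-- Option A: liquid clustering on the date columns. Flexible data layout;
-- avoids creating one directory per day on a very large table.
CREATE TABLE main.demo.events (
  year INT, month INT, day INT, payload STRING
)
CLUSTER BY (year, month, day);

-- Option B: hive-style partitioning. Rigid; day-level granularity on a
-- ~50 TiB table can produce a very large number of small partitions.
CREATE TABLE main.demo.events_partitioned (
  year INT, month INT, day INT, payload STRING
)
PARTITIONED BY (year, month, day);
```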

4 More Replies
Rik
by New Contributor III
  • 11138 Views
  • 13 replies
  • 9 kudos

Resolved! File information is not passed to trigger job on file arrival

We are using the UC mechanism for triggering jobs on file arrival, as described here: https://learn.microsoft.com/en-us/azure/databricks/workflows/jobs/file-arrival-triggers. Unfortunately, the trigger doesn't actually pass the file-path that is gener...

Data Engineering
file arrival
trigger file
Unity Catalog
Latest Reply
Panda
Valued Contributor
  • 9 kudos

@007  - Review the link https://community.databricks.com/t5/data-engineering/file-arrival-trigger/m-p/94069/highlight/true#M38808 

12 More Replies
jano
by New Contributor III
  • 2295 Views
  • 1 reply
  • 0 kudos

Resolved! Run failed with termination code: RunExecutionError

I'm getting an error of RunExecutionError with no tasks having run in a notebook. The clusters spin up and then 5 mins later I am getting this error and all cells in the task notebook say cancelled. I don't see any issues with the clusters as they h...

Latest Reply
jano
New Contributor III
  • 0 kudos

This was due to a %run notebook command where the cluster could not locate the notebook. I was using a root-relative path from the GitHub repo, which worked when running the notebook on a cluster but did not work when I put it into a job. Ho...

rriley2
by New Contributor II
  • 5951 Views
  • 3 replies
  • 0 kudos

Resolved! Asset Bundles Email/Notifications Prod ONly

Howdy, I've got a job 'job1' and my dev/stg/prod targets in my databricks.yaml. Currently, I have this configuration for my job:
email_notifications:
  on_success:
    - me@myorg.com
  on_failure:
    - me@myorg.com
webhook_notifications:
  on_failure:
    - id: ${var.w...

Latest Reply
rriley2
New Contributor II
  • 0 kudos

Hmmm so something like this:
targets:
  dev:
    resources:
      jobs:
        Workflow1:
          email_notifications: {}
          webhook_notifications: {}
  stage:
    resources:
      jobs:
        Workflow1:
          email_notifications:
            on_s...

2 More Replies
LorenRD
by Contributor
  • 14612 Views
  • 15 replies
  • 11 kudos
Latest Reply
miranda_luna_db
Databricks Employee
  • 11 kudos

Hi folks - if you're unsure who your account team is and you're interested in the app delegated auth preview, please contact us via aibi-previews [at] databricks [dot] com

14 More Replies
GFrost
by New Contributor
  • 1481 Views
  • 1 reply
  • 0 kudos

Passing values from a CTE (Common Table Expression) to user-defined functions (UDF) in Spark SQL

Hello everyone, I'm trying to pass a value from a CTE to my function (UDF). Unfortunately, it's not working. Here is the first variant:
WITH fx_date_new AS (
  SELECT CASE
    WHEN '2025-01-01' > current_date()
      THEN CAST(date_format...

Latest Reply
ggsmith
Contributor
  • 0 kudos

I think the issue is in your subquery. You shouldn't have the entire CTE query in parentheses, only the column from your CTE. Your FROM clause is inside your UDF arguments. See if you can use the example below to fix the issue.
CREATE OR REPLACE FUNC...
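To illustrate the pattern the reply describes, here is a hedged sketch with hypothetical function and column names: the UDF is called with a scalar subquery that selects only the column from the CTE, keeping the FROM clause out of the UDF's argument list.

```sql
-- Hypothetical SQL UDF; the body just echoes the date it receives.
CREATE OR REPLACE FUNCTION fx_convert(fx_date DATE)
RETURNS STRING
RETURN CONCAT('rate as of ', CAST(fx_date AS STRING));

WITH fx_date_new AS (
  SELECT CASE
           WHEN DATE'2025-01-01' > current_date() THEN current_date()
           ELSE DATE'2025-01-01'
         END AS fx_date
)
-- Scalar subquery over the column only, not the whole CTE query.
SELECT fx_convert((SELECT fx_date FROM fx_date_new)) AS converted;
```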
