Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Abser786
by New Contributor II
  • 1256 Views
  • 1 reply
  • 0 kudos

Enable dynamic resource allocation on job cluster

I have a Databricks job with two tasks that can each run alone or both in parallel (controlled by an If/else condition task). When they run in parallel, one task runs for a long time, but the same task finishes quickly when it runs alone. Particularly ...

Latest Reply
User16502773013
Databricks Employee
  • 0 kudos

Hello @Abser786, there is a difference between Dynamic Resource Allocation and the scheduler policy. Dynamic Resource Allocation means getting more compute as needed when the current compute is fully consumed; this can be achieved with the autoscaling feature/c...
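For the autoscaling route, a minimal sketch using the Databricks Python SDK; the Spark version, node type, and notebook path are placeholders, not values from this thread:

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import compute, jobs

w = WorkspaceClient()  # reads DATABRICKS_HOST / DATABRICKS_TOKEN from the environment

# Job cluster that scales between 2 and 8 workers as load requires.
w.jobs.create(
    name="autoscaling-job",
    tasks=[
        jobs.Task(
            task_key="main",
            notebook_task=jobs.NotebookTask(notebook_path="/Workspace/path/to/notebook"),  # placeholder
            new_cluster=compute.ClusterSpec(
                spark_version="15.4.x-scala2.12",
                node_type_id="Standard_DS3_v2",  # pick a node type for your cloud
                autoscale=compute.AutoScale(min_workers=2, max_workers=8),
            ),
        )
    ],
)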

filipniziol
by Esteemed Contributor
  • 5360 Views
  • 3 replies
  • 0 kudos

Any known issue with interactive Shared Cluster Driver Memory Cleanup

I am experiencing memory leaks on a Standard (formerly shared) interactive cluster:
1. We run jobs regularly on the cluster
2. After each job completes, driver memory usage continues to increase, suggesting resources aren't fully released
3. Eventually...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hello team, I'll check internally whether any known issue has been reported.

2 More Replies
Vetrivel
by Contributor
  • 951 Views
  • 1 reply
  • 1 kudos

UC upgrade in Spark Streaming jobs

Could you share the recommended approach for upgrading from HMS to UC for structured streaming jobs, ensuring seamless execution without failures or data duplication? I would also appreciate insights into any best practices you have followed during ...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 1 kudos

Hi Vetrivel, how are you doing today? As per my understanding, upgrading from Hive Metastore (HMS) to Unity Catalog (UC) for structured streaming jobs needs a careful approach to avoid failures or data duplication. The best way is to first pause all ...
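To illustrate the repointing step, a minimal sketch assuming three-level UC table names and a UC volume for the new checkpoint; all names here are placeholders:

# Restart the stream against the UC table with a fresh checkpoint. A new
# checkpoint replays from the source, so drain and stop the old HMS stream
# first to avoid duplicate writes.
(spark.readStream
    .table("main.bronze.events")                     # UC: catalog.schema.table (placeholder)
    .writeStream
    .option("checkpointLocation", "/Volumes/main/bronze/checkpoints/events_uc")
    .trigger(availableNow=True)                      # process the backlog once, then stop
    .toTable("main.silver.events"))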

Yutaro
by New Contributor III
  • 682 Views
  • 1 reply
  • 1 kudos

Resolved! How can I efficiently remove backslashes during a COPY INTO load in Databricks?

I’m using Databricks’ COPY INTO to load data from a CSV file into a Delta table. My input CSV looks like this:

CSV file
column1(string),column2(string)
"[\,\,111\,222\,]","012\"34"

After running COPY INTO, my Delta table currently contains: column1(str...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 1 kudos

Hi Yutaro, you're doing great, and your question is very clear! In your case, the most efficient way to remove backslashes during the COPY INTO operation is to first load the raw CSV data into a temporary or staging Delta table, and then insert the cl...
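A minimal sketch of that staging pattern; the table names, volume path, and two-column schema are assumed from the post, not confirmed:

from pyspark.sql.functions import regexp_replace

# 1) Land the raw CSV rows as-is in a staging Delta table (placeholder names).
spark.sql("""
  COPY INTO main.staging.raw_rows
  FROM '/Volumes/main/landing/input_csv/'
  FILEFORMAT = CSV
  FORMAT_OPTIONS ('header' = 'true')
""")

# 2) Strip the backslashes and append the cleaned rows to the target table.
(spark.table("main.staging.raw_rows")
    .withColumn("column1", regexp_replace("column1", r"\\", ""))
    .withColumn("column2", regexp_replace("column2", r"\\", ""))
    .write.mode("append")
    .saveAsTable("main.silver.clean_rows"))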

TimB
by New Contributor III
  • 935 Views
  • 3 replies
  • 3 kudos

Adding dependencies to Serverless compute with concurrency slows processing right down

I am trying to run a job using the For Each command with many concurrent processes on serverless compute. To add dependencies to serverless jobs, it seems you have to add them to the notebook, rather than configure them on the Tasks screen like you...
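For context, the in-notebook approach the post describes looks like this; the package name is purely illustrative:

# First cell of the notebook: the serverless task picks up notebook-scoped
# libraries installed with %pip rather than task/cluster-level libraries.
%pip install some-package==1.2.3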

Latest Reply
Brahmareddy
Esteemed Contributor
  • 3 kudos

Yeah, TimB. Keep going.

2 More Replies
glevin
by New Contributor II
  • 2680 Views
  • 7 replies
  • 1 kudos

JDBC Connection query row limit

Does anyone know how to increase the number of rows returned by a JDBC query? Currently we're receiving 1000 rows per query. We've tried adding LIMIT 5000 to the end of the query, but no luck.

Latest Reply
glevin
New Contributor II
  • 1 kudos

Thanks all for your help. It looks like the bottleneck is the tool I'm using to make the connection (Appian); it limits JDBC responses to 1000 rows.
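To confirm where rows are being cut off, one can bypass the tool and run the same query with the databricks-sql-connector; the hostname, HTTP path, token, and table name below are placeholders:

from databricks import sql  # pip install databricks-sql-connector

with sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",  # placeholder
    http_path="/sql/1.0/warehouses/abcdef1234567890",               # placeholder
    access_token="dapi-example-token",                              # placeholder
) as conn:
    with conn.cursor() as cursor:
        cursor.execute("SELECT * FROM main.default.my_table LIMIT 5000")
        print(len(cursor.fetchall()))  # 5000 here means the server isn't the limit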

6 More Replies
SaeedAsh
by New Contributor
  • 1978 Views
  • 3 replies
  • 0 kudos

How to Permanently Disable Serverless Compute in Azure Databricks?

Hi, I was wondering how to completely disable serverless compute in Azure Databricks. I am certain that it was disabled in my workspace before, but now it seems to be constantly available at the notebook level. Did Databricks release any recent updates...

Latest Reply
ashraf1395
Honored Contributor
  • 0 kudos

Hey @noorbasha534, I don't believe there is any feature to enable or disable Databricks serverless compute at the workspace level. You can confirm this with your Databricks account executive team; they might have a solution for this.

2 More Replies
Yutaro
by New Contributor III
  • 2681 Views
  • 5 replies
  • 5 kudos

Resolved! Partitioning vs. Clustering for a 50 TiB Delta Lake Table on Databricks

Hello everyone, I’m planning to create a Delta Lake table on Databricks with an estimated size of ~50 TiB. The table includes three date columns — year, month, and day — and most of my queries will filter on these fields. I’m trying to decide whether t...
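For the clustering side of that comparison, a minimal sketch of a liquid-clustered Delta table keyed on the three date columns; the table name and any schema beyond those columns are invented:

spark.sql("""
  CREATE TABLE main.analytics.big_events (   -- placeholder name
    year INT,
    month INT,
    day INT,
    payload STRING
  )
  CLUSTER BY (year, month, day)  -- liquid clustering instead of Hive-style partitions
""")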

Latest Reply
Brahmareddy
Esteemed Contributor
  • 5 kudos

Hey Yutaro, thank you so much for the kind words—it honestly means a lot! I'm really glad the guidance helped and that you're feeling more confident moving forward. You're doing all the right things by asking the right questions and planning ahead. If...

4 More Replies
Rik
by New Contributor III
  • 10396 Views
  • 13 replies
  • 9 kudos

Resolved! File information is not passed to trigger job on file arrival

We are using the UC mechanism for triggering jobs on file arrival, as described here: https://learn.microsoft.com/en-us/azure/databricks/workflows/jobs/file-arrival-triggers. Unfortunately, the trigger doesn't actually pass the file path that is gener...

Data Engineering
file arrival
trigger file
Unity Catalog
Latest Reply
Panda
Valued Contributor
  • 9 kudos

@007  - Review the link https://community.databricks.com/t5/data-engineering/file-arrival-trigger/m-p/94069/highlight/true#M38808 
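Beyond the linked thread, a common workaround sketch: let the triggered job discover new files itself with Auto Loader and recover each row's source path from the _metadata column; the location and file format are placeholders:

df = (spark.readStream
      .format("cloudFiles")                     # Auto Loader tracks already-seen files
      .option("cloudFiles.format", "json")      # placeholder format
      .load("/Volumes/main/landing/events/")    # placeholder location
      .select("*", "_metadata.file_path"))      # per-row source file path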

12 More Replies
jano
by New Contributor III
  • 1745 Views
  • 1 reply
  • 0 kudos

Resolved! Run failed with termination code: RunExecutionError

I'm getting a RunExecutionError without any tasks having run in the notebook. The clusters spin up, and then 5 minutes later I get this error and all cells in the task notebook say cancelled. I don't see any issues with the clusters, as they h...

Latest Reply
jano
New Contributor III
  • 0 kudos

This was due to a %run notebook command where the cluster could not locate the notebook. I was using a root-relative path from the GitHub repo, which worked when running the notebook on a cluster but did not work when I put it into a job. Ho...
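A sketch of the contrast described above; both paths are hypothetical:

# A path relative to the repo root resolved interactively but not in the job:
%run ./src/utils/helpers

# A fully qualified workspace path is more robust across both contexts:
%run /Workspace/Repos/my_org/my_repo/src/utils/helpers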

rriley2
by New Contributor II
  • 4301 Views
  • 3 replies
  • 0 kudos

Resolved! Asset Bundles Email/Notifications Prod Only

Howdy, I've got a job 'job1' and my dev/stg/prod targets in my databricks.yaml. Currently, I have this configuration for my job:

email_notifications:
  on_success:
    - me@myorg.com
  on_failure:
    - me@myorg.com
webhook_notifications:
  on_failure:
    - id: ${var.w...

Latest Reply
rriley2
New Contributor II
  • 0 kudos

Hmmm, so something like this:

targets:
  dev:
    resources:
      jobs:
        Workflow1:
          email_notifications: {}
          webhook_notifications: {}
  stage:
    resources:
      jobs:
        Workflow1:
          email_notifications:
            on_s...

2 More Replies
LorenRD
by Contributor
  • 13430 Views
  • 15 replies
  • 11 kudos
Latest Reply
miranda_luna_db
Databricks Employee
  • 11 kudos

Hi folks - if you're unsure who your account team is and you're interested in the app delegated auth preview, please contact us via aibi-previews [at] databricks [dot] com

14 More Replies
GFrost
by New Contributor
  • 1040 Views
  • 1 reply
  • 0 kudos

Passing values from a CTE (Common Table Expression) to user-defined functions (UDF) in Spark SQL

Hello everyone, I'm trying to pass a value from a CTE to my function (UDF). Unfortunately, it's not working. Here is the first variant:

WITH fx_date_new AS (
  SELECT CASE
    WHEN '2025-01-01' > current_date()
    THEN CAST(date_format...

Latest Reply
ggsmith
Contributor
  • 0 kudos

I think the issue is in your subquery. You shouldn't have the entire CTE query in parentheses, only the column from your CTE; your FROM clause is inside your UDF's arguments. See if you can use the example below to fix the issue. CREATE OR REPLACE FUNC...
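A runnable sketch of that pattern; the UDF and CTE here are simplified stand-ins for the ones in the thread:

# Define a simple SQL UDF (illustrative, not the thread's function).
spark.sql("""
  CREATE OR REPLACE TEMPORARY FUNCTION add_days(d DATE, n INT)
  RETURNS DATE
  RETURN date_add(d, n)
""")

# Pass only the CTE's column, as a scalar subquery, into the UDF's arguments.
spark.sql("""
  WITH fx_date_new AS (
    SELECT current_date() AS fx_date
  )
  SELECT add_days((SELECT fx_date FROM fx_date_new), 7) AS shifted_date
""").show()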

samanthacr
by New Contributor II
  • 2867 Views
  • 4 replies
  • 0 kudos

How to use Iceberg SQL Extensions in a notebook?

I'm trying to use Iceberg's SQL extensions in my Databricks Notebook, but I get a syntax error. Specifically, I'm trying to run 'ALTER TABLE my_iceberg_table WRITE LOCALLY ORDERED BY timestamp;'. This command is listed as part of Iceberg's SQL extens...

Latest Reply
gorkaada_BI
New Contributor II
  • 0 kudos

val dfh = spark.sql(s"""
  CALL glue_catalog.system.create_changelog_view(
    table => '<>table',
    options => map('start-snapshot-id', '$startSnapshotId', 'end-snapshot-id', '$endSnapshotId'),
    changelog_view => table_v
  )
""")

leads to ParseException: [PROCED...

3 More Replies
User16790091296
by Databricks Employee
  • 4388 Views
  • 3 replies
  • 5 kudos

Resolved! How do I use databricks-cli without manual configuration

I want to use databricks cli:databricks clusters listbut this requires a manual step that requires interactive work with the user:databricks configure --tokenIs there a way to use databricks cli without manual intervention so that you can run it as p...

Latest Reply
alexott
Databricks Employee
  • 5 kudos

You can set two environment variables, DATABRICKS_HOST and DATABRICKS_TOKEN, and databricks-cli will use them. See an example of this in the DevOps pipeline; see the full list of environment variables at the end of the Authentication section of the docume...
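A minimal sketch of the non-interactive pattern; the host and token values are placeholders:

import os
import subprocess

# databricks-cli authenticates from these variables, skipping `databricks configure`.
env = {
    **os.environ,
    "DATABRICKS_HOST": "https://adb-1234567890123456.7.azuredatabricks.net",  # placeholder
    "DATABRICKS_TOKEN": "dapi-example-token",                                  # placeholder
}
subprocess.run(["databricks", "clusters", "list"], env=env, check=True)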

2 More Replies
