Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

kenmyers-8451
by Contributor II
  • 1861 Views
  • 1 reply
  • 0 kudos

Resolved! Is it possible to selectively overwrite data in a partitioned Delta table with a SQL warehouse

My team has a workflow that currently runs with Databricks SQL using a standalone cluster. We are trying to switch this job to using a SQL warehouse but I keep getting errors. The current job runs in a for-each loop to break up the work into smaller ...

Latest Reply
kenmyers-8451
Contributor II
  • 0 kudos

A teammate helped me realize that `replace where` doesn't work with INSERT OVERWRITE, but does work with INSERT INTO.
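A minimal sketch of the pattern described above, wrapped in Python so it can be sent to a SQL warehouse with spark.sql or a SQL task. The table, column, and query names are hypothetical placeholders, not from the thread.

```python
# Build an INSERT INTO ... REPLACE WHERE statement for a selective
# overwrite of one partition's worth of rows in a Delta table.
# All identifiers below are illustrative placeholders.
def build_replace_where(target: str, source_query: str, predicate: str) -> str:
    """Return a statement that replaces only rows matching `predicate`."""
    return (
        f"INSERT INTO {target} "
        f"REPLACE WHERE {predicate} "
        f"{source_query}"
    )

stmt = build_replace_where(
    target="catalog.schema.events",
    source_query=(
        "SELECT * FROM catalog.schema.events_staging "
        "WHERE event_date = '2024-01-01'"
    ),
    predicate="event_date = '2024-01-01'",
)
# On a SQL warehouse this would be executed as e.g. spark.sql(stmt).
```

Unlike INSERT OVERWRITE, which rewrites the whole table (or statically named partitions), REPLACE WHERE deletes only the rows matching the predicate and inserts the new data in one transaction.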

help_needed_445
by Contributor
  • 3160 Views
  • 2 replies
  • 2 kudos

Resolved! Notebook cell won't finish running or cancelling. Interrupt button greyed out.

A cell in a notebook that is using the %run magic command to run another notebook was running for what I considered too long, so I clicked the interrupt button; now the button is greyed out and shows a spinning circle. The interrupt button at t...

Latest Reply
Advika
Community Manager
  • 2 kudos

Hello @help_needed_445! Are you still experiencing this issue? You can try restarting your cluster to force-stop any ongoing processes. If that doesn’t resolve the problem, detaching and reattaching the notebook to the cluster might help.

1 More Replies
pavel_merkle
by Databricks Partner
  • 16966 Views
  • 6 replies
  • 0 kudos

Databricks SDK - create new job using JSON

Hello, I am trying to create a Job via the Databricks SDK. As input, I use the JSON generated via the Workflows UI (Workflows -> Jobs -> View YAML/JSON -> JSON API -> Create), generating pavel_job.json. When trying to run the SDK function jobs.create as dbk = WorkspaceCli...

Latest Reply
mike933
New Contributor II
  • 0 kudos

This is probably the easiest way to create a job from JSON:

import json
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.jobs import CreateJob

client = WorkspaceClient(
    host=WORKSPACE_DICT[WORKSPACE_NAME]["host_name"],
    token=...
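A hedged sketch of the same idea: load the UI-exported JSON (pavel_job.json from the thread) and pass its keys to jobs.create. The helper function and the example spec are illustrative; the actual client call is shown but not executed, since it needs a configured workspace.

```python
import json

# Load a job spec exported from the Workflows UI (JSON API -> Create).
# `load_job_spec` is a hypothetical helper, not part of the SDK.
def load_job_spec(path: str) -> dict:
    with open(path) as f:
        return json.load(f)

# A minimal spec in the shape the Jobs "create" endpoint accepts.
spec = {"name": "pavel_job", "tasks": [{"task_key": "main"}]}

# With a configured client, the spec's keys map onto keyword arguments:
#   from databricks.sdk import WorkspaceClient
#   w = WorkspaceClient()        # assumes host/token are already configured
#   job = w.jobs.create(**spec)  # keyword arguments mirror the JSON keys
```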

5 More Replies
abhishekv5055
by Databricks Partner
  • 1129 Views
  • 1 reply
  • 0 kudos

Not able to login to Partner Academy

I am not able to log in to Databricks Partner Academy. I also raised a ticket (ticket id: 00670650) with the support team. Can someone please help me resolve the issue?

(screenshot attached: abhishekv5055_0-1747913131093.png)
Data Engineering
Partner Academy
Latest Reply
Advika
Community Manager
  • 0 kudos

Hello @abhishekv5055! The support team takes around 24-48 hours to review and respond to submitted tickets. They will follow up with you directly through the ticket you’ve raised. We appreciate your patience in the meantime.

CashyMcSmashy
by Databricks Partner
  • 2436 Views
  • 3 replies
  • 0 kudos

Databricks Asset Bundles Firewall Issue

Hello, I'm trying to use Databricks Asset Bundles within a network that has limited access to the internet. When I try to deploy I get the error message "error downloading Terraform: Get "https://releases.hashicorp.com/terraform/1.5.5/index.json". Is...

Latest Reply
CashyMcSmashy
Databricks Partner
  • 0 kudos

Hello, after testing, the following URLs need to be whitelisted on the firewall:
  - [Terraform Registry](https://registry.terraform.io)
  - [Terraform Checkpoint API](https://checkpoint-api.hashicorp.com)
  - [Terraform Releases](https://releases.hashi...

2 More Replies
mgcasas
by New Contributor
  • 716 Views
  • 1 reply
  • 0 kudos

S3 Private Connection from Databricks Serverless Workspace

I'm looking for reference to privately connect to an S3 bucket from a Serverless Workspace deployed on the same region.

Latest Reply
Advika
Community Manager
  • 0 kudos

Hello @mgcasas! To enable private access from a Serverless Workspace to an S3 bucket in the same region, you can use an AWS Gateway Endpoint. Specifically, create an S3 Gateway Endpoint in your AWS VPC to allow direct and secure connectivity without ...

Shiva4266
by New Contributor II
  • 1434 Views
  • 2 replies
  • 0 kudos

IP address configuration for databricks workspace

Hi All, I'm running the below code in a notebook but it always returns a 400 error even though everything seems correct. Can you please help me see the current IP address list or enable the IP access list for the Databricks workspace? Your help will be a...

Latest Reply
Shiva4266
New Contributor II
  • 0 kudos

Hi @Shua42 - I tried installing the CLI and providing the necessary commands from the documentation below, but workspace commands are not working, as workspace-conf is not working. Can you help here? Link - Configure IP access lists for worksp...
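As an alternative to the CLI, the same configuration can be reached through the REST endpoints that the CLI wraps. A hedged sketch: the endpoint paths are from the public workspace-conf and IP access list APIs, but the host and token below are placeholders, and the actual HTTP calls are shown only as comments.

```python
# Hypothetical workspace URL; replace with your own.
HOST = "https://adb-xxxxxxxx.x.azuredatabricks.net"

def conf_request(host: str) -> tuple[str, dict]:
    """Build the request that enables IP access lists (workspace-conf)."""
    return (f"{host}/api/2.0/workspace-conf",
            {"enableIpAccessLists": "true"})

def list_request(host: str) -> str:
    """Build the URL that returns the current IP access lists."""
    return f"{host}/api/2.0/ip-access-lists"

url, body = conf_request(HOST)
# With `requests` and a personal access token these would be sent as:
#   requests.patch(url, headers={"Authorization": f"Bearer {TOKEN}"}, json=body)
#   requests.get(list_request(HOST), headers={"Authorization": f"Bearer {TOKEN}"})
```

A 400 on these endpoints often means IP access lists are not yet enabled via workspace-conf, so the PATCH above is worth trying before listing.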

1 More Replies
VicS
by Databricks Partner
  • 3479 Views
  • 5 replies
  • 1 kudos

Creating an Azure-Keyvault-backed secret scope with Terraform

We want to create an Azure-Keyvault-backed secret scope with Terraform - while we are able to do it via the UI with the URL https://adb-xxxxxxxx.x.azuredatabricks.net/?o=xxxxxxxxxxxxxx#secrets/createScope, I'm unable to do it with Terraform. resourc...

Latest Reply
J-Bradlee
New Contributor II
  • 1 kudos

I am also having the same issue. I am deploying the Azure-backed secret scopes across 3 different workspaces in my TF deployment. Strangely enough it works for 2 of the 3 deployments, but then I get the same error: Scope with Azure KeyVault must have userAAD...

4 More Replies
dperkins
by New Contributor
  • 3130 Views
  • 1 reply
  • 0 kudos

NoSuchMethodError with Delta-spark and Databricks Runtime 16.4 LTS

I'm running into an exception when trying to run a Java Spark JAR using the delta-spark library as a job on a Databricks Runtime 16.4 LTS cluster on Azure. I've tried various versions of the Delta Spark library from 3.0.0 to the latest 3.3.1, but alw...

Latest Reply
SP_6721
Honored Contributor II
  • 0 kudos

Hi @dperkins, the error is likely due to a mismatch between your delta-spark library and the Databricks Runtime. The runtime uses Spark 3.5.2 and Scala 2.13 (with Delta Lake 3.3.1), so ensure your JAR is built against these versions. It's best to remove t...

Luca_dall
by New Contributor
  • 802 Views
  • 1 reply
  • 0 kudos

Delta Live Table - Delta Table Sink Error after the first run

Hello, I'm trying to create a Delta table sink to store delete requests coming into our system, which are ingested into the bronze layer successfully with an autoloader. As I want to have a delete control table that needs to be updated after the data deleti...

Latest Reply
Shua42
Databricks Employee
  • 0 kudos

Hi @Luca_dall, it looks like this is trying to create the delta_sink_flow table each time the pipeline is run, whereas its creation should be managed by DLT. You can try removing the create_sink call and just running the append_flow to handle the ta...

User16783853501
by Databricks Employee
  • 4643 Views
  • 4 replies
  • 0 kudos

best practice for optimizedWrites and Optimize

What is the best practice for a Delta pipeline with very high throughput to avoid the small-files problem and also reduce the need for frequent external OPTIMIZE runs?

Latest Reply
rajkve
New Contributor II
  • 0 kudos

Hi All, can anyone who has solved this challenge confirm whether the below increases write latency and avoids creating smaller files? Based on a POC I did, I don't see that behaviour replicated, so I am just wondering. Many thanks.
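For reference, the settings usually discussed for this are the Delta table properties for optimized writes and auto compaction. A hedged sketch, again wrapped in Python for execution via spark.sql; the table name is a placeholder.

```python
# Table properties commonly used to mitigate small files on high-throughput
# Delta writes. The table name below is illustrative only.
def tune_small_files_sql(table: str) -> str:
    return (
        f"ALTER TABLE {table} SET TBLPROPERTIES ("
        "'delta.autoOptimize.optimizeWrite' = 'true', "
        "'delta.autoOptimize.autoCompact' = 'true')"
    )

stmt = tune_small_files_sql("catalog.schema.high_throughput_events")
# Executed via spark.sql(stmt). optimizeWrite coalesces data before writing
# (fewer, larger files, at the cost of some write latency), and autoCompact
# runs a small compaction after writes, reducing the need for manual OPTIMIZE.
```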

3 More Replies
aswinvishnu
by New Contributor II
  • 1232 Views
  • 2 replies
  • 1 kudos

Resolved! Avoiding metadata information when sending data to GCS

Hi all, I have a use case where I need to push table data to a GCS bucket:

query = "${QUERY}"
df = spark.sql(query)
gcs_path = "${GCS_PATH}"
df.write.option("maxRecordsPerFile", int("${MAX_RECORDS_PER_FILE}")).mode("${MODE}").json(gcs_path)

This can ...

Latest Reply
aswinvishnu
New Contributor II
  • 1 kudos

Thanks a lot @cgrant. This removed the '_started_...' and '_committed_...' files, but still generated the _SUCCESS file.

spark.conf.set("mapreduce.fileoutputcommitter.marksuccessfuljobs", "false")

This removed the _SUCCESS files as well.
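The confs from this thread, collected in one place as a sketch. The _SUCCESS setting is quoted from the reply above; the commit-protocol setting is an assumption about what suppresses the _started_*/_committed_* markers (it switches to the open-source Spark commit protocol), so verify it against your runtime's documentation before relying on it.

```python
METADATA_FILE_CONFS = {
    # Disables the _SUCCESS marker file (from the reply above).
    "mapreduce.fileoutputcommitter.marksuccessfuljobs": "false",
    # Assumption: falls back to the OSS commit protocol, which does not
    # write _started_*/_committed_* marker files. Verify for your runtime.
    "spark.sql.sources.commitProtocolClass":
        "org.apache.spark.sql.execution.datasources."
        "SQLHadoopMapReduceCommitProtocol",
}

def apply_confs(spark, confs: dict) -> None:
    """Apply each conf with spark.conf.set before the df.write call."""
    for key, value in confs.items():
        spark.conf.set(key, value)
```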

1 More Replies
bjn
by New Contributor III
  • 2703 Views
  • 5 replies
  • 1 kudos

Resolved! Trigger bad records in databricks

I use bad records while reading a CSV as follows:

df = spark.read.format("csv")
    .schema(schema)
    .option("badRecordsPath", bad_records_path)

Since bad records are not written immediately, I want to know how I can trigger the write...

Latest Reply
bjn
New Contributor III
  • 1 kudos

I found out why the code didn't trigger the bad-records write: I had emptied the folder for bad records. After fixing that, it works. Thanks for the help Isi

data_frame.write.format("delta").option("optimizeWrite", "true").mode("ov...

4 More Replies
Upendra_Dwivedi
by Databricks Partner
  • 1870 Views
  • 3 replies
  • 0 kudos

databricks_sql_connector not connecting

Hi All, I am trying to connect to a SQL warehouse using the databricks_oauth auth type with databricks-sql-connector.

from databricks.sql import connect
conn = connect(
    server_hostname="https://adb-xxxxxxxxxxxxxx.azuredatabricks.net/",
    http_path=...

Latest Reply
MuthuLakshmi
Databricks Employee
  • 0 kudos

@Upendra_Dwivedi Have you installed the databricks-sdk dependency? https://docs.databricks.com/aws/en/dev-tools/python-sql-connector
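One more thing worth checking: the question above passes a full URL as server_hostname, but the connector expects a bare hostname with no scheme and no trailing slash. A small normalizer, with the connect call itself shown only as a hedged comment (the http_path placeholder is hypothetical):

```python
def normalize_hostname(value: str) -> str:
    """Strip scheme and trailing slash so only the bare hostname remains."""
    value = value.strip()
    for prefix in ("https://", "http://"):
        if value.startswith(prefix):
            value = value[len(prefix):]
    return value.rstrip("/")

host = normalize_hostname("https://adb-xxxxxxxxxxxxxx.azuredatabricks.net/")
# With the hostname normalized, an OAuth connect call would look like:
#   from databricks.sql import connect
#   conn = connect(server_hostname=host,
#                  http_path="/sql/1.0/warehouses/<warehouse-id>",
#                  auth_type="databricks-oauth")
```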

2 More Replies
rammy
by Contributor III
  • 13300 Views
  • 6 replies
  • 5 kudos

How can I read the job id, run id, and parameters in a Python cell?

I have tried the following ways to get job parameters but none of them are working.

runId='{{run_id}}'
jobId='{{job_id}}'
filepath='{{filepath}}'
print(runId," ",jobId," ",filepath)
r1=dbutils.widgets.get('{{run_id}}')
f1=dbutils.widgets.get('{{file...

Latest Reply
Siete
New Contributor II
  • 5 kudos

You should use {{job.id}} and {{job.run_id}} instead of the versions with an underscore. This works for me.
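To make the wiring concrete: the references go in the job's task parameters, and the notebook reads the resolved values through widgets. The widget names below are hypothetical; only the {{job.id}} and {{job.run_id}} references come from the reply.

```python
# In the job/task configuration, map widget names to dynamic value
# references (widget names are illustrative placeholders):
TASK_PARAMETERS = {
    "job_id": "{{job.id}}",
    "run_id": "{{job.run_id}}",
}
# Databricks substitutes the references at run time, so inside the
# notebook the resolved values are read with plain widget lookups:
#   job_id = dbutils.widgets.get("job_id")
#   run_id = dbutils.widgets.get("run_id")
```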

5 More Replies