Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Anand13
by New Contributor II
  • 1352 Views
  • 2 replies
  • 0 kudos

Getting a concurrency issue on a Delta table using liquid clustering

In our project, we are testing liquid clustering using a test table called status_update, where we need to update the status for different market IDs. We are attempting to update the status_update table in parallel using the UPDATE command. ALTER TABL...

Latest Reply
Anand13
New Contributor II
  • 0 kudos

@Walter_C We are using Liquid Clustering as our first strategy. Our Databricks Runtime is 13.3, and we have a table named status_update containing approximately 30 market IDs, each with a single record. In our pipeline, if any market fails, we need t...

1 More Replies
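The concurrency problem described in this thread is usually handled by retrying the UPDATE when Delta reports a write conflict. Below is a minimal sketch of that retry pattern in plain Python; the exception class is a stand-in (on Databricks the real `ConcurrentAppendException` is raised by the Delta layer), and `run_update` would wrap something like a `spark.sql("UPDATE status_update ...")` call:

```python
import time
import random

class ConcurrentAppendException(Exception):
    """Stand-in for Delta's concurrency error; on Databricks this is
    raised when two UPDATEs touch overlapping files."""

def update_with_retry(run_update, max_attempts=5, base_delay=0.01):
    # Retry the update with exponential backoff plus jitter, a common
    # pattern for surviving concurrent-write conflicts on Delta tables.
    for attempt in range(1, max_attempts + 1):
        try:
            return run_update()
        except ConcurrentAppendException:
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))

# Simulated update that fails twice before succeeding.
attempts = {"n": 0}
def fake_update():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConcurrentAppendException()
    return "updated"

result = update_with_retry(fake_update)
```

Liquid clustering narrows conflicts compared to unpartitioned tables, but it does not eliminate them, so a retry wrapper like this is still worth having around parallel market-ID updates.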
amarnadh-gadde
by New Contributor II
  • 2101 Views
  • 6 replies
  • 0 kudos

Default catalog created incorrectly on my workspace

We have provisioned a new Databricks account and workspace on the premium plan. When we built out the workspace using Terraform, we expected to see a default catalog matching the workspace name as per this documentation. However, I don't see it. All I see are the 3 c...

Latest Reply
loic
Contributor
  • 0 kudos

Hello, while trying to get help on Databricks default catalog behavior, I found this topic. If I can give my advice here, one reason I see for the behavior @amarnadh-gadde describes is that you deployed your new workspace in a region where there is already...

5 More Replies
seefoods
by Valued Contributor
  • 2949 Views
  • 6 replies
  • 7 kudos

Resolved! autoloader strategy write ( APPEND, MERGE, UPDATE, COMPLETE, OVERWRITE)

Hello guys, I want to know if operations like overwrite, merge, and update in static writes behave the same when we use Auto Loader. I'm confused about the behavior of modes like complete, update, and append. After that, I want to know what the co...

Latest Reply
chanukya-pekala
Contributor III
  • 7 kudos

Thanks for the discussion. I have a tiny suggestion. Based on my experience working with streaming loads, I often find the checkpoint location hard to work with when I need to inspect the offset information or delete that directory for a fresh load of data. Hence I h...

5 More Replies
SatyaKoduri
by New Contributor II
  • 1555 Views
  • 1 replies
  • 1 kudos

Resolved! Yaml file to Dataframe

Hi, I'm trying to read YAML files using pyyaml and convert them into a Spark DataFrame with createDataFrame, without specifying a schema—allowing flexibility for potential YAML schema changes over time. This approach worked as expected on Databricks ...

Latest Reply
lingareddy_Alva
Honored Contributor III
  • 1 kudos

Hi @SatyaKoduri This is a known issue with newer Spark versions (3.5+) that came with Databricks Runtime 15.4. The schema inference has become more strict and struggles with deeply nested structures like your YAML's nested maps. Here are a few solution...

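A common workaround for strict schema inference on nested YAML, in the spirit of the reply above, is to serialize the nested parts to JSON strings before calling `createDataFrame`, so Spark only sees flat string columns. A stdlib-only sketch of that flattening step, with a literal dict standing in for what `yaml.safe_load` would return:

```python
import json

def stringify_nested(record):
    # Convert nested dicts/lists to JSON strings so Spark's schema
    # inference sees flat string columns instead of deep nested maps.
    return {
        k: (json.dumps(v, sort_keys=True) if isinstance(v, (dict, list)) else v)
        for k, v in record.items()
    }

# A dict like yaml.safe_load would return for a nested YAML document
# (names here are illustrative).
record = {"name": "pipeline_a", "config": {"retries": 3, "targets": ["bronze", "silver"]}}
flat = stringify_nested(record)
```

On Databricks you would then call `spark.createDataFrame([flat])` and, where needed, parse the JSON columns back out with `from_json` once you know the shape you want.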
tuckera
by New Contributor
  • 477 Views
  • 1 replies
  • 0 kudos

Governance in pipelines

How does everyone track and deploy their pipelines and generated data assets? DABs? Terraform? Manual? Something else entirely?

Latest Reply
lingareddy_Alva
Honored Contributor III
  • 0 kudos

Hi @tuckera The data engineering landscape shows a pretty diverse mix of approaches for tracking and deploying pipelines and data assets, often varying by company size, maturity, and specific needs. Infrastructure as Code (IaC) tools like Terraform an...

Edoa
by New Contributor
  • 1348 Views
  • 1 replies
  • 0 kudos

SFTP Connection Timeout on Job Cluster but Works on Serverless Compute

Hi all, I'm experiencing inconsistent behavior when connecting to an SFTP server using Paramiko in Databricks. When I run the code on Serverless Compute, the connection to xxx.yyy.com via SFTP works correctly. When I run the same code on a Job Cluster, ...

Latest Reply
lingareddy_Alva
Honored Contributor III
  • 0 kudos

Hi @Edoa This is a common networking issue in Databricks related to the different network configurations between Serverless Compute and Job Clusters. Here are the key differences and potential solutions. Root cause: Serverless Compute runs in Databricks'...

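When a Job Cluster times out where Serverless succeeds, a quick TCP reachability probe from the cluster usually confirms that the problem is network egress (VNet/NSG/firewall rules) rather than Paramiko itself. A small stdlib sketch; the host and port below are placeholders:

```python
import socket

def can_reach(host, port, timeout=2.0):
    # Quick TCP reachability check to run from a notebook on the cluster
    # before attempting a Paramiko SFTP connection. A False here points
    # at networking (egress rules, firewall, NAT), not at Paramiko.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Probe a port where nothing should be listening, so the check
# demonstrably returns False quickly instead of hanging.
reachable = can_reach("127.0.0.1", 1, timeout=1.0)
```

Running `can_reach("xxx.yyy.com", 22)` on both compute types makes the difference in network paths visible in seconds, before any SFTP-level debugging.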
oeztuerk82
by New Contributor II
  • 1218 Views
  • 2 replies
  • 3 kudos

Deletion of Resource Group on Azure and Impact on Databricks Workspace

Hello everyone, I would like to confirm the data retention and deletion behavior associated with an Azure Databricks workspace, particularly in the context of deleting the Azure resource group that a Databricks workspace resides in. Recently, I deleted an...

Latest Reply
SAKBAR
New Contributor II
  • 3 kudos

Once a resource group is deleted it cannot be recovered, just like ADLS, so it is not possible to restore the workspace or any resource under the resource group. Maybe Microsoft support can recover it if you are under a premium plan with them. For the future, it is always bet...

1 More Replies
DarioB
by New Contributor III
  • 1627 Views
  • 1 replies
  • 1 kudos

Resolved! DAB for_each_task - Passing task values

I am trying to deploy a job with a for_each_task using DAB and Terraform, and I am unable to properly pass the task value into the subsequent task. These are my job task definitions in the YAML: tasks: - task_key: FS_batching job_c...

Latest Reply
DarioB
New Contributor III
  • 1 kudos

We have been testing and found the issue (I just realized that my anonymization of the names removed the source of the error). We have tracked it down to the inputs parameter of the for_each_task. It seems that it is unable to reference task names with das...

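For reference, a sketch of the bundle shape involved, with the task keys renamed to use underscores, which is the fix the reply above points at. The notebook paths and value names here are illustrative, and the `{{tasks.<task_key>.values.<key>}}` reference reflects my understanding of how task values are consumed by a for_each_task:

```yaml
tasks:
  - task_key: fs_batching          # underscores avoid the dash problem
    notebook_task:
      notebook_path: ./notebooks/fs_batching.py
  - task_key: fan_out
    depends_on:
      - task_key: fs_batching
    for_each_task:
      # Consumes the value set upstream via dbutils.jobs.taskValues.set(...)
      inputs: "{{tasks.fs_batching.values.batches}}"
      task:
        task_key: fan_out_iteration
        notebook_task:
          notebook_path: ./notebooks/process_batch.py
```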
alonisser
by Contributor II
  • 1524 Views
  • 4 replies
  • 0 kudos

Controlling the name of the downloaded csv file from a notebook

I have a notebook with multiple display() commands in various cells; the users are currently downloading the result CSV from each cell. I want the downloads to be named after the name of the cell (or any other method I can use to make each download have a dif...

Latest Reply
Isi
Honored Contributor III
  • 0 kudos

Hey @alonisser Once the file is stored in the volume, whether in S3, GCS, or ADLS, you'll be able to see it with a custom name defined by the customer or project. Additionally, the files may be saved in different folders, making it easier to identify...

3 More Replies
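The approach in the reply, saving each cell's result to a volume under a chosen filename instead of relying on the display() download button, can be sketched with the stdlib csv module. The function and its arguments are illustrative, and a temp directory stands in for a /Volumes path so the sketch is self-contained:

```python
import csv
import os
import tempfile

def export_csv(rows, header, directory, name):
    # Write rows to <directory>/<name>.csv. On Databricks, point
    # `directory` at a Unity Catalog volume path so each cell's result
    # gets a predictable, distinct filename instead of a browser default.
    path = os.path.join(directory, f"{name}.csv")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return path

# Stand-in for a volume path; one named export per logical "cell".
tmp = tempfile.mkdtemp()
path = export_csv([("a", 1), ("b", 2)], ("key", "value"), tmp, "market_status")
```

For large results you would write with Spark instead of the csv module, but the naming idea is the same: the code, not the download widget, decides the filename.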
korasino
by New Contributor II
  • 1185 Views
  • 2 replies
  • 0 kudos

Photon and Predictive I/O vs. Liquid Clustering

Hi, quick question about optimizing our Delta tables: Photon and Predictive I/O vs. Liquid Clustering (LC). We have UUIDv4 columns (random, high-cardinality) used in both WHERE uuid = … filters and joins. From what I understand, Photon (on Serverless wa...

Latest Reply
korasino
New Contributor II
  • 0 kudos

Hey, thanks for the reply. Could you share some documentation links around those bullet points in your answer? thanks!

1 More Replies
seefoods
by Valued Contributor
  • 1581 Views
  • 3 replies
  • 1 kudos

Resolved! build autoloader pyspark job

Hello guys, I have built an ETL in PySpark which uses Auto Loader. I want to know what the best way to use Auto Loader on Databricks is. What is the best way to vacuum checkpoint files on /Volumes? Hope to hear your ideas about that. Cordially,

Latest Reply
seefoods
Valued Contributor
  • 1 kudos

Hello @intuz, thanks for your reply. Cordially

2 More Replies
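On the checkpoint-cleanup part of the question: deleting an Auto Loader checkpoint directory resets the stream's progress, so it should only be done deliberately for a full reload, never as routine housekeeping. A stdlib sketch of that reset, with a temp directory standing in for a /Volumes checkpoint path:

```python
import os
import shutil
import tempfile

def reset_checkpoint(checkpoint_dir):
    # Removing the checkpoint directory forces Auto Loader to re-discover
    # and reprocess all input files on the next run. Only do this when a
    # full reload is intended, and only after the stream is stopped.
    if os.path.isdir(checkpoint_dir):
        shutil.rmtree(checkpoint_dir)

# Throwaway directory standing in for a /Volumes/.../checkpoints path.
cp = os.path.join(tempfile.mkdtemp(), "checkpoints", "orders_stream")
os.makedirs(os.path.join(cp, "offsets"))
reset_checkpoint(cp)
```

For normal operation, leave the checkpoint alone and let the stream manage it; a fresh checkpoint path per logical pipeline (one stream, one checkpoint) keeps resets safe and scoped.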
yathish
by New Contributor II
  • 3877 Views
  • 6 replies
  • 0 kudos

upstream request timeout in databricks apps when using databricks sql connector

Hi, I am building an application in Databricks Apps. Sometimes when I try to fetch data using the Databricks SQL connector in an API, it takes time to hit the SQL warehouse, and if the time exceeds 60 seconds it gives an upstream timeout error. I h...

Latest Reply
epistoteles
New Contributor II
  • 0 kudos

@Alberto_Umana Any news on this? I am having similar issues and am also using a (running) serverless SQL warehouse.

5 More Replies
EndreM
by New Contributor III
  • 2691 Views
  • 8 replies
  • 1 kudos

Replaying a stream after converting to liquid clustering fails

I have a problem replaying a stream. I need to replay it because the conversion from liquid clustering to partitioning doesn't work. I see a lot of garbage collection, and memory maxes out immediately. Then the driver restarts. To debug the problem I try to force only ...

Latest Reply
EndreM
New Contributor III
  • 1 kudos

After increasing the compute to one with 500 GB of memory, the job was able to transfer ca. 300 GB of data, but it produced a large number of files, 26,000. The old table with partitioning and no liquid clustering had 4,000 files with a total of 1.2 TB of ...

7 More Replies
anilsampson
by New Contributor III
  • 771 Views
  • 1 replies
  • 0 kudos

Resolved! databricks dashboard via workflow task.

Hello, I am trying to trigger a Databricks dashboard via a workflow task. 1. When I deploy the job triggering the dashboard task via a local "Deploy bundle" command, deployment is successful. 2. When I try to deploy to a different environment via CI/CD, while ...

Latest Reply
anilsampson
New Contributor III
  • 0 kudos

I think I figured out the issue; it had to do with the version of the CLI. I updated the CI/CD to use the latest version of the CLI: curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh

arendon
by New Contributor II
  • 1361 Views
  • 2 replies
  • 1 kudos

Resolved! Asset Bundles: How to mute job failure notifications until final retry?

I'm trying to configure a job to only send failure notifications on the final retry failure (not on intermediate retry failures). This feature is available in the Databricks UI as "Mute notifications until the last retry", but I can't get this to wor...

Latest Reply
arendon
New Contributor II
  • 1 kudos

Thank you for the response, @lingareddy_Alva! I'll take a look at the workarounds you shared.

1 More Replies