Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Klusener
by Contributor
  • 525 Views
  • 1 reply
  • 3 kudos

Handling partition overwrite in Liquid Clustering

Hello, currently we have Delta tables in the TBs, partitioned by year, month, day. We perform dynamic partition overwrite using partitionOverwriteMode set to dynamic to handle reruns/corrections. With liquid clustering, since explicit partitions are not require...

Latest Reply
Saritha_S
Databricks Employee
  • 3 kudos

Hi @Klusener, good day! Dynamic partition overwrite only supports selective overwrites for partition columns, not for liquid clustering or regular columns. If you know the exact predicates, use replaceWhere. Note: this is not possible without knowin...

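The replaceWhere approach from the reply above can be sketched as follows. This is a minimal illustration, not the thread's actual code: the table path, the year/month/day column names, and the `build_predicate` helper are assumptions for the example.

```python
# Sketch of a selective overwrite via replaceWhere on a Delta table.
# Column names and paths are illustrative assumptions.

def build_predicate(year: int, month: int, day: int) -> str:
    """Build the replaceWhere predicate for one logical day slice."""
    return f"year = {year} AND month = {month} AND day = {day}"

def overwrite_day(df, path: str, year: int, month: int, day: int) -> None:
    """Overwrite only the rows matching the predicate (requires Spark/Delta)."""
    (df.write.format("delta")
       .mode("overwrite")
       .option("replaceWhere", build_predicate(year, month, day))
       .save(path))
```

With liquid clustering the predicate replaces what the explicit year/month/day partitions used to express, so reruns rewrite only the affected slice.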
Malthe
by Contributor
  • 570 Views
  • 1 reply
  • 1 kudos

Resolved! Unable to add primary key constraint to nullable identity column

While we can in fact define a primary key during table creation for an identity column that's nullable (i.e., not constrained with NOT NULL), it's not possible to add such a primary key constraint after the table has been created. We get an error mes...

Latest Reply
amuchoudhary
New Contributor III
  • 1 kudos

Creating a table with a nullable IDENTITY column and defining the primary key at creation time works. The database quietly interprets the column as NOT NULL for the purposes of the primary key, even though it's technically defined as nullable (i.e., n...

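When adding the constraint after creation, the usual workaround is to declare NOT NULL first and then add the primary key. The helper below only builds the two DDL statements; the table and column names are placeholders, and you would run the strings with `spark.sql(...)` on Databricks.

```python
# Hedged sketch: DDL to make a nullable identity column eligible for a
# primary key added after table creation. Names are example placeholders.

def pk_fixup_statements(table: str, col: str) -> list[str]:
    """Return the ALTER statements: first SET NOT NULL, then add the PK."""
    return [
        f"ALTER TABLE {table} ALTER COLUMN {col} SET NOT NULL",
        f"ALTER TABLE {table} ADD CONSTRAINT {table}_pk PRIMARY KEY ({col})",
    ]
```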
mohdluqmancse88
by New Contributor
  • 281 Views
  • 1 reply
  • 0 kudos

Databricks on Azure

We are setting up data hubs that interact with each other for Gen AI use cases. I want to prove that catalog sharing works across Azure subscriptions if all UC-enabled workspaces are mapped to the same metastore. Can you point me to the right documentation?

Latest Reply
Gopichand_G
New Contributor II
  • 0 kudos

I believe you need to follow these steps: 1. Deploy a metastore in one region. 2. Link each workspace (from different Azure subscriptions but the same tenant and region) to it. 3. Then validate that metadata objects like catalogs, schemas, and table...

MaartenH
by New Contributor III
  • 1906 Views
  • 9 replies
  • 4 kudos

Lakehouse federation for SQL server: database name with spaces

We're currently using Lakehouse Federation for various sources (Snowflake, SQL Server), usually successfully. However, we've encountered a case where one of the databases on the SQL Server has spaces in its name, e.g. 'My Database Name'. We've tried vari...

Latest Reply
SAKBAR
New Contributor II
  • 4 kudos

I am having a similar issue where a table name contains spaces and I cannot see those tables in the foreign catalog.schema in Databricks. It seems that Lakehouse Federation does not support spaces in database and table names; however, col name ...

8 More Replies
Anand13
by New Contributor II
  • 672 Views
  • 2 replies
  • 0 kudos

Getting concurrent issue on delta table using liquid clustering

In our project, we are testing liquid clustering using a test table called status_update, where we need to update the status for different market IDs. We are attempting to update the status_update table in parallel using the UPDATE command. ALTER TABL...

Latest Reply
Anand13
New Contributor II
  • 0 kudos

@Walter_C We are using Liquid Clustering as our first strategy. Our Databricks Runtime is 13.3, and we have a table named status_update containing approximately 30 market IDs, each with a single record. In our pipeline, if any market fails, we need t...

1 More Reply
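Parallel UPDATEs on the same Delta table commonly fail with concurrent-write conflicts, and a standard mitigation is to retry the conflicting writer with backoff. A minimal sketch of such a wrapper is below; matching Delta's concurrency errors by exception class name is an assumption for illustration, not the documented Delta API.

```python
import time

# Minimal retry wrapper for concurrent writes on a Delta table.
# Delta raises ConcurrentAppendException/ConcurrentModificationException-style
# errors when parallel writers touch overlapping files; retrying with
# exponential backoff is a common pattern.

def with_retries(fn, max_attempts: int = 5, base_delay: float = 1.0):
    """Call fn(), retrying on concurrent-write conflicts with backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as e:  # match Delta concurrency errors by class name
            if "Concurrent" not in type(e).__name__ or attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

Reducing file overlap (e.g. clustering on the column used in the UPDATE predicate) also lowers the conflict rate, but retries are still needed for correctness.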
amarnadh-gadde
by New Contributor II
  • 1172 Views
  • 6 replies
  • 0 kudos

Default catalog created wrong on my workspace

We have provisioned a new Databricks account and workspace on the premium plan. When we built out the workspace using Terraform, we expected to see a default catalog matching the workspace name, as per this documentation. However, I don't see it. All I see are the 3 c...

Latest Reply
loic
Contributor
  • 0 kudos

Hello, while trying to get help on Databricks default catalog behavior, I found this topic. If I can give my advice here, one reason I see for the behavior @amarnadh-gadde describes is that you deployed your new workspace in a region where there is already...

5 More Replies
seefoods
by Contributor III
  • 1202 Views
  • 6 replies
  • 7 kudos

Resolved! Auto Loader write strategy (APPEND, MERGE, UPDATE, COMPLETE, OVERWRITE)

Hello guys, I want to know whether operations like overwrite, merge, and update in static writes behave the same when we use Auto Loader. I'm confused about the behavior of modes like complete, update, and append. After that, I want to know what the co...

Latest Reply
chanukya-pekala
Contributor II
  • 7 kudos

Thanks for the discussion. I have a tiny suggestion. Based on my experience working with streaming loads, I often find the checkpoint location hard to inspect for offset information, or to delete for a fresh load of data. Hence I h...

5 More Replies
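On the checkpoint-hygiene point in the reply above: streams only support append-style sinks out of the box, and merge/overwrite semantics are typically applied via foreachBatch. A predictable, per-table checkpoint layout makes resets easier. The sketch below assumes hypothetical paths and a JSON source; the Spark call is shown for shape only and needs a Databricks/Spark session.

```python
# Sketch of an Auto Loader ingest with an explicit, per-table checkpoint
# location. Paths and option values are illustrative assumptions.

def checkpoint_for(base: str, table: str) -> str:
    """Derive a predictable checkpoint directory per target table."""
    return f"{base.rstrip('/')}/_checkpoints/{table}"

def start_autoloader(spark, source: str, table: str, base: str):
    """Requires a Databricks/Spark session; not executable standalone."""
    return (spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load(source)
            .writeStream
            .option("checkpointLocation", checkpoint_for(base, table))
            .toTable(table))
```

Deleting `checkpoint_for(base, table)` then restarts the stream from scratch for exactly one table, instead of hunting through shared checkpoint directories.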
SatyaKoduri
by New Contributor II
  • 802 Views
  • 1 reply
  • 1 kudos

Resolved! Yaml file to Dataframe

Hi, I'm trying to read YAML files using pyyaml and convert them into a Spark DataFrame with createDataFrame, without specifying a schema—allowing flexibility for potential YAML schema changes over time. This approach worked as expected on Databricks ...

Latest Reply
lingareddy_Alva
Honored Contributor III
  • 1 kudos

Hi @SatyaKoduri This is a known issue with newer Spark versions (3.5+) that came with Databricks Runtime 15.4. The schema inference has become stricter and struggles with deeply nested structures like your YAML's nested maps. Here are a few solution...

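One workaround in the spirit of the reply: instead of passing nested Python dicts straight to createDataFrame, serialize each parsed YAML document to a JSON string and let spark.read.json do the inference, which handles deeply nested maps more gracefully. The helper below is plain Python; the Spark call at the end is an assumption and is left commented out.

```python
import json

# Serialize parsed YAML documents (already loaded into dicts by pyyaml)
# to JSON strings, sidestepping createDataFrame's stricter inference.

def docs_to_json_lines(docs: list) -> list:
    """Serialize parsed YAML documents (dicts) to JSON strings."""
    return [json.dumps(doc) for doc in docs]

# On Databricks (assumption - not run here):
# rdd = spark.sparkContext.parallelize(docs_to_json_lines(docs))
# df = spark.read.json(rdd)
```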
tuckera
by New Contributor
  • 262 Views
  • 1 reply
  • 0 kudos

Governance in pipelines

How does everyone track and deploy their pipelines and generated data assets? DABs? Terraform? Manual? Something else entirely?

Latest Reply
lingareddy_Alva
Honored Contributor III
  • 0 kudos

Hi @tuckera The data engineering landscape shows a pretty diverse mix of approaches for tracking and deploying pipelines and data assets, often varying by company size, maturity, and specific needs. Infrastructure as Code (IaC) tools like Terraform an...

Edoa
by New Contributor
  • 583 Views
  • 1 reply
  • 0 kudos

SFTP Connection Timeout on Job Cluster but Works on Serverless Compute

Hi all, I'm experiencing inconsistent behavior when connecting to an SFTP server using Paramiko in Databricks. When I run the code on Serverless Compute, the connection to xxx.yyy.com via SFTP works correctly. When I run the same code on a Job Cluster, ...

Latest Reply
lingareddy_Alva
Honored Contributor III
  • 0 kudos

Hi @Edoa This is a common networking issue in Databricks related to the different network configurations between Serverless Compute and Job Clusters. Here are the key differences and potential solutions. Root cause: Serverless Compute runs in Databricks'...

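A quick way to narrow this down is to run the same reachability probe from both compute types: if a plain TCP connection to the SFTP host times out on the Job Cluster but succeeds on Serverless, the problem is the cluster's VNet/egress rules, not Paramiko. Host and port below are placeholders.

```python
import socket

# Probe TCP reachability of the SFTP endpoint from the current compute.
# Run on both Serverless and the Job Cluster and compare results.

def check_tcp(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or unroutable
        return False

# Example (placeholder host): check_tcp("xxx.yyy.com", 22)
```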
oeztuerk82
by New Contributor
  • 513 Views
  • 2 replies
  • 0 kudos

Deletion of Resource Group on Azure and Impact on Databricks Workspace

Hello together, I would like to confirm the data retention and deletion behavior associated with an Azure Databricks workspace, particularly in the context of deleting the Azure resource group in which a Databricks workspace resides. Recently, I deleted an...

Latest Reply
SAKBAR
New Contributor II
  • 0 kudos

Once deleted, a resource group cannot be recovered (just as with ADLS), so it is not possible to restore the workspace or any resource under the resource group. Microsoft support may be able to recover it if you are on a premium plan with them. Looking ahead, it is always bet...

1 More Reply
LeoGriffM
by New Contributor II
  • 1714 Views
  • 1 reply
  • 0 kudos

Zip archive with PowerShell "Error: The zip file may not be valid or may be an unsupported version."

We are trying to upload a ZIP archive to a Databricks workspace for faster and atomic uploads of artifacts, but get "Error: The zip file may not be valid or may be an unsupported version." The expected behaviour is that we can run the following co...

Latest Reply
LeoGriffM
New Contributor II
  • 0 kudos

May relate to feature request https://github.com/databricks/cli/issues/1221

DarioB
by New Contributor III
  • 824 Views
  • 1 reply
  • 1 kudos

Resolved! DAB for_each_task - Passing task values

I am trying to deploy a job with a for_each_task using DAB and Terraform, and I am unable to properly pass the task value into the subsequent task. These are my job task definitions in the YAML:
      tasks:
        - task_key: FS_batching
          job_c...

Latest Reply
DarioB
New Contributor III
  • 1 kudos

We have been testing and found the issue (I just realized that my anonymization of the names removed the source of the error). We have tracked it down to the inputs parameter of the for_each_task. It seems that it is unable to reference task names with das...

alonisser
by Contributor II
  • 939 Views
  • 4 replies
  • 0 kudos

Controlling the name of the downloaded csv file from a notebook

I got a notebook with multiple display() commands in various cells, and users are currently downloading the result CSV from each cell. I want the downloads to be named after the name of the cell (or any other method I can use to make each download have a dif...

Latest Reply
Isi
Honored Contributor II
  • 0 kudos

Hey @alonisser Once the file is stored in the volume, whether in S3, GCS, or ADLS, you'll be able to see it with a custom name defined by the customer or project. Additionally, the files may be saved in different folders, making it easier to identify...

3 More Replies
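Writing each cell's result to a path with a caller-chosen file name, as the reply suggests, can be sketched like this. The helpers are plain Python; on Databricks, `directory` would point at a Unity Catalog volume path (an assumption here, e.g. `/Volumes/catalog/schema/exports`), and the rows would come from `df.collect()` or a pandas conversion.

```python
import csv
import io

# Render a result set (list of dicts) as CSV and save it under a custom
# file name, instead of relying on the display() download button's default.

def to_csv_named(rows: list, fieldnames: list) -> str:
    """Render rows (list of dicts) as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

def save_named(rows: list, fieldnames: list, directory: str, name: str) -> str:
    """Write the CSV under a caller-chosen file name; return its path."""
    path = f"{directory.rstrip('/')}/{name}.csv"
    with open(path, "w", newline="") as f:
        f.write(to_csv_named(rows, fieldnames))
    return path
```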
korasino
by New Contributor II
  • 712 Views
  • 2 replies
  • 0 kudos

Photon and Predictive I/O vs. Liquid Clustering

Hi, quick question about optimizing our Delta tables: Photon and Predictive I/O vs. Liquid Clustering (LC). We have UUIDv4 columns (random, high-cardinality) used in both WHERE uuid = … filters and joins. From what I understand, Photon (on Serverless wa...

Latest Reply
korasino
New Contributor II
  • 0 kudos

Hey, thanks for the reply. Could you share some documentation links for the bullet points in your answer? Thanks!

1 More Reply
