Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

IGRACH
by New Contributor III
  • 1094 Views
  • 2 replies
  • 1 kudos

Resolved! Specifying output mode and path when using foreachBatch

Since .foreachBatch() is "hijacking" the stream and executing arbitrary code in it, do I need to specify Output mode and Path:(df.writeStream .format("delta") .trigger(availableNow = True) .option("checkpointLocation", "check_point_location") .forea...

Latest Reply
Branislav
New Contributor II
  • 1 kudos

Thanks xD

1 More Replies
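A minimal sketch of the resolved pattern: with `.foreachBatch()`, the outer writer's format and path are not what performs the write, so mode and path live inside the batch function, while the checkpoint location and trigger stay on the outer query. The paths below are hypothetical.

```python
# Sketch (assumptions: hypothetical paths "/mnt/target" and
# "/mnt/checkpoints/target"; `df` is an existing streaming DataFrame).
# With .foreachBatch(), the batch function itself decides where and how
# each micro-batch is written; the outer query still needs a checkpoint.

def write_batch(batch_df, batch_id):
    # Each micro-batch arrives as a normal (non-streaming) DataFrame,
    # so write mode and path are specified here, not on the outer stream.
    (batch_df.write
        .format("delta")
        .mode("append")          # plays the role of the output mode
        .save("/mnt/target"))    # plays the role of the path

def start_query(df):
    # Not executed here; requires a live SparkSession / streaming source.
    return (df.writeStream
        .trigger(availableNow=True)
        .option("checkpointLocation", "/mnt/checkpoints/target")  # still required
        .foreachBatch(write_batch)
        .start())
```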
ClarkElliott
by New Contributor
  • 4346 Views
  • 1 replies
  • 0 kudos

Parquet file for delta streaming live table with pipeline

I am having an issue with parquet files: I'm getting an Illegal Parquet type: INT64 (TIMESTAMP(NANOS,false)) error while trying to read a parquet file (generated outside of Databricks). I am using a Delta streaming live table with a pipeline. If I r...

Latest Reply
Saritha_S
Databricks Employee
  • 0 kudos

Hi @ClarkElliott, good day! Cause: open source Apache Spark and Databricks Runtime versions 11.3 LTS and above do not support the TIMESTAMP_NANOS type. If a Parquet file contains fields with the TIMESTAMP_NANOS type, attempts to...

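One hedged workaround, since the file is generated outside Databricks: rewrite it with microsecond timestamps before ingestion, for example with pyarrow. The paths are hypothetical, and depending on your runtime a legacy flag such as `spark.sql.legacy.parquet.nanosAsLong` may also be available as an alternative.

```python
# Sketch of one workaround: rewrite the offending file with microsecond
# timestamps before Databricks reads it. Assumes pyarrow is available;
# src/dst paths are hypothetical.

def downcast_nanos(src_path, dst_path):
    import pyarrow.parquet as pq  # deferred so the sketch stands alone
    table = pq.read_table(src_path)
    # coerce_timestamps="us" rewrites TIMESTAMP(NANOS) columns as
    # microsecond timestamps, which the Spark Parquet reader accepts.
    pq.write_table(table, dst_path,
                   coerce_timestamps="us",
                   allow_truncated_timestamps=True)
```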
shavya
by New Contributor
  • 5118 Views
  • 1 replies
  • 0 kudos

Where are default temporary checkpoint locations created for streaming queries with display command?

Hello! I created a streaming query using Auto Loader to read data from S3 and used the display command to see if the query was working. Initially, cloudFiles.includeExistingFiles was set to True, but since we have data in Glacier that needs to be retrieve...

Latest Reply
Saritha_S
Databricks Employee
  • 0 kudos

Hi @shavya, good day! When you do not specify a checkpointLocation in a streaming query in Databricks, it uses a temporary system directory such as: dbfs:/local_disk0/tmp/temporary-<random_uuid>. To remove the temporary checkpoint, please ...

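A small sketch of the usual remedy: pin the checkpoint to an explicit path so it can be located and cleaned up deliberately instead of relying on the temporary default. The checkpoint path, bucket, and table paths below are hypothetical.

```python
# Sketch: use an explicit checkpoint path instead of the temporary default.
# All paths are hypothetical.

CHECKPOINT = "/mnt/checkpoints/orders"

def start_autoloader(spark):
    # Not executed here; requires a live SparkSession.
    df = (spark.readStream
          .format("cloudFiles")
          .option("cloudFiles.format", "json")
          .option("cloudFiles.includeExistingFiles", "false")
          .load("s3://my-bucket/raw/orders"))
    return (df.writeStream
            .format("delta")
            .option("checkpointLocation", CHECKPOINT)  # explicit, not temporary
            .start("/mnt/tables/orders"))

# To discard streaming state and reprocess from scratch, delete the checkpoint:
#   dbutils.fs.rm(CHECKPOINT, recurse=True)
```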
lprevost
by Contributor III
  • 1258 Views
  • 1 replies
  • 0 kudos

Streaming query error - [STREAMING_STATEFUL_OPERATOR_NOT_MATCH_IN_STATE_METADATA]

[STREAM_FAILED] Query [id = 6a821fbc-490b-4ad8-891d-e4cacc2af1d6, runId = e055fede-8012-4369-861b-47183999e91d] terminated with exception: [STREAMING_STATEFUL_OPERATOR_NOT_MATCH_IN_STATE_METADATA] Streaming stateful operator name does not match with ...

Latest Reply
Saritha_S
Databricks Employee
  • 0 kudos

Hi @lprevost, good day! Please find below my analysis of your issue. Error: [STREAM_FAILED] Query [id = 6a821fbc-490b-4ad8-891d-e4cacc2af1d6, runId = e055fede-8012-4369-861b-47183999e91d] terminated with exception: [STREAMING_STATEFUL_OPERATOR_NOT...

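For context, this error typically surfaces when the restarted query's stateful operators (aggregations, dropDuplicates, stream-stream joins) no longer match what the checkpoint's state metadata recorded. A hedged sketch of the common fix, assuming the query shape changed between runs and the hypothetical paths below:

```python
# Sketch (assumption: the query plan changed between runs, e.g. an
# aggregation or dropDuplicates was added/removed). The state under the old
# checkpoint no longer matches, so either revert the query to its original
# shape, or restart from a fresh checkpoint and reprocess. Paths are
# hypothetical.

OLD_CHECKPOINT = "/mnt/checkpoints/events_v1"
NEW_CHECKPOINT = "/mnt/checkpoints/events_v2"  # fresh dir => fresh state

def restart_with_fresh_state(df):
    # Not executed here; requires a streaming DataFrame `df`.
    return (df.writeStream
            .format("delta")
            .option("checkpointLocation", NEW_CHECKPOINT)
            .start("/mnt/tables/events"))
```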
Klusener
by Contributor
  • 1469 Views
  • 1 replies
  • 4 kudos

Resolved! Handling partition overwrite in Liquid Clustering

Hello, currently we have Delta tables in the TBs, partitioned by year, month, day. We perform dynamic partition overwrite, using partitionOverwriteMode set to dynamic, to handle reruns/corrections. With liquid clustering, since explicit partitions are not require...

Latest Reply
Saritha_S
Databricks Employee
  • 4 kudos

Hi @Klusener, good day! Dynamic partition overwrite only supports selective overwrites for partitioned columns, not for liquid clustering or regular columns. If you know the exact predicates, use replaceWhere. Note: This is not possible without knowin...

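The replaceWhere approach from the accepted answer can be sketched as follows; the table path and the year/month/day column names are assumptions carried over from the question's partitioning scheme.

```python
# Sketch: with liquid clustering there are no partitions for dynamic
# partition overwrite to target, so the selective rewrite is expressed as an
# explicit replaceWhere predicate instead. Table path and column names are
# hypothetical.

def rerun_predicate(year, month, day):
    # Build the predicate covering exactly the slice being re-run.
    return f"year = {year} AND month = {month} AND day = {day}"

def overwrite_slice(df, year, month, day):
    # Not executed here; requires a DataFrame containing only rows that
    # satisfy the predicate, otherwise the write fails.
    (df.write
       .format("delta")
       .mode("overwrite")
       .option("replaceWhere", rerun_predicate(year, month, day))
       .save("/mnt/tables/events"))
```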
Malthe
by Valued Contributor II
  • 1257 Views
  • 1 replies
  • 1 kudos

Resolved! Unable to add primary key constraint to nullable identity column

While we can in fact define a primary key during table creation for an identity column that's nullable (i.e., not constrained using NOT NULL), it's not possible to add such a primary key constraint after the table has been created. We get an error mes...

Latest Reply
amuchoudhary
New Contributor III
  • 1 kudos

Creating a table with a nullable IDENTITY column and defining the primary key at creation time works. The database quietly interprets the column as NOT NULL for the purposes of the primary key, even though it's technically defined as nullable (i.e., n...

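A hedged sketch of what adding the constraint after creation typically requires: making the identity column explicitly NOT NULL first, then adding the key. The catalog, table, and constraint names below are hypothetical; the statements would run via `spark.sql(...)` on a Unity Catalog table.

```python
# Sketch: first make the identity column explicitly NOT NULL, then add the
# primary key. Names are hypothetical.

STATEMENTS = [
    "ALTER TABLE main.default.events ALTER COLUMN id SET NOT NULL",
    "ALTER TABLE main.default.events ADD CONSTRAINT events_pk PRIMARY KEY (id)",
]

def add_primary_key(spark):
    # Not executed here; requires a live SparkSession on a UC table.
    for stmt in STATEMENTS:
        spark.sql(stmt)
```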
mohdluqmancse88
by New Contributor
  • 491 Views
  • 1 replies
  • 0 kudos

Databricks on Azure

We are setting up data hubs that interact with each other for Gen AI use cases. I want to prove that catalog sharing works across Azure subscriptions if all UCs are mapped to the same metastore. Can you point me to the right documentation?

Latest Reply
Gopichand_G
Databricks Partner
  • 0 kudos

I believe you need to follow the steps below. 1. Deploy a metastore in one region. 2. Link each workspace (from different Azure subscriptions but same tenant and region) to it. 3. Then validate that metadata objects like catalogs, schemas, and table...

Anand13
by New Contributor II
  • 1467 Views
  • 2 replies
  • 0 kudos

Getting a concurrency issue on a Delta table using liquid clustering

In our project, we are testing liquid clustering using a test table called status_update, where we need to update the status for different market IDs. We are attempting to update the status_update table in parallel using the UPDATE command. ALTER TABL...

Latest Reply
Anand13
New Contributor II
  • 0 kudos

@Walter_C We are using Liquid Clustering as our first strategy. Our Databricks Runtime is 13.3, and we have a table named status_update containing approximately 30 market IDs, each with a single record. In our pipeline, if any market fails, we need t...

1 More Replies
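Parallel UPDATEs on the same Delta table can raise concurrent-transaction conflicts even when each writer touches a different market_id. A common mitigation, sketched here with hypothetical names (the exact conflict exception class varies by Delta/DBR version), is a retry loop with exponential backoff:

```python
import random
import time

# Sketch: retry a per-market UPDATE on transaction conflicts. Assumes each
# writer touches a disjoint market_id; table/column names are hypothetical.

def update_status(spark, market_id, status, max_retries=5):
    # Not executed here; requires a live SparkSession.
    for attempt in range(max_retries):
        try:
            spark.sql(
                "UPDATE status_update SET status = '{}' "
                "WHERE market_id = '{}'".format(status, market_id))
            return True
        except Exception as e:  # e.g. ConcurrentAppendException
            if "Concurrent" not in str(type(e)) + str(e):
                raise  # not a write conflict -- surface it
            time.sleep((2 ** attempt) + random.random())  # backoff + jitter
    return False
```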
amarnadh-gadde
by New Contributor II
  • 2228 Views
  • 6 replies
  • 0 kudos

Default catalog created incorrectly on my workspace

We have provisioned a new Databricks account and workspace on the premium plan. When we built out the workspace using Terraform, we expected to see a default catalog matching the workspace name as per this documentation. However, I don't see it. All I see are the 3 c...

Latest Reply
loic
Contributor
  • 0 kudos

Hello, while trying to get help on Databricks default catalog behavior, I found this topic. If I can give my advice here, one reason I see for the behavior @amarnadh-gadde describes is that you deployed your new workspace in a region where there is already...

5 More Replies
seefoods
by Valued Contributor
  • 3311 Views
  • 6 replies
  • 7 kudos

Resolved! Auto Loader write strategy (APPEND, MERGE, UPDATE, COMPLETE, OVERWRITE)

Hello guys, I want to know whether operations like overwrite, merge, and update in a static write behave the same when using Auto Loader. I'm confused about the behavior of modes like complete, update, and append. After that, I want to know what its the co...

Latest Reply
chanukya-pekala
Contributor III
  • 7 kudos

Thanks for the discussion. I have a tiny suggestion. Based on my experience working with streaming loads, I often find it hard to inspect the checkpoint location for offset information, or to delete that directory for a fresh load of data. Hence I h...

5 More Replies
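To summarize the resolved behavior in code: Auto Loader is a streaming source, so the write side is an ordinary streaming sink with the usual output modes (append for new rows; complete/update only with aggregations), while overwrite/merge semantics go through foreachBatch. A sketch with hypothetical paths:

```python
# Sketch: Auto Loader ingestion with the normal append output mode.
# Paths are hypothetical; overwrite/merge would be done via foreachBatch.

def bronze_ingest(spark):
    # Not executed here; requires a live SparkSession.
    df = (spark.readStream
          .format("cloudFiles")
          .option("cloudFiles.format", "csv")
          .load("s3://my-bucket/landing/"))
    return (df.writeStream
            .format("delta")
            .outputMode("append")  # the usual mode for Auto Loader ingestion
            .option("checkpointLocation", "/mnt/checkpoints/bronze")
            .trigger(availableNow=True)
            .start("/mnt/tables/bronze"))
```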
SatyaKoduri
by New Contributor II
  • 1669 Views
  • 1 replies
  • 1 kudos

Resolved! YAML file to DataFrame

Hi, I'm trying to read YAML files using pyyaml and convert them into a Spark DataFrame with createDataFrame, without specifying a schema, allowing flexibility for potential YAML schema changes over time. This approach worked as expected on Databricks ...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 1 kudos

Hi @SatyaKoduri This is a known issue with newer Spark versions (3.5+) that came with Databricks Runtime 15.4. The schema inference has become more strict and struggles with deeply nested structures like your YAML's nested maps. Here are a few solution...

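One workaround along the lines of the reply, sketched with assumptions: serialize each parsed YAML document to a JSON string and let `spark.read.json` infer the nested schema, which tolerates deep map structures better than `createDataFrame` on raw dicts. Here `docs` stands in for the output of `yaml.safe_load_all(...)`.

```python
import json

# Sketch: route nested YAML documents through JSON so spark.read.json can
# infer the (nested) schema. `docs` is a list of parsed YAML dicts.

def to_json_lines(docs):
    # Each parsed YAML document (a nested dict) becomes one JSON line.
    return [json.dumps(d) for d in docs]

def yaml_docs_to_df(spark, docs):
    # Not executed here; requires a live SparkSession.
    return spark.read.json(spark.sparkContext.parallelize(to_json_lines(docs)))
```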
tuckera
by New Contributor
  • 517 Views
  • 1 replies
  • 0 kudos

Governance in pipelines

How does everyone track and deploy their pipelines and generated data assets? DABs? Terraform? Manual? Something else entirely?

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 0 kudos

Hi @tuckera The data engineering landscape shows a pretty diverse mix of approaches for tracking and deploying pipelines and data assets, often varying by company size, maturity, and specific needs. Infrastructure as Code (IaC) tools like Terraform an...

Edoa
by New Contributor
  • 1544 Views
  • 1 replies
  • 0 kudos

SFTP Connection Timeout on Job Cluster but Works on Serverless Compute

Hi all, I'm experiencing inconsistent behavior when connecting to an SFTP server using Paramiko in Databricks. When I run the code on Serverless Compute, the connection to xxx.yyy.com via SFTP works correctly. When I run the same code on a Job Cluster, ...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 0 kudos

Hi @Edoa This is a common networking issue in Databricks, related to the different network configurations between Serverless Compute and Job Clusters. Here are the key differences and potential solutions. Root cause: Serverless Compute runs in Databricks'...

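A quick way to confirm the network-egress diagnosis is a plain TCP reachability check run on each compute type, before involving Paramiko at all. Host and port below are placeholders for the real SFTP endpoint.

```python
import socket

# Sketch: check whether the cluster can reach the SFTP host at all.
# If this times out on the Job Cluster but succeeds on Serverless, the
# issue is network egress (firewall/NAT allow-listing), not Paramiko.

def tcp_reachable(host, port=22, timeout=10):
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run `tcp_reachable("xxx.yyy.com")` from a notebook on both compute types and compare the results.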
oeztuerk82
by New Contributor II
  • 1338 Views
  • 2 replies
  • 3 kudos

Deletion of Resource Group on Azure and Impact on Databricks Workspace

Hello everyone, I would like to confirm the data retention and deletion behavior associated with an Azure Databricks workspace, particularly in the context of deleting the Azure resource group that a Databricks workspace sits in. Recently, I deleted an...

Latest Reply
SAKBAR
New Contributor II
  • 3 kudos

Once a resource group is deleted it cannot be recovered, just like ADLS, so it is not possible to restore the workspace or any resource under the resource group. Microsoft support may be able to recover it if you are on a premium plan with them. For the future, it is always bet...

1 More Replies
DarioB
by New Contributor III
  • 1754 Views
  • 1 replies
  • 1 kudos

Resolved! DAB for_each_task - Passing task values

I am trying to deploy a job with a for_each_task using DAB and Terraform, and I am unable to properly pass the task value into the subsequent task. These are my job task definitions in the YAML: tasks: - task_key: FS_batching job_c...

Latest Reply
DarioB
New Contributor III
  • 1 kudos

We have been testing and found the issue (I just realized that my anonymization of the names removed the source of the error). We have tracked it down to the inputs parameter of the for_each_task. It seems that it is unable to reference task names with das...

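A hedged sketch of the YAML shape described in this thread, using underscore-only task keys (e.g. `fs_batching` rather than a dashed name) in line with the fix the poster found. All job, task, and notebook names here are hypothetical.

```yaml
# Sketch (hypothetical job): a for_each_task reading a task value set by an
# earlier task. Underscore-only task keys avoid the dashed-name reference
# problem described above.
tasks:
  - task_key: fs_batching
    notebook_task:
      notebook_path: /Jobs/build_batches
  - task_key: process_batches
    depends_on:
      - task_key: fs_batching
    for_each_task:
      inputs: "{{tasks.fs_batching.values.batches}}"
      task:
        task_key: process_one
        notebook_task:
          notebook_path: /Jobs/process_batch
          base_parameters:
            batch: "{{input}}"
```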