Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

prafulja
by New Contributor II
  • 547 Views
  • 3 replies
  • 1 kudos

Resolved! Found issue with DLT for each batch Sink.

We are creating a Bronze table on top of ADLS data using Auto Loader with DLT. After that, we create the Silver table using a for-each-batch sink. Finally, we create the Gold table through a DLT materialized view. However, when creating the Gold table...

Latest Reply
prafulja
New Contributor II
  • 1 kudos

Thank you for sharing the detailed explanation. I was following the same approach, but the challenge was that with foreachBatch, SDP wasn’t able to reliably track whether the table had been created or not. When I tried Option 3 (without using the LIV...

2 More Replies
dashawn
by New Contributor
  • 6660 Views
  • 5 replies
  • 1 kudos

DLT Pipeline Error Handling

Hello all. We are a new team implementing DLT and have set up a number of tables in a pipeline loading from S3 with UC as the target. I'm noticing that if any of the 20 or so tables fails to load, the entire pipeline fails even when there are no depende...

Data Engineering
Delta Live Tables
Latest Reply
Kirankumarbs
Contributor III
  • 1 kudos

@dashawn DLT treats the whole pipeline as one unit, so if any table definition throws an error during the planning phase (not just execution), the entire update fails. An empty S3 directory causing a schema inference failure is exactly the kind of th...

4 More Replies
learti
by New Contributor III
  • 558 Views
  • 2 replies
  • 0 kudos

Resolved! Databricks cluster cannot reach SQL Server over VPC peering despite EC2 connectivity - AWS

We are experiencing a networking issue where a Databricks cluster cannot connect to a SQL Server instance hosted in another VPC, even though connectivity from a regular EC2 instance works. Two EC2 instances deployed in the Databricks subnets (NatSubne...

Latest Reply
SteveOstrowski
Databricks Employee
  • 0 kudos

Hi @learti, This is a very common scenario -- EC2 instances in the Databricks subnets can reach your SQL Server, but the Databricks cluster itself cannot. The good news is that VPC peering is fully supported with Databricks-managed VPCs, so this is N...

1 More Replies
Sumeet2
by New Contributor II
  • 423 Views
  • 1 reply
  • 1 kudos

Resolved! Connect to a delta table to django web app

Hi! I am building a Django web app; it runs locally for now. I am using databricks-sql-connector to run a simple query 'select * from catalog.schema.table_name' and display it on an HTML page. I keep getting an error that the view or table is not fou...

error.png query-run-proof.png
Latest Reply
balajij8
Contributor III
  • 1 kudos

You can use cursor.execute("SELECT * FROM scidstools.assetmanager.trucks") instead of cursor.execute('SELECT * FROM `scidstools.assetmanager.trucks`').
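A minimal sketch of why the quoting matters, assuming the usual Spark SQL rule that one pair of backticks turns everything inside it into a single identifier, so the whole dotted string is looked up as one table name rather than as catalog.schema.table. The helper quote_table_name is hypothetical (it is not part of databricks-sql-connector); it quotes each part separately, which is only needed when a part contains special characters:

```python
def quote_table_name(full_name: str) -> str:
    """Backtick-quote each part of a catalog.schema.table name separately.

    Hypothetical helper for illustration; not provided by any Databricks library.
    """
    parts = full_name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.table, got {full_name!r}")
    # A literal backtick inside an identifier is escaped by doubling it.
    return ".".join("`" + part.replace("`", "``") + "`" for part in parts)


# Each part gets its own pair of backticks, so the name still resolves
# as three separate levels instead of one identifier containing dots.
query = f"SELECT * FROM {quote_table_name('scidstools.assetmanager.trucks')}"
print(query)
```

The resulting string can be passed to cursor.execute in the same way as the unquoted form suggested in the reply.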

Malthe
by Valued Contributor II
  • 1505 Views
  • 12 replies
  • 1 kudos

Resolved! Aggregated task time not accounted for in executions

The following serverless query has an aggregated task time of 1.94h, but each of the two executions runs in an aggregated time of just a few minutes. How is one supposed to make sense of that?

Screenshot 2026-03-06 at 09.52.01.png
Latest Reply
Malthe
Valued Contributor II
  • 1 kudos

@Ashwin_DSA by the way, "Open in Spark UI" is available only for SQL Warehouse; it's not available for serverless notebooks or jobs.

11 More Replies
loic
by Contributor
  • 1133 Views
  • 4 replies
  • 3 kudos

Resolved! Transfer ownership of a Delta Share

Hello, I would like to clarify a point about Delta Share ownership. Indeed, there is something that is not clear in the Databricks documentation. On one side, in the Delta Sharing page, it is written that the "metastore admin" role is needed in order to change ...

Latest Reply
SteveOstrowski
Databricks Employee
  • 3 kudos

Hi @loic, You can transfer ownership of a Delta Share using the ALTER SHARE ... OWNER TO SQL command. This is available in Databricks SQL and Databricks Runtime 11.3 LTS and above. USING SQL The syntax is straightforward: ALTER SHARE my_share OWNER T...
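A small sketch of building that statement from Python, assuming it is then sent through whatever SQL entry point you already use (for example spark.sql in a notebook). The helper name alter_share_owner_sql and the group name data-admins are illustrative placeholders, not taken from the thread:

```python
def alter_share_owner_sql(share: str, new_owner: str) -> str:
    """Build an ALTER SHARE ... OWNER TO statement.

    Hypothetical helper; share and new_owner are caller-supplied names.
    """
    def quote(identifier: str) -> str:
        # Backtick-quote so names with dashes, dots or @ stay one identifier.
        return "`" + identifier.replace("`", "``") + "`"

    return f"ALTER SHARE {quote(share)} OWNER TO {quote(new_owner)}"


# "data-admins" is a made-up group name used only for this example.
stmt = alter_share_owner_sql("my_share", "data-admins")
print(stmt)
```

Quoting the principal is a defensive choice here, since user and group names often contain characters that are not valid in a bare identifier.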

3 More Replies
sebih
by New Contributor II
  • 700 Views
  • 3 replies
  • 1 kudos

Unable to apply liquid clustering to a materialized view

Hi everyone, I am trying to create a materialized view with liquid clustering using the code below. However, I realized that the query performance is slower than that of a streaming table with the same data, liquid clustering, and structure. It appear...

sebih_0-1768820672926.png
Latest Reply
sebih
New Contributor II
  • 1 kudos

Hi,
- The table size is about 2 TB.
- I had already set the liquid clustering keys before creating the materialized view. There were no issues with automatic liquid clustering.
- The issue with the liquid clustering keys not appearing in the metadata ha...

2 More Replies
kenmyers-8451
by Contributor II
  • 862 Views
  • 3 replies
  • 1 kudos

Resolved! mode: development not working as expected

Hey, I'm trying to add mode: development to my "Development" target (which is the default), but it does not seem to be working as I expected. Here is what my targets file looks like: I'm deploying with this command: databricks272 bundle deploy -p dev3 -t De...

Screenshot 2026-03-05 at 9.51.17 AM.png Screenshot 2026-03-05 at 9.53.51 AM.png
Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi @kenmyers-8451, Glad you tracked this down. This is a common gotcha with Databricks Asset Bundles (DABs) when splitting configuration across multiple files: if the file containing your target definition (with mode: development) is not listed in th...

2 More Replies
abhijit007
by Databricks Partner
  • 912 Views
  • 2 replies
  • 2 kudos

Resolved! Databricks App Issue– “socket hang up / ECONNRESET” when API call runs > 30 seconds

Problem Statement: We are running a Data App on Databricks that uses Next.js (frontend) and FastAPI (backend). The backend calls a Databricks Agent (AgentBricks) via a serving endpoint, which typically needs ~1 minute to return a response. However, an...

Latest Reply
SteveOstrowski
Databricks Employee
  • 2 kudos

Hi @abhijit007, Your debugging was thorough and you correctly isolated the issue: the timeout is happening upstream of your application code. Databricks Apps run behind a managed ingress/request router that enforces request-level timeouts (typically ...

1 More Replies
neerajaN
by New Contributor II
  • 715 Views
  • 4 replies
  • 2 kudos

Resolved! count function

Hi, as per Spark internals, once the count function has executed on the worker nodes, does one of the worker nodes collect all the record counts and do the summation? Or are the record counts from all worker nodes passed to the driver node, with the summation done on the driver node side....

Latest Reply
SteveOstrowski
Databricks Employee
  • 2 kudos

Hi @neerajaN, You are correct that the count() operation follows a two-phase aggregation pattern in Spark. Here is how it works in detail: PHASE 1: PARTIAL AGGREGATION (EXECUTORS) Each executor computes a local partial count for the partitions assign...
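The two-phase pattern described above can be illustrated in plain Python. This is only a sketch of the idea, not Spark code: the partitions are ordinary lists, the function names are made up for the example, and it does not model where Spark actually runs the final merge.

```python
from typing import List, Sequence


def partial_counts(partitions: Sequence[Sequence[object]]) -> List[int]:
    """Phase 1: one local count per partition, computed independently."""
    return [len(partition) for partition in partitions]


def final_count(partials: Sequence[int]) -> int:
    """Phase 2: merge the small partial results into a single total."""
    return sum(partials)


# Four toy "partitions" standing in for the data split across executors.
partitions = [[1, 2, 3], [4, 5], [], [6, 7, 8, 9]]

partials = partial_counts(partitions)   # one integer per partition
total = final_count(partials)           # only these integers are combined

print(partials, total)
```

The point of the split is that only one small integer per partition has to move between the phases, never the rows themselves.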

3 More Replies
Malthe
by Valued Contributor II
  • 666 Views
  • 3 replies
  • 0 kudos

Resolved! Python segmentation fault in serverless job

We're getting a Python segmentation fault in a serverless job that uses Delta Table merge inside a foreachBatch step in structured streaming (trigger once). /databricks/python/lib/python3.12/site-packages/pyspark/sql/connect/streaming/query.py:479: Us...

Screenshot 2026-03-05 at 11.01.39.png
Latest Reply
SteveOstrowski
Databricks Employee
  • 0 kudos

Hi @Malthe, Since you have confirmed this is vanilla PySpark with no external libraries on serverless runtime environment version 5, this narrows things down considerably. Here are some additional observations and recommendations beyond what Louis sh...

2 More Replies
NW1000
by New Contributor III
  • 833 Views
  • 3 replies
  • 1 kudos

Resolved! Unable to access files using a classic cluster

I used the same code with a classic cluster (Runtime 17.3 LTS ML, with Spark config "spark.databricks.workspace.fileSystem.enabled true"), but I am not able to access files in the workspace with the following Python code: import os # Check if source exists and w...

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi @NW1000, This behavior comes down to how workspace file access and identity work differently between serverless compute and classic clusters. SERVERLESS COMPUTE Serverless interactive compute runs under your own identity. It inherits your workspac...

2 More Replies
Seunghyun
by Contributor
  • 1112 Views
  • 3 replies
  • 2 kudos

Resolved! Conditional Logic in Databricks Asset Bundles using Go Templates

I am defining a job using Databricks Asset Bundles (DABs) as follows:

resources:
  jobs:
    job_name:
      ...
      schedule:
        {{ if eq ${var.env} "prd" }}
        pause_status: "UNPAUSED"
        {{ else }}
        pause_status: "PAUSE...

Latest Reply
SteveOstrowski
Databricks Employee
  • 2 kudos

Hi @Seunghyun, Go template syntax ({{if}}, {{eq}}, etc.) is only supported in bundle project templates, which are the .tmpl files used during "databricks bundle init" to scaffold new projects. It is not supported inside your regular databricks.yml co...

2 More Replies
Seunghyun
by Contributor
  • 1008 Views
  • 2 replies
  • 2 kudos

Resolved! Managing dashboard refresh schedules in DABs

I am currently using Databricks Asset Bundles (DABs) to deploy and manage dashboard resources. While I can manually add a schedule to a dashboard via the Databricks console, I would like to reflect this same configuration in the dashboard YAML file. ...

Latest Reply
SteveOstrowski
Databricks Employee
  • 2 kudos

Hi @Seunghyun, You are correct that the dashboard resource definition in Databricks Asset Bundles does not currently include schedule-related properties. The dashboard resource supports properties like display_name, file_path, warehouse_id, embed_cre...

1 More Replies
FAHADURREHMAN
by New Contributor III
  • 995 Views
  • 3 replies
  • 2 kudos

Optimizing Large Materialized View to expedite query execution

Hi all, I have a DLT pipeline set up that reads Parquet files from an S3 bucket and creates a materialized view. The resulting view is quite big: it contains billions of records and a few TB of data. Predictive Optimization is already enabled. automat...

Latest Reply
SteveOstrowski
Databricks Employee
  • 2 kudos

Hi @FAHADURREHMAN, There are several layers to optimizing query performance on a multi-TB materialized view, and the other replies here cover the ingestion/refresh side well. Let me add some guidance on the query-side tuning and help you decide betwe...

2 More Replies