Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

shashankB
by Databricks Partner
  • 832 Views
  • 2 replies
  • 2 kudos

Resolved! Lakebridge Transpiler Fails with UnicodeDecodeError while Analyzer Works Successfully

 Hello Team, I am facing an issue with the Lakebridge transpiler. The Analyzer step runs successfully and produces the expected analysis files. However, when I run the Transpiler, it fails with the following error: ERROR [src/databricks/labs/Lakebridge.tr...

  • 832 Views
  • 2 replies
  • 2 kudos
Latest Reply
ManojkMohan
Honored Contributor II
  • 2 kudos

Root cause: the trailing “unexpected end of JSON input” suggests the decoder aborted midway, producing invalid JSON. This mismatch between the file content (likely UTF-8 or containing special characters) and the default Windows decoding causes the issue. Suggest...
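A minimal repro of that mismatch (file name hypothetical) shows why forcing the declared encoding fixes it:

```python
import os
import tempfile

# A UTF-8 file containing curly quotes (multi-byte characters)
text = 'SELECT “col” FROM t;'
path = os.path.join(tempfile.mkdtemp(), 'query.sql')
with open(path, 'w', encoding='utf-8') as f:
    f.write(text)

# Reading with cp1252 (a common Windows default) aborts midway,
# just like the transpiler error above
try:
    with open(path, encoding='cp1252') as f:
        f.read()
    failed = False
except UnicodeDecodeError:
    failed = True

# Reading with the declared encoding round-trips cleanly
with open(path, encoding='utf-8') as f:
    clean = f.read()

print(failed, clean == text)  # → True True
```

On Windows, setting `PYTHONUTF8=1` (Python's UTF-8 mode) or passing an explicit encoding where the tool allows it is the usual workaround.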

  • 2 kudos
1 More Replies
GuruRio
by New Contributor
  • 911 Views
  • 2 replies
  • 1 kudos

Achieving batch-level overwrite for streaming SCD1 in DLT

Hi all, I am working with Databricks Delta Live Tables (DLT) and have the following scenario. Setup: source data is delivered as weekly snapshots (not CDC). I have a bronze layer (streaming table) and a silver layer (also streaming). I am implementing SCD...

  • 911 Views
  • 2 replies
  • 1 kudos
Latest Reply
ManojkMohan
Honored Contributor II
  • 1 kudos

One can achieve this with dlt.apply_changes, but you need to configure it carefully to emulate key-based batch overwrite.

Step 1: define bronze as the streaming source:

import dlt
from pyspark.sql.functions import col

@dlt.table(comment="Bronze snapshot dat...
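Putting the steps together, a minimal sketch of the pattern; the table, key, and column names here are hypothetical stand-ins:

```python
import dlt
from pyspark.sql.functions import col

@dlt.table(comment="Bronze weekly snapshots")
def employee_bronze():
    # `spark` is the ambient SparkSession in a DLT pipeline
    return spark.readStream.table("dev.default.employee_snapshots")

dlt.create_streaming_table("employee_silver")

dlt.apply_changes(
    target="employee_silver",
    source="employee_bronze",
    keys=["employee_id"],              # business key to upsert on
    sequence_by=col("snapshot_date"),  # latest snapshot wins
    stored_as_scd_type=1,              # SCD1: overwrite in place
)
```

With sequence_by on the snapshot date, each weekly batch overwrites matching keys, which approximates key-based batch overwrite.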

  • 1 kudos
1 More Replies
ck7007
by Contributor II
  • 1358 Views
  • 5 replies
  • 3 kudos

Resolved! Streaming Solution

Maintain zonemaps with streaming writes. Challenge: streaming breaks zonemaps due to constant micro-batches. Solution: incremental updates.

def write_streaming_with_zonemap(stream_df, table_path):
    def update_zonemap(batch_df, batch_id):
        # Write data
        batch_d...

  • 1358 Views
  • 5 replies
  • 3 kudos
Latest Reply
ManojkMohan
Honored Contributor II
  • 3 kudos

@ck7007 brainstormed some solution approaches; do you have some test data to try these hands-on?

Approach | Throughput | Query speed | Complexity | Notes
Partition-level zonemaps | High | Medium | Low | Scales with micro-batches; prune at pa...
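The incremental-update idea from the original post can be sketched with Structured Streaming's foreachBatch; update_zonemap_for_batch is a hypothetical helper standing in for whatever maintains the per-file min/max metadata:

```python
# Sketch: write each micro-batch, then refresh zonemap stats for the
# new files only, instead of rebuilding the zonemap from scratch.
def write_streaming_with_zonemap(stream_df, table_path, checkpoint_path):
    def upsert_batch(batch_df, batch_id):
        # 1) append the micro-batch to the Delta table
        batch_df.write.format("delta").mode("append").save(table_path)
        # 2) incrementally update zonemap metadata for this batch only
        update_zonemap_for_batch(batch_df, table_path, batch_id)

    return (stream_df.writeStream
            .foreachBatch(upsert_batch)
            .option("checkpointLocation", checkpoint_path)
            .start())
```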

  • 3 kudos
4 More Replies
help_needed_445
by Contributor
  • 1423 Views
  • 3 replies
  • 3 kudos

Resolved! Table Fields Have a Different Value and Data Type in SQL Editor vs a SQL Notebook Cell

When I query a numeric field in the SQL Editor it returns a value of 0.02875 and the data type is decimal but when I run the same query in a SQL notebook cell it returns 0.0287500 and decimal(7,7). I'm assuming this is expected behavior but is there ...

(screenshots attached)
  • 1423 Views
  • 3 replies
  • 3 kudos
Latest Reply
Khaja_Zaffer
Esteemed Contributor
  • 3 kudos

Hello @help_needed_445, good day! It's indeed a very interesting case study! I found the below from LLM models. Yes, this difference in decimal display between the Databricks SQL Editor (which uses the Photon engine in Databricks SQL) and notebooks (which use ...
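The stored value is identical in both engines; only the rendered scale differs. A quick illustration with Python's decimal module, standing in for the SQL decimal semantics:

```python
from decimal import Decimal

shown_in_notebook = Decimal("0.0287500")  # decimal(7,7): scale 7 kept
shown_in_editor = Decimal("0.02875")       # trailing zeros trimmed

# Numerically equal: the scale is display metadata, not a value change
print(shown_in_notebook == shown_in_editor)       # → True
print(str(shown_in_notebook.normalize()))         # → 0.02875
```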

  • 3 kudos
2 More Replies
nkrom456
by New Contributor III
  • 768 Views
  • 1 replies
  • 1 kudos

Materialized View to External Delta Table using the sink API

Hi Team, while executing the code below I am able to create the sink, and my data is getting written into Delta tables from the materialized view.

import dlt

@dlt.table(name = "employee_bronze3")
def create_table():
    df = spark.read.table("dev.default.employee...

  • 768 Views
  • 1 replies
  • 1 kudos
Latest Reply
Brahmareddy
Esteemed Contributor
  • 1 kudos

Hi nkrom456, how are you doing today? As per my understanding, when you use dlt.read_stream() inside the same DLT pipeline, Databricks allows it to stream from that materialized view because everything is being managed within one pipeline; it underst...
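For reference, a minimal sketch of the sink API pattern in question; the sink name and output path are hypothetical:

```python
import dlt

# Declare an external Delta sink (a sink can only be fed by an
# append flow reading a streaming source)
dlt.create_sink(
    name="employee_external_sink",
    format="delta",
    options={"path": "abfss://container@account.dfs.core.windows.net/employee"},
)

# Append flow that streams from a table defined in the same pipeline
@dlt.append_flow(name="employee_to_sink", target="employee_external_sink")
def employee_flow():
    return dlt.read_stream("employee_bronze3")
```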

  • 1 kudos
pop_smoke
by New Contributor III
  • 4278 Views
  • 8 replies
  • 7 kudos

Resolved! write file as csv format

Is there any simple PySpark syntax to write data in CSV format to a file, or anywhere, in the Free Edition of Databricks? In Community Edition it was so easy.

  • 4278 Views
  • 8 replies
  • 7 kudos
Latest Reply
BS_THE_ANALYST
Databricks Partner
  • 7 kudos

@pop_smoke no worries! My background is with Alteryx (ETL tool). I too am learning Databricks. I look forward to seeing you in the forum ☺️. Please share any cool things you find or any projects you do. All the best, BS
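For anyone landing here for the syntax itself, a hedged sketch (the volume path is hypothetical) of writing a CSV to a Unity Catalog volume:

```python
# `spark` is the ambient SparkSession in a Databricks notebook
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])

(df.coalesce(1)                    # single output file
   .write.mode("overwrite")
   .option("header", True)
   .csv("/Volumes/main/default/my_volume/out_csv"))
```

Note that Spark still writes a directory containing one part file; rename or copy it afterwards if you need a single named .csv.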

  • 7 kudos
7 More Replies
PabloCSD
by Valued Contributor II
  • 784 Views
  • 1 replies
  • 0 kudos

How to configure a Job-Compute for Unity Catalog Access? (Q/A)

If you need to access tables that are in a volume of Unity Catalog (UC), the following configuration will work:

targets:
  dev:
    mode: development
    default: true
    workspace:
      host: https://<workspace>.azuredatabricks.net/
    run_as...

  • 784 Views
  • 1 replies
  • 0 kudos
Latest Reply
Khaja_Zaffer
Esteemed Contributor
  • 0 kudos

Hello @PabloCSD, good day! Are you asking a question, or what are your expectations? In addition to this: you cannot create or register tables (managed or external) with locations pointing to volumes, as this is explicitly not supported; tables must use tabular st...

  • 0 kudos
Espenol1
by New Contributor II
  • 14384 Views
  • 5 replies
  • 2 kudos

Resolved! Using managed identities to access SQL server - how?

Hello! My company wants us to only use managed identities for authentication. We have set up Databricks using Terraform, got Unity Catalog and everything, but we're a very small team and I'm struggling to control permissions outside of Unity Catalog....

  • 14384 Views
  • 5 replies
  • 2 kudos
Latest Reply
vr
Valued Contributor
  • 2 kudos

As of today, you can use https://learn.microsoft.com/en-us/azure/databricks/connect/unity-catalog/cloud-services/service-credentials

  • 2 kudos
4 More Replies
santhiya
by Databricks Partner
  • 2218 Views
  • 2 replies
  • 0 kudos

CPU usage and idle time metrics from system tables

I need to get my compute metrics, and not from the UI. The system tables don't have much information, and node_timeline has per-minute records, so it's difficult to calculate each compute's CPU usage per day. Is there any way we can get the CPU usage, CPU idle time, M...

  • 2218 Views
  • 2 replies
  • 0 kudos
Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

To calculate CPU usage, CPU idle time, and memory usage per cluster per day, you can use the system.compute.node_timeline system table. However, since the data in this table is recorded at per-minute granularity, it’s necessary to aggregate the data ...
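A hedged sketch of that daily roll-up; verify the column names against the system tables reference for your workspace before relying on it:

```python
# `spark` is the ambient SparkSession in a Databricks notebook.
# node_timeline samples are per-minute, so averaging per day
# collapses them into one row per cluster per day.
daily = spark.sql("""
    SELECT cluster_id,
           DATE(start_time)                            AS day,
           AVG(cpu_user_percent + cpu_system_percent)  AS avg_cpu_busy_pct,
           AVG(cpu_wait_percent)                       AS avg_cpu_wait_pct,
           AVG(mem_used_percent)                       AS avg_mem_used_pct
    FROM system.compute.node_timeline
    GROUP BY cluster_id, DATE(start_time)
""")
daily.show()
```

Idle time can then be approximated as 100 minus the busy percentage.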

  • 0 kudos
1 More Replies
fly_high_five
by Contributor
  • 2011 Views
  • 5 replies
  • 1 kudos

Resolved! Unable to retrieve all rows of delta table using SQL endpoint of Interactive Cluster

Hi, I am trying to query a table using the JDBC endpoint of an Interactive Cluster. I am connected to the JDBC endpoint using DBeaver. When I export a small subset of data, 2000-8000 rows, it works fine and exports the data. However, when I try to export all rows ...

  • 2011 Views
  • 5 replies
  • 1 kudos
Latest Reply
WiliamRosa
Databricks Partner
  • 1 kudos

Hi @fly_high_five, I found these references about this situation; see if they help you: increase the SocketTimeout in JDBC (Databricks KB “Best practices when using JDBC with Databricks SQL” – https://kb.databricks.com/dbsql/job-timeout-when-connectin...

  • 1 kudos
4 More Replies
fly_high_five
by Contributor
  • 1347 Views
  • 4 replies
  • 2 kudos

Resolved! Exposing Data for Consumers in non-UC ADB

Hi, I want to expose data to consumers from our non-UC ADB. Consumers would be consuming data mainly using a SQL client like DBeaver. I tried the SQL endpoint of an Interactive Cluster and connected via DBeaver; however, when I try to fetch/export all rows of t...

  • 1347 Views
  • 4 replies
  • 2 kudos
Latest Reply
fly_high_five
Contributor
  • 2 kudos

Hi @szymon_dybczak, I am using the latest JDBC driver, 2.7.3 (https://www.databricks.com/spark/jdbc-drivers-archive), and my JDBC URL comes from the JDBC endpoint of the Interactive Cluster:

jdbc:databricks://adb-{workspace_id}.azuredatabricks.net:443/default;transport...

  • 2 kudos
3 More Replies
kmodelew
by New Contributor III
  • 3569 Views
  • 10 replies
  • 22 kudos

Unable to read excel file from Volume

Hi, I'm trying to read an Excel file directly from a Volume (not the workspace or filestore); all examples on the internet use the workspace or filestore. The Volume is an external location, so I can read from there, but I would like to read directly from the Volume. I hav...
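One commonly suggested sketch (the path and engine here are assumptions, not from the thread): volumes expose POSIX-style paths, so pandas with openpyxl can read the file directly, and the result can then be converted to a Spark DataFrame:

```python
import pandas as pd

# Volumes are addressable as ordinary file paths from the driver
pdf = pd.read_excel("/Volumes/main/default/landing/report.xlsx",
                    engine="openpyxl")

# `spark` is the ambient SparkSession in a Databricks notebook
df = spark.createDataFrame(pdf)
```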

  • 3569 Views
  • 10 replies
  • 22 kudos
Latest Reply
BS_THE_ANALYST
Databricks Partner
  • 22 kudos

@ck7007 thanks for the update. Absolutely love that you've tested the solution too! Big props. As you mention, if we keep the community accurate, it'll mean that when someone else searches for the thread, they don't end up using an incorrect solutio...

  • 22 kudos
9 More Replies
Worrachon
by New Contributor
  • 489 Views
  • 1 replies
  • 0 kudos

Databricks cannot run pipeline

 I found that when I run the pipeline, it shows the message "'Cannot run pipeline', 'PL_TRNF_CRM_SALESFORCE_TO_BLOB', "HTTPSConnectionPool(host='management.azure.com', port=443)". It doesn't happen on every run, but I encounter this case often.

(screenshot attached)
  • 489 Views
  • 1 replies
  • 0 kudos
Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

What exactly does the pipeline do?  Fetch data from a source system? I also see Data Factory as a component?

  • 0 kudos
saicharandeepb
by Contributor
  • 558 Views
  • 1 replies
  • 1 kudos

Impact of Capturing Streaming Metrics to ADLS on Data Load Performance

Hi Community, I’m working on capturing Structured Streaming metrics and persisting them to Azure Data Lake Storage (ADLS) for monitoring and logging. To achieve this, I implemented a custom StreamingQueryListener that writes streaming progress data as...

(screenshot attached)
  • 558 Views
  • 1 replies
  • 1 kudos
Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @saicharandeepb, the behaviour you're experiencing can happen with coalesce. The thing is, when you use coalesce(1), you're sacrificing parallelism and everything is performed on a single executor. There's even a warning in Apache Spark OSS regardi...
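A lighter-weight alternative sketch (the output path is hypothetical): since each progress event is a small JSON document, appending it with plain file I/O on the driver avoids launching a Spark write, and the coalesce(1), on every micro-batch:

```python
from pyspark.sql.streaming import StreamingQueryListener

class ProgressToFile(StreamingQueryListener):
    def __init__(self, path):
        self.path = path

    def onQueryStarted(self, event):
        pass

    def onQueryTerminated(self, event):
        pass

    def onQueryProgress(self, event):
        # One JSON line per micro-batch; no executors involved
        with open(self.path, "a") as f:
            f.write(event.progress.json + "\n")

# `spark` is the ambient SparkSession in a Databricks notebook
spark.streams.addListener(
    ProgressToFile("/Volumes/main/default/metrics/progress.jsonl"))
```

A periodic batch job can then compact the JSON-lines file into Delta if needed.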

  • 1 kudos
stucas
by New Contributor II
  • 1010 Views
  • 2 replies
  • 0 kudos

DLT Pipeline and Pivot tables

TL;DR: Can DLT determine a dynamic schema, one which is generated from the results of a pivot? Issue: I know you can't use Spark `.pivot` in a DLT pipeline, and that if you wish to pivot data you need to do that outside of the DLT-decorated functions. I have...

  • 1010 Views
  • 2 replies
  • 0 kudos
Latest Reply
stucas
New Contributor II
  • 0 kudos

Thank you for the reply. I have tried this (it was suggested in earlier solutions), but that may well be a side effect of the above function.

query = f"""
    SELECT pivot_key,
        {select_clause}
    FROM
        data...
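For context, the dynamic {select_clause} in a query like that is typically built before the DLT functions run, so the pivoted schema is fixed at pipeline-definition time. A hypothetical sketch with made-up pivot values:

```python
# Pivot values discovered up front (e.g. from a one-off query),
# not inside a DLT-decorated function
pivot_values = ["jan", "feb", "mar"]

# Emulate the pivot with one conditional aggregate per value
select_clause = ",\n    ".join(
    f"MAX(CASE WHEN pivot_key = '{v}' THEN value END) AS {v}"
    for v in pivot_values
)

query = f"SELECT id,\n    {select_clause}\nFROM data GROUP BY id"
print(select_clause.count("CASE WHEN"))  # → 3
```

Because the column list is a plain string at definition time, DLT sees a static schema even though it was derived dynamically.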

  • 0 kudos
1 More Replies