Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

shashankB
by Databricks Partner
  • 832 Views
  • 2 replies
  • 2 kudos

Resolved! Lakebridge Transpiler Fails with UnicodeDecodeError while Analyzer Works Successfully

 Hello Team, I am facing an issue with the Lakebridge transpiler. The Analyzer step runs successfully and produces the expected analysis files. However, when I run the Transpiler, it fails with the following error: ERROR [src/databricks/labs/Lakebridge.tr...

  • 832 Views
  • 2 replies
  • 2 kudos
Latest Reply
ManojkMohan
Honored Contributor II
  • 2 kudos

Root cause: the trailing “unexpected end of JSON input” suggests the decoder aborted midway, producing invalid JSON. This mismatch between the file content (likely UTF-8 or containing special characters) and the default Windows decoding causes the issue. Suggest...
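A minimal repro of that mismatch (file name hypothetical) shows why forcing the declared encoding fixes it:

```python
import os
import tempfile

# A UTF-8 file containing curly quotes (multi-byte characters)
text = 'SELECT “col” FROM t;'
path = os.path.join(tempfile.mkdtemp(), 'query.sql')
with open(path, 'w', encoding='utf-8') as f:
    f.write(text)

# Reading with cp1252 (a common Windows default) aborts midway,
# just like the transpiler error above
try:
    with open(path, encoding='cp1252') as f:
        f.read()
    failed = False
except UnicodeDecodeError:
    failed = True

# Reading with the declared encoding round-trips cleanly
with open(path, encoding='utf-8') as f:
    clean = f.read()

print(failed, clean == text)  # → True True
```

On Windows, setting `PYTHONUTF8=1` (Python's UTF-8 mode) or passing an explicit encoding where the tool allows it is the usual workaround.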

  • 2 kudos
1 More Replies
GuruRio
by New Contributor
  • 911 Views
  • 2 replies
  • 1 kudos

Achieving batch-level overwrite for streaming SCD1 in DLT

Hi all, I am working with Databricks Delta Live Tables (DLT) and have the following scenario. Setup: source data is delivered as weekly snapshots (not CDC). I have a bronze layer (streaming table) and a silver layer (also streaming). I am implementing SCD...

  • 911 Views
  • 2 replies
  • 1 kudos
Latest Reply
ManojkMohan
Honored Contributor II
  • 1 kudos

One can achieve this with dlt.apply_changes, but you need to configure it carefully to emulate key-based batch overwrite.

Step 1: define bronze as the streaming source:

import dlt
from pyspark.sql.functions import col

@dlt.table(comment="Bronze snapshot dat...
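Putting the steps together, a minimal sketch of the pattern; the table, key, and column names here are hypothetical stand-ins:

```python
import dlt
from pyspark.sql.functions import col

@dlt.table(comment="Bronze weekly snapshots")
def employee_bronze():
    # `spark` is the ambient SparkSession in a DLT pipeline
    return spark.readStream.table("dev.default.employee_snapshots")

dlt.create_streaming_table("employee_silver")

dlt.apply_changes(
    target="employee_silver",
    source="employee_bronze",
    keys=["employee_id"],              # business key to upsert on
    sequence_by=col("snapshot_date"),  # latest snapshot wins
    stored_as_scd_type=1,              # SCD1: overwrite in place
)
```

With sequence_by on the snapshot date, each weekly batch overwrites matching keys, which approximates key-based batch overwrite.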

  • 1 kudos
1 More Replies
ck7007
by Contributor II
  • 1358 Views
  • 5 replies
  • 3 kudos

Resolved! Streaming Solution

Maintain zonemaps with streaming writes. Challenge: streaming breaks zonemaps due to constant micro-batches. Solution: incremental updates.

def write_streaming_with_zonemap(stream_df, table_path):
    def update_zonemap(batch_df, batch_id):
        # Write data
        batch_d...

  • 1358 Views
  • 5 replies
  • 3 kudos
Latest Reply
ManojkMohan
Honored Contributor II
  • 3 kudos

@ck7007 brainstormed some solution approaches; do you have some test data to try these hands-on?

Approach | Throughput | Query speed | Complexity | Notes
Partition-level zonemaps | High | Medium | Low | Scales with micro-batches; prune at pa...
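The incremental-update idea from the original post can be sketched with Structured Streaming's foreachBatch; update_zonemap_for_batch is a hypothetical helper standing in for whatever maintains the per-file min/max metadata:

```python
# Sketch: write each micro-batch, then refresh zonemap stats for the
# new files only, instead of rebuilding the zonemap from scratch.
def write_streaming_with_zonemap(stream_df, table_path, checkpoint_path):
    def upsert_batch(batch_df, batch_id):
        # 1) append the micro-batch to the Delta table
        batch_df.write.format("delta").mode("append").save(table_path)
        # 2) incrementally update zonemap metadata for this batch only
        update_zonemap_for_batch(batch_df, table_path, batch_id)

    return (stream_df.writeStream
            .foreachBatch(upsert_batch)
            .option("checkpointLocation", checkpoint_path)
            .start())
```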

  • 3 kudos
4 More Replies
help_needed_445
by Contributor
  • 1423 Views
  • 3 replies
  • 3 kudos

Resolved! Table Fields Have a Different Value and Data Type in SQL Editor vs a SQL Notebook Cell

When I query a numeric field in the SQL Editor it returns a value of 0.02875 and the data type is decimal but when I run the same query in a SQL notebook cell it returns 0.0287500 and decimal(7,7). I'm assuming this is expected behavior but is there ...

(screenshots attached)
  • 1423 Views
  • 3 replies
  • 3 kudos
Latest Reply
Khaja_Zaffer
Esteemed Contributor
  • 3 kudos

Hello @help_needed_445, good day! It's indeed a very interesting case study! I found the below from LLM models. Yes, this difference in decimal display between the Databricks SQL Editor (which uses the Photon engine in Databricks SQL) and notebooks (which use ...
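The stored value is identical in both engines; only the rendered scale differs. A quick illustration with Python's decimal module, standing in for the SQL decimal semantics:

```python
from decimal import Decimal

shown_in_notebook = Decimal("0.0287500")  # decimal(7,7): scale 7 kept
shown_in_editor = Decimal("0.02875")       # trailing zeros trimmed

# Numerically equal: the scale is display metadata, not a value change
print(shown_in_notebook == shown_in_editor)       # → True
print(str(shown_in_notebook.normalize()))         # → 0.02875
```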

  • 3 kudos
2 More Replies
nkrom456
by New Contributor III
  • 768 Views
  • 1 replies
  • 1 kudos

Materialized View to External Delta Table using the sink API

Hi Team, while executing the code below I am able to create the sink, and my data is getting written into Delta tables from the materialized view.

import dlt

@dlt.table(name = "employee_bronze3")
def create_table():
    df = spark.read.table("dev.default.employee...

  • 768 Views
  • 1 replies
  • 1 kudos
Latest Reply
Brahmareddy
Esteemed Contributor
  • 1 kudos

Hi nkrom456, how are you doing today? As per my understanding, when you use dlt.read_stream() inside the same DLT pipeline, Databricks allows it to stream from that materialized view because everything is being managed within one pipeline; it underst...
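For reference, a minimal sketch of the sink API pattern in question; the sink name and output path are hypothetical:

```python
import dlt

# Declare an external Delta sink (a sink can only be fed by an
# append flow reading a streaming source)
dlt.create_sink(
    name="employee_external_sink",
    format="delta",
    options={"path": "abfss://container@account.dfs.core.windows.net/employee"},
)

# Append flow that streams from a table defined in the same pipeline
@dlt.append_flow(name="employee_to_sink", target="employee_external_sink")
def employee_flow():
    return dlt.read_stream("employee_bronze3")
```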

  • 1 kudos
pop_smoke
by New Contributor III
  • 4278 Views
  • 8 replies
  • 7 kudos

Resolved! write file as csv format

Is there any simple PySpark syntax to write data in CSV format to a file, or anywhere, in the Free Edition of Databricks? In Community Edition it was so easy.

  • 4278 Views
  • 8 replies
  • 7 kudos
Latest Reply
BS_THE_ANALYST
Databricks Partner
  • 7 kudos

@pop_smoke no worries! My background is with Alteryx (ETL tool). I too am learning Databricks. I look forward to seeing you in the forum ☺️. Please share any cool things you find or any projects you do. All the best, BS
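For anyone landing here for the syntax itself, a hedged sketch (the volume path is hypothetical) of writing a CSV to a Unity Catalog volume:

```python
# `spark` is the ambient SparkSession in a Databricks notebook
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])

(df.coalesce(1)                    # single output file
   .write.mode("overwrite")
   .option("header", True)
   .csv("/Volumes/main/default/my_volume/out_csv"))
```

Note that Spark still writes a directory containing one part file; rename or copy it afterwards if you need a single named .csv.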

  • 7 kudos
7 More Replies
PabloCSD
by Valued Contributor II
  • 784 Views
  • 1 replies
  • 0 kudos

How to configure a Job-Compute for Unity Catalog Access? (Q/A)

If you need to access tables that are in a volume of Unity Catalog (UC), the following configuration will work:

targets:
  dev:
    mode: development
    default: true
    workspace:
      host: https://<workspace>.azuredatabricks.net/
    run_as...

  • 784 Views
  • 1 replies
  • 0 kudos
Latest Reply
Khaja_Zaffer
Esteemed Contributor
  • 0 kudos

Hello @PabloCSD, good day! Are you asking a question, or what are your expectations? In addition to this: you cannot create or register tables (managed or external) with locations pointing to volumes, as this is explicitly not supported; tables must use tabular st...

  • 0 kudos
Espenol1
by New Contributor II
  • 14384 Views
  • 5 replies
  • 2 kudos

Resolved! Using managed identities to access SQL server - how?

Hello! My company wants us to only use managed identities for authentication. We have set up Databricks using Terraform, got Unity Catalog and everything, but we're a very small team and I'm struggling to control permissions outside of Unity Catalog....

  • 14384 Views
  • 5 replies
  • 2 kudos
Latest Reply
vr
Valued Contributor
  • 2 kudos

As of today, you can use https://learn.microsoft.com/en-us/azure/databricks/connect/unity-catalog/cloud-services/service-credentials

  • 2 kudos
4 More Replies
santhiya
by Databricks Partner
  • 2218 Views
  • 2 replies
  • 0 kudos

CPU usage and idle time metrics from system tables

I need to get my compute metrics, and not from the UI. The system tables don't have much information, and node_timeline has per-minute records, so it's difficult to calculate each compute's CPU usage per day. Is there any way we can get the CPU usage, CPU idle time, M...

  • 2218 Views
  • 2 replies
  • 0 kudos
Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

To calculate CPU usage, CPU idle time, and memory usage per cluster per day, you can use the system.compute.node_timeline system table. However, since the data in this table is recorded at per-minute granularity, it’s necessary to aggregate the data ...
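A hedged sketch of that daily roll-up; verify the column names against the system tables reference for your workspace before relying on it:

```python
# `spark` is the ambient SparkSession in a Databricks notebook.
# node_timeline samples are per-minute, so averaging per day
# collapses them into one row per cluster per day.
daily = spark.sql("""
    SELECT cluster_id,
           DATE(start_time)                            AS day,
           AVG(cpu_user_percent + cpu_system_percent)  AS avg_cpu_busy_pct,
           AVG(cpu_wait_percent)                       AS avg_cpu_wait_pct,
           AVG(mem_used_percent)                       AS avg_mem_used_pct
    FROM system.compute.node_timeline
    GROUP BY cluster_id, DATE(start_time)
""")
daily.show()
```

Idle time can then be approximated as 100 minus the busy percentage.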

  • 0 kudos
1 More Replies
fly_high_five
by Contributor
  • 2011 Views
  • 5 replies
  • 1 kudos

Resolved! Unable to retrieve all rows of delta table using SQL endpoint of Interactive Cluster

Hi, I am trying to query a table using the JDBC endpoint of an Interactive Cluster. I am connected to the JDBC endpoint using DBeaver. When I export a small subset of data, 2000-8000 rows, it works fine and exports the data. However, when I try to export all rows ...

  • 2011 Views
  • 5 replies
  • 1 kudos
Latest Reply
WiliamRosa
Databricks Partner
  • 1 kudos

Hi @fly_high_five, I found these references about this situation; see if they help you: increase the SocketTimeout in JDBC (Databricks KB “Best practices when using JDBC with Databricks SQL” – https://kb.databricks.com/dbsql/job-timeout-when-connectin...

  • 1 kudos
4 More Replies
fly_high_five
by Contributor
  • 1347 Views
  • 4 replies
  • 2 kudos

Resolved! Exposing Data for Consumers in non-UC ADB

Hi, I want to expose data to consumers from our non-UC ADB. Consumers would be consuming data mainly using a SQL client like DBeaver. I tried the SQL endpoint of an Interactive Cluster and connected via DBeaver; however, when I try to fetch/export all rows of t...

  • 1347 Views
  • 4 replies
  • 2 kudos
Latest Reply
fly_high_five
Contributor
  • 2 kudos

Hi @szymon_dybczak, I am using the latest JDBC driver, 2.7.3 (https://www.databricks.com/spark/jdbc-drivers-archive), and my JDBC URL comes from the JDBC endpoint of the Interactive Cluster:

jdbc:databricks://adb-{workspace_id}.azuredatabricks.net:443/default;transport...

  • 2 kudos
3 More Replies
kmodelew
by New Contributor III
  • 3569 Views
  • 10 replies
  • 22 kudos

Unable to read excel file from Volume

Hi, I'm trying to read an Excel file directly from a Volume (not the workspace or filestore); all examples on the internet use the workspace or filestore. The Volume is an external location, so I can read from there, but I would like to read directly from the Volume. I hav...
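One commonly suggested sketch (the path and engine here are assumptions, not from the thread): volumes expose POSIX-style paths, so pandas with openpyxl can read the file directly, and the result can then be converted to a Spark DataFrame:

```python
import pandas as pd

# Volumes are addressable as ordinary file paths from the driver
pdf = pd.read_excel("/Volumes/main/default/landing/report.xlsx",
                    engine="openpyxl")

# `spark` is the ambient SparkSession in a Databricks notebook
df = spark.createDataFrame(pdf)
```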

  • 3569 Views
  • 10 replies
  • 22 kudos
Latest Reply
BS_THE_ANALYST
Databricks Partner
  • 22 kudos

@ck7007 thanks for the update. Absolutely love that you've tested the solution too! Big props. As you mention, if we keep the community accurate, it'll mean that when someone else searches for the thread, they don't end up using an incorrect solutio...

  • 22 kudos
9 More Replies
Worrachon
by New Contributor
  • 489 Views
  • 1 replies
  • 0 kudos

Databricks cannot run pipeline

 I found that when I run the pipeline, it shows the message "'Cannot run pipeline', 'PL_TRNF_CRM_SALESFORCE_TO_BLOB', "HTTPSConnectionPool(host='management.azure.com', port=443)". It doesn't happen on every run, but I encounter this case often.

(screenshot attached)
  • 489 Views
  • 1 replies
  • 0 kudos
Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

What exactly does the pipeline do?  Fetch data from a source system? I also see Data Factory as a component?

  • 0 kudos
saicharandeepb
by Contributor
  • 558 Views
  • 1 replies
  • 1 kudos

Impact of Capturing Streaming Metrics to ADLS on Data Load Performance

Hi Community, I’m working on capturing Structured Streaming metrics and persisting them to Azure Data Lake Storage (ADLS) for monitoring and logging. To achieve this, I implemented a custom StreamingQueryListener that writes streaming progress data as...

(screenshot attached)
  • 558 Views
  • 1 replies
  • 1 kudos
Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @saicharandeepb, the behaviour you're experiencing can happen with coalesce. The thing is, when you use coalesce(1), you're sacrificing parallelism and everything is performed on a single executor. There's even a warning in Apache Spark OSS regardi...
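A lighter-weight alternative sketch (the output path is hypothetical): since each progress event is a small JSON document, appending it with plain file I/O on the driver avoids launching a Spark write, and the coalesce(1), on every micro-batch:

```python
from pyspark.sql.streaming import StreamingQueryListener

class ProgressToFile(StreamingQueryListener):
    def __init__(self, path):
        self.path = path

    def onQueryStarted(self, event):
        pass

    def onQueryTerminated(self, event):
        pass

    def onQueryProgress(self, event):
        # One JSON line per micro-batch; no executors involved
        with open(self.path, "a") as f:
            f.write(event.progress.json + "\n")

# `spark` is the ambient SparkSession in a Databricks notebook
spark.streams.addListener(
    ProgressToFile("/Volumes/main/default/metrics/progress.jsonl"))
```

A periodic batch job can then compact the JSON-lines file into Delta if needed.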

  • 1 kudos
stucas
by New Contributor II
  • 1010 Views
  • 2 replies
  • 0 kudos

DLT Pipeline and Pivot tables

TL;DR: Can DLT determine a dynamic schema, one which is generated from the results of a pivot? Issue: I know you can't use Spark `.pivot` in a DLT pipeline, and that if you wish to pivot data you need to do that outside of the DLT-decorated functions. I have...

  • 1010 Views
  • 2 replies
  • 0 kudos
Latest Reply
stucas
New Contributor II
  • 0 kudos

Thank you for the reply. I have tried this (it was suggested in earlier solutions), but that may well be a side effect of the above function.

query = f"""
    SELECT pivot_key,
        {select_clause}
    FROM
        data...
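For context, the dynamic {select_clause} in a query like that is typically built before the DLT functions run, so the pivoted schema is fixed at pipeline-definition time. A hypothetical sketch with made-up pivot values:

```python
# Pivot values discovered up front (e.g. from a one-off query),
# not inside a DLT-decorated function
pivot_values = ["jan", "feb", "mar"]

# Emulate the pivot with one conditional aggregate per value
select_clause = ",\n    ".join(
    f"MAX(CASE WHEN pivot_key = '{v}' THEN value END) AS {v}"
    for v in pivot_values
)

query = f"SELECT id,\n    {select_clause}\nFROM data GROUP BY id"
print(select_clause.count("CASE WHEN"))  # → 3
```

Because the column list is a plain string at definition time, DLT sees a static schema even though it was derived dynamically.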

  • 0 kudos
1 More Replies