Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

lizou1
by New Contributor III
  • 2166 Views
  • 3 replies
  • 0 kudos

serverless environment v3 JavaPackage object is not callable

I run into this issue when using serverless environment v3: "JavaPackage object is not callable". V2 works fine. Any ideas?

Latest Reply
lizou1
New Contributor III
  • 0 kudos

I moved to the latest version, 4, and this is no longer an issue. Thanks.

2 More Replies
shashankB
by Databricks Partner
  • 1277 Views
  • 2 replies
  • 2 kudos

Resolved! Lakebridge Transpiler Fails with UnicodeDecodeError while Analyzer Works Successfully

Hello Team, I am facing an issue with the Lakebridge transpiler. The Analyzer step runs successfully and produces the expected analysis files. However, when I run the Transpiler, it fails with the following error: ERROR [src/databricks/labs/Lakebridge.tr...

Latest Reply
ManojkMohan
Honored Contributor II
  • 2 kudos

Root Cause: The trailing “unexpected end of JSON input” suggests the decoder aborted midway, producing invalid JSON. This mismatch between the file content (likely UTF-8 or containing special characters) and the default Windows decoding causes the issue. Suggest...
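
To illustrate the kind of mismatch described in this reply, here is a minimal Python sketch. It is not taken from Lakebridge; the payload and the choice of cp1252 as the "default Windows decoding" are assumptions for demonstration only. It shows that bytes written as UTF-8 can fail to decode under a Windows code page, while decoding with the encoding the file was written in succeeds.

```python
import json
import os
import tempfile

# Hypothetical payload: JSON containing a non-ASCII character.
payload = {"name": "Ávila"}

# Write the file as UTF-8 (assumed here to be the encoding of the source files).
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".json", delete=False
) as f:
    json.dump(payload, f, ensure_ascii=False)
    path = f.name

# Read the raw bytes back, then clean up the temporary file.
with open(path, "rb") as f:
    raw = f.read()
os.remove(path)

# Decoding UTF-8 bytes as cp1252 fails here: the second UTF-8 byte of "Á"
# is 0x81, which has no mapping in Python's cp1252 codec.
try:
    raw.decode("cp1252")
    cp1252_failed = False
except UnicodeDecodeError:
    cp1252_failed = True

# Decoding with the encoding the file was actually written in succeeds.
decoded = json.loads(raw.decode("utf-8"))
```

In your own scripts, passing encoding="utf-8" explicitly to open() avoids depending on the platform default.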

1 More Replies
GuruRio
by New Contributor
  • 1235 Views
  • 2 replies
  • 1 kudos

Achieving batch-level overwrite for streaming SCD1 in DLT

Hi all, I am working with Databricks Delta Live Tables (DLT) and have the following scenario. Setup: Source data is delivered as weekly snapshots (not CDC). I have a bronze layer (streaming table) and a silver layer (also streaming). I am implementing SCD...

Latest Reply
ManojkMohan
Honored Contributor II
  • 1 kudos

One can achieve this with dlt.apply_changes, but you need to configure it carefully to emulate key-based batch overwrite.
Step 1: Define Bronze as Streaming Source
import dlt
from pyspark.sql.functions import col
@dlt.table(comment="Bronze snapshot dat...
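
The reply's own code is cut off, so the following is only a plain-Python sketch of the SCD Type 1 semantics it refers to: for each key, the row with the newest sequence value wins. It is not DLT code. The function apply_scd1, the sample rows, and the column names id and snapshot_week are hypothetical; the key and sequence_by arguments are named to echo the keys and sequence_by parameters of dlt.apply_changes.

```python
def apply_scd1(target, batch, key, sequence_by):
    """Upsert each row of `batch` into `target`, keeping the newest row per key."""
    for row in batch:
        current = target.get(row[key])
        # Insert new keys; overwrite existing keys only with same-or-newer data.
        if current is None or row[sequence_by] >= current[sequence_by]:
            target[row[key]] = row
    return target


# Hypothetical weekly snapshots.
week1 = [
    {"id": 1, "city": "Oslo", "snapshot_week": 1},
    {"id": 2, "city": "Rome", "snapshot_week": 1},
]
week2 = [
    {"id": 2, "city": "Milan", "snapshot_week": 2},
    {"id": 3, "city": "Lima", "snapshot_week": 2},
]

silver = {}
apply_scd1(silver, week1, key="id", sequence_by="snapshot_week")
apply_scd1(silver, week2, key="id", sequence_by="snapshot_week")

# Replaying an older snapshot does not overwrite newer rows.
apply_scd1(silver, week1, key="id", sequence_by="snapshot_week")
```

Note that this sketch only upserts: a key that disappears from a later snapshot stays in the target, so a true batch-level overwrite would need extra handling for removed keys.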

1 More Replies
ck7007
by Contributor II
  • 2049 Views
  • 5 replies
  • 3 kudos

Resolved! Streaming Solution

Maintain Zonemaps with Streaming Writes
Challenge: Streaming breaks zonemaps due to constant micro-batches.
Solution: Incremental Updates
def write_streaming_with_zonemap(stream_df, table_path):
    def update_zonemap(batch_df, batch_id):
        # Write data
        batch_d...
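
Since the post's code is truncated, here is a small self-contained sketch of the general idea of an incrementally maintained zonemap, written in plain Python rather than Spark. It is not the author's implementation: the ZoneMap class, its method names, and the file names are hypothetical. Each micro-batch records the min and max of the indexed column for the file it wrote, and a lookup skips files whose range cannot contain the value.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ZoneMap:
    # file name -> (min value, max value) of the indexed column
    zones: Dict[str, Tuple[int, int]] = field(default_factory=dict)

    def add_batch(self, file_name: str, values: List[int]) -> None:
        """Record min/max for the file written by one micro-batch."""
        if values:
            self.zones[file_name] = (min(values), max(values))

    def files_for(self, value: int) -> List[str]:
        """Return only the files whose [min, max] range can contain `value`."""
        return [
            name for name, (lo, hi) in self.zones.items() if lo <= value <= hi
        ]


zonemap = ZoneMap()
# Each call stands in for one micro-batch; only the new file's entry is added.
zonemap.add_batch("batch-0.parquet", [3, 9, 5])
zonemap.add_batch("batch-1.parquet", [40, 12, 27])
zonemap.add_batch("batch-2.parquet", [8, 15])

# A point lookup only needs to scan files whose range covers the value.
candidates = zonemap.files_for(9)
```

Because each batch touches only its own entry, the cost of maintaining the map stays proportional to the batch, not to the table.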

Latest Reply
ManojkMohan
Honored Contributor II
  • 3 kudos

@ck7007 I brainstormed some solution approaches. Do you have some test data to try these hands-on?
Approach | Throughput | Query Speed | Complexity | Notes
Partition-level zonemaps | High | Medium | Low | Scales with micro-batches; prune at pa...

4 More Replies
help_needed_445
by Contributor
  • 1855 Views
  • 3 replies
  • 3 kudos

Resolved! Table Fields Have a Different Value and Data Type in SQL Editor vs a SQL Notebook Cell

When I query a numeric field in the SQL Editor, it returns a value of 0.02875 and the data type is decimal, but when I run the same query in a SQL notebook cell, it returns 0.0287500 and decimal(7,7). I'm assuming this is expected behavior but is there ...

Latest Reply
Khaja_Zaffer
Esteemed Contributor
  • 3 kudos

Hello @help_needed_445, good day! This is indeed a very interesting case study. I found the below from LLM models. Yes, this difference in decimal display between the Databricks SQL Editor (which uses the Photon engine in Databricks SQL) and notebooks (which use ...
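
As a side note on the two values in the question, the short Python sketch below uses the standard decimal module (not Databricks) to show that 0.02875 and 0.0287500 are the same number and differ only in how many fractional digits are displayed. It does not explain why the two Databricks surfaces render the value differently; the variable names are illustrative.

```python
from decimal import Decimal

editor_value = Decimal("0.02875")      # as displayed in the SQL Editor
notebook_value = Decimal("0.0287500")  # as displayed in the notebook, decimal(7,7)

# The two renderings compare equal: only the displayed scale differs.
same_number = editor_value == notebook_value

# Padding to 7 fractional digits reproduces the notebook rendering.
padded = editor_value.quantize(Decimal("0.0000001"))

# Stripping trailing zeros reproduces the SQL Editor rendering.
stripped = notebook_value.normalize()
```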

2 More Replies
nkrom456
by New Contributor III
  • 1030 Views
  • 1 reply
  • 1 kudos

Materialized View to External Delta Table using sink API

Hi Team, while executing the code below I am able to create the sink, and my data is written into Delta tables from the materialized view.
import dlt
@dlt.table(name = "employee_bronze3")
def create_table():
    df = spark.read.table("dev.default.employee...

Latest Reply
Brahmareddy
Esteemed Contributor II
  • 1 kudos

Hi nkrom456, how are you doing today? As per my understanding, when you use dlt.read_stream() inside the same DLT pipeline, Databricks allows it to stream from that materialized view because everything is being managed within one pipeline: it underst...

pop_smoke
by New Contributor III
  • 5603 Views
  • 8 replies
  • 7 kudos

Resolved! write file as csv format

Is there any simple PySpark syntax to write data in CSV format to a file, or anywhere, in the Free Edition of Databricks? In Community Edition, it was so easy.

Latest Reply
BS_THE_ANALYST
Databricks Partner
  • 7 kudos

@pop_smoke no worries! My background is with Alteryx (an ETL tool). I too am learning Databricks. I look forward to seeing you in the forum ☺️. Please share any cool things you find or any projects you do. All the best, BS

7 More Replies
PabloCSD
by Valued Contributor II
  • 1162 Views
  • 1 reply
  • 0 kudos

How to configure a Job-Compute for Unity Catalog Access? (Q/A)

If you need to access tables that are in a volume of Unity Catalog (UC), the following configuration will work:
targets:
  dev:
    mode: development
    default: true
    workspace:
      host: https://<workspace>.azuredatabricks.net/
    run_as...

Latest Reply
Khaja_Zaffer
Esteemed Contributor
  • 0 kudos

Hello @PabloCSD, good day! Are you asking a question, or what are your expectations? An addition to this: you cannot create or register tables (managed or external) with locations pointing to volumes, as this is explicitly not supported; tables must use tabular st...

Espenol1
by New Contributor II
  • 15812 Views
  • 5 replies
  • 2 kudos

Resolved! Using managed identities to access SQL server - how?

Hello! My company wants us to only use managed identities for authentication. We have set up Databricks using Terraform, got Unity Catalog and everything, but we're a very small team and I'm struggling to control permissions outside of Unity Catalog....

Latest Reply
vr
Valued Contributor
  • 2 kudos

As of today, you can use https://learn.microsoft.com/en-us/azure/databricks/connect/unity-catalog/cloud-services/service-credentials

4 More Replies
santhiya
by Databricks Partner
  • 2706 Views
  • 2 replies
  • 0 kudos

CPU usage and idle time metrics from system tables

I need to get my compute metrics, but not from the UI. The system tables do not have much information: node_timeline has per-minute metric records, so it's difficult to calculate each compute's CPU usage per day. Is there any way we can get the CPU usage, CPU idle time, M...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

To calculate CPU usage, CPU idle time, and memory usage per cluster per day, you can use the system.compute.node_timeline system table. However, since the data in this table is recorded at per-minute granularity, it’s necessary to aggregate the data ...
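
The aggregation step mentioned in this reply can be sketched in plain Python, independent of Spark or SQL. Everything below is an assumption for illustration: the sample rows are made up, the column names (cluster_id, start_time, cpu_user_percent, cpu_system_percent, mem_used_percent) are written as I recall them from system.compute.node_timeline and should be checked against your workspace, and treating idle as 100 minus user plus system is a simplification.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical per-minute rows, shaped like node_timeline records.
rows = [
    {"cluster_id": "c1", "start_time": datetime(2025, 9, 1, 10, 0),
     "cpu_user_percent": 40.0, "cpu_system_percent": 10.0, "mem_used_percent": 60.0},
    {"cluster_id": "c1", "start_time": datetime(2025, 9, 1, 10, 1),
     "cpu_user_percent": 20.0, "cpu_system_percent": 10.0, "mem_used_percent": 40.0},
    {"cluster_id": "c1", "start_time": datetime(2025, 9, 2, 8, 0),
     "cpu_user_percent": 10.0, "cpu_system_percent": 0.0, "mem_used_percent": 30.0},
]


def daily_usage(rows):
    """Average per-minute metrics into one record per (cluster, day)."""
    sums = defaultdict(lambda: {"cpu": 0.0, "mem": 0.0, "n": 0})
    for r in rows:
        key = (r["cluster_id"], r["start_time"].date())
        s = sums[key]
        s["cpu"] += r["cpu_user_percent"] + r["cpu_system_percent"]
        s["mem"] += r["mem_used_percent"]
        s["n"] += 1
    return {
        key: {
            "avg_cpu_percent": s["cpu"] / s["n"],
            # Assumption: idle is whatever is not user or system time.
            "avg_cpu_idle_percent": 100.0 - s["cpu"] / s["n"],
            "avg_mem_percent": s["mem"] / s["n"],
            "minutes": s["n"],
        }
        for key, s in sums.items()
    }


usage = daily_usage(rows)
day1 = usage[("c1", datetime(2025, 9, 1).date())]
day2 = usage[("c1", datetime(2025, 9, 2).date())]
```

The same grouping, cluster plus calendar day, would carry over to a GROUP BY in SQL; on a multi-node cluster the average here spans all nodes' rows.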

1 More Replies
fly_high_five
by Contributor
  • 2542 Views
  • 5 replies
  • 1 kudos

Resolved! Unable to retrieve all rows of delta table using SQL endpoint of Interactive Cluster

Hi, I am trying to query a table using the JDBC endpoint of an Interactive Cluster. I am connected to the JDBC endpoint using DBeaver. When I export a small subset of data (2000-8000 rows), it works fine and exports the data. However, when I try to export all rows ...

Latest Reply
WiliamRosa
Databricks Partner
  • 1 kudos

Hi @fly_high_five, I found these references about this situation; see if they help you: increase the SocketTimeout in JDBC (Databricks KB “Best practices when using JDBC with Databricks SQL” – https://kb.databricks.com/dbsql/job-timeout-when-connectin...

4 More Replies
fly_high_five
by Contributor
  • 1865 Views
  • 4 replies
  • 2 kudos

Resolved! Exposing Data for Consumers in non-UC ADB

Hi, I want to expose data to consumers from our non-UC ADB. Consumers would be consuming data mainly using a SQL client like DBeaver. I tried the SQL endpoint of an Interactive Cluster and connected via DBeaver; however, when I try to fetch/export all rows of t...

Latest Reply
fly_high_five
Contributor
  • 2 kudos

Hi @szymon_dybczak, I am using the latest JDBC driver, 2.7.3: https://www.databricks.com/spark/jdbc-drivers-archive
And my JDBC URL comes from the JDBC endpoint of the Interactive Cluster:
jdbc:databricks://adb-{workspace_id}.azuredatabricks.net:443/default;transport...

3 More Replies
kmodelew
by New Contributor III
  • 5574 Views
  • 10 replies
  • 22 kudos

Unable to read excel file from Volume

Hi, I'm trying to read an Excel file directly from a Volume (not workspace or filestore); all examples on the internet use workspace or filestore. The Volume is an external location, so I can read from there, but I would like to read directly from the Volume. I hav...

Latest Reply
BS_THE_ANALYST
Databricks Partner
  • 22 kudos

@ck7007 thanks for the update. Absolutely love that you've tested the solution too! Big props. As you mention, if we keep the community accurate, it'll mean that when someone else searches for the thread, they don't end up using an incorrect solutio...

9 More Replies
Worrachon
by New Contributor
  • 661 Views
  • 1 reply
  • 0 kudos

Databricks cannot run pipeline

I found that when I run the pipeline, it shows the message "'Cannot run pipeline', 'PL_TRNF_CRM_SALESFORCE_TO_BLOB', "HTTPSConnectionPool(host='management.azure.com', port=443)". It doesn't happen on every instance, but I encounter this case often.

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

What exactly does the pipeline do?  Fetch data from a source system? I also see Data Factory as a component?

saicharandeepb
by Contributor
  • 692 Views
  • 1 reply
  • 1 kudos

Impact of Capturing Streaming Metrics to ADLS on Data Load Performance

Hi Community, I’m working on capturing Structured Streaming metrics and persisting them to Azure Data Lake Storage (ADLS) for monitoring and logging. To achieve this, I implemented a custom StreamingQueryListener that writes streaming progress data as...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @saicharandeepb, the behaviour you're experiencing can happen with coalesce. The thing is, when you use coalesce(1), you're sacrificing parallelism and everything is performed on a single executor. There's even a warning in Apache Spark OSS regardi...
