Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

lizou1
by New Contributor III
  • 1665 Views
  • 3 replies
  • 0 kudos

Serverless environment v3: JavaPackage object is not callable

Ran into this issue when using serverless environment v3: "JavaPackage object is not callable". V2 works fine, any idea?

  • 1665 Views
  • 3 replies
  • 0 kudos
Latest Reply
lizou1
New Contributor III
  • 0 kudos

I went to the latest version, 4, and this is no longer an issue. Thanks.

  • 0 kudos
2 More Replies
shashankB
by New Contributor II
  • 258 Views
  • 2 replies
  • 2 kudos

Resolved! Lakebridge Transpiler Fails with UnicodeDecodeError while Analyzer Works Successfully

Hello Team, I am facing an issue with the Lakebridge transpiler. The Analyzer step runs successfully and produces the expected analysis files. However, when I run the Transpiler, it fails with the following error: ERROR [src/databricks/labs/Lakebridge.tr...

  • 258 Views
  • 2 replies
  • 2 kudos
Latest Reply
ManojkMohan
Honored Contributor
  • 2 kudos

Root cause: the trailing “unexpected end of JSON input” suggests the decoder aborted midway, producing invalid JSON. This mismatch between the file content (likely UTF-8 or containing special characters) and the default Windows decoding causes the issue. Suggest...
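Not Lakebridge's internal code, just a generic illustration of the mismatch described above: reading a UTF-8 source file with an explicit encoding instead of the Windows default (often cp1252). The file path is hypothetical:

from pathlib import Path

src = Path("C:/work/source_queries/report.sql")  # hypothetical input file

# On Windows, src.read_text() falls back to the locale encoding (often cp1252)
# and can raise UnicodeDecodeError on UTF-8 content with special characters.
text = src.read_text(encoding="utf-8", errors="replace")  # decode as UTF-8, replace undecodable bytes
print(len(text))

Setting the PYTHONUTF8=1 environment variable before running a Python-based CLI is another common workaround for Windows default-encoding issues.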

  • 2 kudos
1 More Replies
GuruRio
by New Contributor
  • 292 Views
  • 2 replies
  • 1 kudos

Achieving batch-level overwrite for streaming SCD1 in DLT

Hi all, I am working with Databricks Delta Live Tables (DLT) and have the following scenario. Setup: source data is delivered as weekly snapshots (not CDC). I have a bronze layer (streaming table) and a silver layer (also streaming). I am implementing SCD...

  • 292 Views
  • 2 replies
  • 1 kudos
Latest Reply
ManojkMohan
Honored Contributor
  • 1 kudos

One can achieve this with dlt.apply_changes, but you need to configure it carefully to emulate key-based batch overwrite. Step 1: define bronze as the streaming source: import dlt; from pyspark.sql.functions import col; @dlt.table(comment="Bronze snapshot dat...
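A hedged sketch of the apply_changes step this reply is building toward; the table and column names (bronze_snapshots, id, snapshot_date) are illustrative, not the original pipeline's:

import dlt
from pyspark.sql.functions import col

dlt.create_streaming_table("silver_scd1")  # target SCD1 table

dlt.apply_changes(
    target="silver_scd1",
    source="bronze_snapshots",         # streaming bronze source defined earlier in the pipeline
    keys=["id"],                       # business key: later rows overwrite earlier ones per key
    sequence_by=col("snapshot_date"),  # the newest weekly snapshot wins
    stored_as_scd_type=1,              # SCD Type 1: keep only the latest version of each key
)

Note that apply_changes upserts per key but does not remove keys that disappear from a later snapshot unless deletes are modeled explicitly; Databricks also offers apply_changes_from_snapshot for snapshot-style sources, which may fit weekly full loads more directly.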

  • 1 kudos
1 More Replies
ck7007
by Contributor
  • 467 Views
  • 5 replies
  • 3 kudos

Streaming Solution

Maintain zonemaps with streaming writes. Challenge: streaming breaks zonemaps due to constant micro-batches. Solution: incremental updates: def write_streaming_with_zonemap(stream_df, table_path): def update_zonemap(batch_df, batch_id): # Write data batch_d...
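A rough reconstruction of the foreachBatch pattern the post outlines; update_zonemap_metadata() is a hypothetical stand-in for whatever the author uses to maintain min/max statistics:

def write_streaming_with_zonemap(stream_df, table_path):
    def update_zonemap(batch_df, batch_id):
        # Write the micro-batch to the Delta table
        batch_df.write.format("delta").mode("append").save(table_path)
        # Incrementally refresh zonemap metadata for this batch only
        update_zonemap_metadata(batch_df, table_path, batch_id)  # hypothetical helper

    return (stream_df.writeStream
            .foreachBatch(update_zonemap)
            .option("checkpointLocation", table_path + "/_checkpoint")
            .start())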

  • 467 Views
  • 5 replies
  • 3 kudos
Latest Reply
ManojkMohan
Honored Contributor
  • 3 kudos

@ck7007 I brainstormed some solution approaches; do you have some test data to test these hands-on?
Approach | Throughput | Query Speed | Complexity | Notes
Partition-level zonemaps | High | Medium | Low | Scales with micro-batches; prune at pa...

  • 3 kudos
4 More Replies
help_needed_445
by Contributor
  • 696 Views
  • 3 replies
  • 3 kudos

Resolved! Table Fields Have a Different Value and Data Type in SQL Editor vs a SQL Notebook Cell

When I query a numeric field in the SQL Editor it returns a value of 0.02875 and the data type is decimal but when I run the same query in a SQL notebook cell it returns 0.0287500 and decimal(7,7). I'm assuming this is expected behavior but is there ...

  • 696 Views
  • 3 replies
  • 3 kudos
Latest Reply
Khaja_Zaffer
Contributor III
  • 3 kudos

Hello @help_needed_445, good day! It is indeed a very interesting case study! I found the below from LLM models. Yes, this difference in decimal display between the Databricks SQL Editor (which uses the Photon engine in Databricks SQL) and notebooks (which use ...
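Not the thread's exact answer, but one way to make the behaviour visible and control it: pin the declared precision/scale (or format to a string) so both surfaces describe the value identically. The table and column names below are made up:

df = spark.sql("""
    SELECT
        rate                          AS rate_raw,        -- stored as decimal(7,7), may render as 0.0287500
        CAST(rate AS DECIMAL(10, 5))  AS rate_decimal,    -- renders as 0.02875
        format_number(rate, 5)        AS rate_formatted   -- string '0.02875', display only
    FROM my_catalog.my_schema.rates
""")
df.printSchema()
df.show()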

  • 3 kudos
2 More Replies
nkrom456
by New Contributor III
  • 461 Views
  • 1 reply
  • 1 kudos

Materialized View to External Delta Table using Sink API

Hi Team, while executing the below code I am able to create the sink and my data is getting written into Delta tables from the materialized view: import dlt; @dlt.table(name="employee_bronze3"); def create_table(): df = spark.read.table("dev.default.employee...

  • 461 Views
  • 1 reply
  • 1 kudos
Latest Reply
Brahmareddy
Esteemed Contributor
  • 1 kudos

Hi nkrom456, how are you doing today? As per my understanding, when you use dlt.read_stream() inside the same DLT pipeline, Databricks allows it to stream from that materialized view because everything is being managed within one pipeline; it underst...
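For reference, a hedged sketch of the sink pattern this thread is about, assuming the DLT sink API is available in the workspace; the sink name and external path are placeholders, not the poster's code:

import dlt

dlt.create_sink(
    name="employee_external_sink",
    format="delta",
    options={"path": "abfss://container@storageacct.dfs.core.windows.net/employee_sink"},  # placeholder path
)

@dlt.append_flow(name="employee_to_sink", target="employee_external_sink")
def employee_to_sink():
    # Streaming from the source works here because it is defined in the same pipeline
    return dlt.read_stream("employee_bronze3")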

  • 1 kudos
pop_smoke
by New Contributor III
  • 3098 Views
  • 8 replies
  • 7 kudos

Resolved! Write a file in CSV format

Is there any simple PySpark syntax to write data in CSV format to a file, or anywhere, in the Free Edition of Databricks? In Community Edition it was so easy.
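A minimal sketch, assuming a Unity Catalog volume you can write to (the /Volumes/... path is a placeholder):

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

(df.coalesce(1)                    # one output file instead of one per partition
   .write
   .mode("overwrite")
   .option("header", "true")
   .csv("/Volumes/main/default/my_volume/exports/sample_csv"))

Spark still writes a directory containing a part-*.csv file; for a small table, df.toPandas().to_csv("/Volumes/.../sample.csv", index=False) is a simpler single-file alternative.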

  • 3098 Views
  • 8 replies
  • 7 kudos
Latest Reply
BS_THE_ANALYST
Esteemed Contributor II
  • 7 kudos

@pop_smoke no worries! My background is with Alteryx (an ETL tool). I too am learning Databricks. I look forward to seeing you in the forum ☺️. Please share any cool things you find or any projects you do. All the best, BS

  • 7 kudos
7 More Replies
pop_smoke
by New Contributor III
  • 351 Views
  • 3 replies
  • 5 kudos

Resolved! Switching to Databricks from Ab Initio (an old ETL tool) - NEED ADVICE

All courses in the market and on YouTube for Databricks are, as far as I know, outdated, as those courses are for Community Edition. There is no new course for the Free Edition of Databricks. I am a working professional and do not get much time. Do you guys kno...

  • 351 Views
  • 3 replies
  • 5 kudos
Latest Reply
BS_THE_ANALYST
Esteemed Contributor II
  • 5 kudos

@pop_smoke keep your eyes out for this as well. I just saw this on LinkedIn: https://www.linkedin.com/posts/databricks_join-the-databricks-virtual-learning-festival-activity-7370143251149996032-PmjH?utm_source=share&utm_medium=member_desktop&rcm=ACoAAB_...

  • 5 kudos
2 More Replies
PabloCSD
by Valued Contributor II
  • 267 Views
  • 1 reply
  • 0 kudos

How to configure a Job-Compute for Unity Catalog Access? (Q/A)

If you need to access tables that are in a volume of Unity Catalog (UC), the following configuration will work:
targets:
  dev:
    mode: development
    default: true
    workspace:
      host: https://<workspace>.azuredatabricks.net/
    run_as...
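For completeness, a hedged sketch of the job-cluster piece that usually matters for UC access from a bundle-deployed job; the runtime version, node type, and user below are placeholders, not from the original post, and the key names should be checked against the current bundle/Jobs API schema:

resources:
  jobs:
    my_uc_job:
      job_clusters:
        - job_cluster_key: uc_cluster
          new_cluster:
            spark_version: 15.4.x-scala2.12        # placeholder runtime
            node_type_id: Standard_DS3_v2          # placeholder Azure node type
            num_workers: 2
            data_security_mode: SINGLE_USER        # access mode that enables Unity Catalog
            single_user_name: someone@example.com  # placeholder; usually the run_as principal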

  • 267 Views
  • 1 reply
  • 0 kudos
Latest Reply
Khaja_Zaffer
Contributor III
  • 0 kudos

Hello @PabloCSD, good day! Are you asking a question, or what are your expectations? Adding to this: you cannot create or register tables (managed or external) with locations pointing to volumes, as this is explicitly not supported; tables must use tabular st...

  • 0 kudos
Espenol1
by New Contributor II
  • 11440 Views
  • 5 replies
  • 2 kudos

Resolved! Using managed identities to access SQL server - how?

Hello! My company wants us to only use managed identities for authentication. We have set up Databricks using Terraform, got Unity Catalog and everything, but we're a very small team and I'm struggling to control permissions outside of Unity Catalog....

  • 11440 Views
  • 5 replies
  • 2 kudos
Latest Reply
vr
Contributor III
  • 2 kudos

As of today, you can use Unity Catalog service credentials: https://learn.microsoft.com/en-us/azure/databricks/connect/unity-catalog/cloud-services/service-credentials

  • 2 kudos
4 More Replies
santhiya
by New Contributor
  • 1122 Views
  • 2 replies
  • 0 kudos

CPU usage and idle time metrics from system tables

I need to get my compute metrics, not from the UI. The system tables do not have much information; node_timeline has per-minute records, so it's difficult to calculate each compute's CPU usage per day. Is there any way we can get the CPU usage, CPU idle time, M...

  • 1122 Views
  • 2 replies
  • 0 kudos
Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

To calculate CPU usage, CPU idle time, and memory usage per cluster per day, you can use the system.compute.node_timeline system table. However, since the data in this table is recorded at per-minute granularity, it’s necessary to aggregate the data ...
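A hedged sketch of that daily aggregation; the column names follow the documented system.compute.node_timeline schema (cpu_user_percent, cpu_system_percent, cpu_wait_percent, mem_used_percent), but verify them against your workspace before relying on the numbers:

daily = spark.sql("""
    SELECT
        cluster_id,
        DATE(start_time)                                   AS usage_date,
        AVG(cpu_user_percent + cpu_system_percent)         AS avg_cpu_busy_pct,
        AVG(100 - (cpu_user_percent + cpu_system_percent
                   + cpu_wait_percent))                    AS avg_cpu_idle_pct,
        AVG(mem_used_percent)                              AS avg_mem_used_pct
    FROM system.compute.node_timeline
    WHERE start_time >= dateadd(DAY, -7, current_date())
    GROUP BY cluster_id, DATE(start_time)
    ORDER BY usage_date, cluster_id
""")
display(daily)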

  • 0 kudos
1 More Replies
fly_high_five
by New Contributor III
  • 707 Views
  • 5 replies
  • 1 kudos

Resolved! Unable to retrieve all rows of delta table using SQL endpoint of Interactive Cluster

Hi, I am trying to query a table using the JDBC endpoint of an Interactive Cluster. I am connected to the JDBC endpoint using DBeaver. When I export a small subset of data, 2000-8000 rows, it works fine and exports the data. However, when I try to export all rows ...

  • 707 Views
  • 5 replies
  • 1 kudos
Latest Reply
WiliamRosa
Contributor
  • 1 kudos

Hi @fly_high_five, I found these references about this situation; see if they help you: increase the SocketTimeout in JDBC (Databricks KB “Best practices when using JDBC with Databricks SQL”: https://kb.databricks.com/dbsql/job-timeout-when-connectin...
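Illustrative only; the exact property name and default should be checked in the Databricks JDBC driver documentation, but the KB's suggestion amounts to raising (or disabling) the socket timeout on the connection string DBeaver uses:

jdbc_url = (
    "jdbc:databricks://adb-<workspace_id>.azuredatabricks.net:443/default;"
    "transportMode=http;ssl=1;AuthMech=3;"
    "httpPath=<your_http_path>;"
    "SocketTimeout=600"  # seconds; 0 is commonly used to disable the timeout for long exports
)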

  • 1 kudos
4 More Replies
fly_high_five
by New Contributor III
  • 706 Views
  • 4 replies
  • 1 kudos

Resolved! Exposing Data for Consumers in non-UC ADB

Hi, I want to expose data to consumers from our non-UC ADB. Consumers would be consuming data mainly using a SQL client like DBeaver. I tried the SQL endpoint of an Interactive Cluster and connected via DBeaver; however, when I try to fetch/export all rows of t...

  • 706 Views
  • 4 replies
  • 1 kudos
Latest Reply
fly_high_five
New Contributor III
  • 1 kudos

Hi @szymon_dybczak, I am using the latest JDBC driver, 2.7.3 (https://www.databricks.com/spark/jdbc-drivers-archive), and my JDBC URL comes from the JDBC endpoint of the Interactive Cluster: jdbc:databricks://adb-{workspace_id}.azuredatabricks.net:443/default;transport...

  • 1 kudos
3 More Replies
kmodelew
by New Contributor III
  • 1078 Views
  • 10 replies
  • 22 kudos

Unable to read excel file from Volume

Hi, I'm trying to read an Excel file directly from a Volume (not workspace or filestore); all examples on the internet use workspace or filestore. The Volume is an external location, so I can read from there, but I would like to read directly from the Volume. I hav...
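A minimal sketch of one way to do it, assuming a UC volume path (placeholder below) and that openpyxl is installed on the cluster (%pip install openpyxl):

import pandas as pd

pdf = pd.read_excel("/Volumes/main/default/landing/report.xlsx", sheet_name=0)
df = spark.createDataFrame(pdf)  # convert to a Spark DataFrame if needed
display(df)

For larger files, the com.crealytics spark-excel library is another commonly used option against the same /Volumes path.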

  • 1078 Views
  • 10 replies
  • 22 kudos
Latest Reply
BS_THE_ANALYST
Esteemed Contributor II
  • 22 kudos

@ck7007 thanks for the update. Absolutely love that you've tested the solution too! Big props. As you mention, if we keep the community accurate, it'll mean that when someone else searches for the thread, they don't end up using an incorrect solutio...

  • 22 kudos
9 More Replies
jfvizoso
by New Contributor II
  • 12089 Views
  • 5 replies
  • 0 kudos

Can I pass parameters to a Delta Live Table pipeline at running time?

I need to execute a DLT pipeline from a Job, and I would like to know if there is any way of passing a parameter. I know you can have settings in the pipeline that you use in the DLT notebook, but it seems you can only assign values to them when crea...

  • 12089 Views
  • 5 replies
  • 0 kudos
Latest Reply
DeepakAI
New Contributor II
  • 0 kudos

Team, is any workaround possible? I have 100+ tables which need to be ingested incrementally. I created a single DLT notebook which I am using inside a pipeline as a task; this pipeline is triggered via a job on a file-arrival event. I want to utilize the same...
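A hedged sketch of the usual workaround: store a key in the pipeline's configuration settings and read it in the notebook, so the same notebook can serve different tables; the key name source_table and the default value are made up for illustration:

import dlt

source_table = spark.conf.get("source_table", "dev.default.events")  # pipeline configuration key (placeholder)

@dlt.table(name=f"bronze_{source_table.split('.')[-1]}")
def bronze():
    return spark.readStream.table(source_table)

Because the configuration lives on the pipeline rather than on a run, changing it per trigger generally means updating the pipeline via the Pipelines API before starting it, or creating one pipeline per parameter set.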

  • 0 kudos
4 More Replies
