Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

TomFielding
by Databricks Partner
  • 5395 Views
  • 2 replies
  • 3 kudos

Resolved! HELP!

Hey Databricks Community, this may be a silly question, but are we able to share Databricks-related job posts on here?

Latest Reply
Advika
Community Manager
  • 3 kudos

Hello @TomFielding! If your question is about posting job openings in the Community, then the answer is No. The Community is intended for discussions and knowledge sharing around Databricks products.

1 More Replies
stefan_erste
by New Contributor III
  • 3869 Views
  • 11 replies
  • 3 kudos

Resolved! Programmatically setting TAGs on VIEWs

Hi all, In order to achieve data stability in our Workspace, our IT team has given us access to the data through VIEWs on top of an ingestion schema. Now I want to provide metadata to VIEWs in the form of TAGs (IT does not want to cover this use case). Th...

Latest Reply
stefan_erste
New Contributor III
  • 3 kudos

Hi @szymon_dybczak and @WiliamRosa, I have used backticks from the very start (you'll see it if you re-check my original post). It is definitely a cluster issue, as I am able to assign tags using a serverless cluster. The reason I was using a dedicated on...
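
For readers following this thread, here is a minimal sketch of how a tag statement for a view might be assembled with backtick-quoted identifiers. It assumes the Unity Catalog `ALTER VIEW ... SET TAGS` statement is what the thread is about; the helper names and the catalog, schema, view, and tag values are hypothetical and are not taken from the original post.

```python
def quote_ident(name: str) -> str:
    """Wrap one identifier part in backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def build_set_tags_sql(catalog: str, schema: str, view: str, tags: dict[str, str]) -> str:
    """Build an ALTER VIEW ... SET TAGS statement for one view (hypothetical helper)."""
    if not tags:
        raise ValueError("at least one tag is required")
    for key, value in tags.items():
        # Keep the sketch simple: refuse characters that would need escaping.
        if "'" in key + value or "\\" in key + value:
            raise ValueError(f"unsupported character in tag {key!r}")
    full_name = ".".join(quote_ident(part) for part in (catalog, schema, view))
    pairs = ", ".join(f"'{k}' = '{v}'" for k, v in tags.items())
    return f"ALTER VIEW {full_name} SET TAGS ({pairs})"


# Placeholder names, for illustration only.
stmt = build_set_tags_sql("main", "reporting", "v_orders", {"owner": "analytics"})
print(stmt)
```

In a notebook the resulting string would typically be passed to `spark.sql(stmt)`. Whether that succeeds may depend on the compute type, which is the issue this thread reports.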

10 More Replies
felix4572
by New Contributor III
  • 2173 Views
  • 9 replies
  • 6 kudos

Resolved! transformWithStateInPandas throws "Spark connect directory is not ready" error

Hello, we employ arbitrary stateful aggregations in our data processing streams on Azure Databricks, and would like to migrate from applyInPandasWithState to transformWithStateInPandas. We employ the Python API throughout our solution, and some of our...

Data Engineering
stateful processing
structured streaming
transformWithStateInPandas
Latest Reply
Advika
Community Manager
  • 6 kudos

Update: This is working fine with earlier DBR versions, but the issue seems to occur specifically with DBR 17.1. I’ve flagged this behaviour with the internal team for further investigation.

8 More Replies
Ramana
by Valued Contributor II
  • 6434 Views
  • 7 replies
  • 2 kudos

Databricks Key Vault Secret - Is it available in Databricks on AWS?

@Hubert-Dudek, I see your post regarding Key Value Secret handling via UI for Databricks on Azure. Is this feature available for Databricks on AWS as well? #Secrets #Scopes #DatabricksOnAWS
Thanks,
Ramana

Latest Reply
ceceliac
New Contributor III
  • 2 kudos

Hi, do you have any update on this topic? We are looking for the same thing. We are using Databricks (UC) on AWS and have a developer group that wants to implement 90-day rotation for an AWS IAM secret using Secrets Manager. I do not see anything i...

6 More Replies
lizou1
by New Contributor III
  • 2201 Views
  • 3 replies
  • 0 kudos

serverless environment v3 JavaPackage object is not callable

I run into this issue when using serverless environment v3: "JavaPackage object is not callable". V2 works fine. Any ideas?

Latest Reply
lizou1
New Contributor III
  • 0 kudos

I moved to the latest version (4) and this is no longer an issue. Thanks.

2 More Replies
shashankB
by Databricks Partner
  • 1345 Views
  • 2 replies
  • 2 kudos

Resolved! Lakebridge Transpiler Fails with UnicodeDecodeError while Analyzer Works Successfully

Hello Team, I am facing an issue with the Lakebridge transpiler. The Analyzer step runs successfully and produces the expected analysis files. However, when I run the Transpiler, it fails with the following error:
ERROR [src/databricks/labs/Lakebridge.tr...

Latest Reply
ManojkMohan
Honored Contributor II
  • 2 kudos

Root Cause
The trailing “unexpected end of JSON input” suggests the decoder aborted midway, producing invalid JSON. This mismatch between file content (likely UTF-8 or containing special characters) and default Windows decoding causes the issue.
Suggest...
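
The encoding mismatch described in this reply can be reproduced in plain Python. The sketch below is only an illustration, not the fix the reply goes on to suggest (that part is cut off here). It assumes cp1252 as the Windows default code page, and the payload and file name are hypothetical.

```python
import json
import os
import tempfile

# Hypothetical payload: JSON with a non-ASCII character, saved as UTF-8.
payload = {"query": "SELECT 'Á' AS initial"}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f, ensure_ascii=False)

    # Decoding as cp1252 fails: 'Á' is bytes C3 81 in UTF-8,
    # and byte 0x81 has no mapping in cp1252.
    try:
        with open(path, encoding="cp1252") as f:
            f.read()
        cp1252_failed = False
    except UnicodeDecodeError:
        cp1252_failed = True

    # Naming the encoding explicitly reads the same file back intact.
    with open(path, encoding="utf-8") as f:
        roundtrip = json.load(f)

print(cp1252_failed, roundtrip == payload)
```

The general point is that `open()` without an `encoding` argument falls back to a platform-dependent default, so the same file can decode on one machine and fail on another.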

1 More Replies
GuruRio
by New Contributor
  • 1263 Views
  • 2 replies
  • 1 kudos

Achieving batch-level overwrite for streaming SCD1 in DLT

Hi all, I am working with Databricks Delta Live Tables (DLT) and have the following scenario:
Setup:
  • Source data is delivered as weekly snapshots (not CDC).
  • I have a bronze layer (streaming table) and a silver layer (also streaming).
  • I am implementing SCD...

Latest Reply
ManojkMohan
Honored Contributor II
  • 1 kudos

One can achieve this with dlt.apply_changes, but you need to configure it carefully to emulate key-based batch overwrite.
Step 1: Define Bronze as Streaming Source
import dlt
from pyspark.sql.functions import col
@dlt.table(comment="Bronze snapshot dat...

1 More Replies
ck7007
by Contributor II
  • 2158 Views
  • 5 replies
  • 3 kudos

Resolved! Streaming Solution

Maintain Zonemaps with Streaming Writes
Challenge: Streaming breaks zonemaps due to constant micro-batches.
Solution: Incremental Updates
def write_streaming_with_zonemap(stream_df, table_path):
    def update_zonemap(batch_df, batch_id):
        # Write data
        batch_d...
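
Since the post's code is cut off, here is a small pure-Python sketch of the general idea of an incrementally maintained zonemap: each micro-batch contributes only its own min/max range, and a range query skips batches whose range cannot match. This is not the author's implementation; the `ZoneMap` class, its methods, and the sample values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ZoneMap:
    """Per-batch min/max index for one column (hypothetical helper)."""
    zones: dict[int, tuple[int, int]] = field(default_factory=dict)

    def update(self, batch_id: int, values: list[int]) -> None:
        # Incremental: only the new batch's range is computed and stored.
        if values:
            self.zones[batch_id] = (min(values), max(values))

    def candidates(self, lo: int, hi: int) -> list[int]:
        # A batch can be skipped when its [min, max] range misses [lo, hi].
        return sorted(b for b, (mn, mx) in self.zones.items() if mn <= hi and mx >= lo)


zm = ZoneMap()
zm.update(0, [1, 5, 9])
zm.update(1, [20, 25])
zm.update(2, [8, 30])
print(zm.candidates(10, 22))  # batches whose range overlaps 10..22
```

In a streaming job the `update` step would presumably run once per micro-batch, for example from a `foreachBatch` callback, which seems to be what the `update_zonemap(batch_df, batch_id)` signature in the post is aiming at.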

Latest Reply
ManojkMohan
Honored Contributor II
  • 3 kudos

@ck7007 I brainstormed some solution approaches. Do you have some test data to try these hands-on?
Approach | Throughput | Query Speed | Complexity | Notes
Partition-level zonemaps | High | Medium | Low | Scales with micro-batches; prune at pa...

4 More Replies
help_needed_445
by Contributor
  • 2052 Views
  • 3 replies
  • 3 kudos

Resolved! Table Fields Have a Different Value and Data Type in SQL Editor vs a SQL Notebook Cell

When I query a numeric field in the SQL Editor it returns a value of 0.02875 and the data type is decimal but when I run the same query in a SQL notebook cell it returns 0.0287500 and decimal(7,7). I'm assuming this is expected behavior but is there ...

Latest Reply
Khaja_Zaffer
Esteemed Contributor
  • 3 kudos

Hello @help_needed_445, good day! It is indeed a very interesting case study! I found the following from LLM models. Yes, this difference in decimal display between the Databricks SQL Editor (which uses the Photon engine in Databricks SQL) and notebooks (which use ...
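
The two values in the question, 0.02875 and 0.0287500, are the same number shown at different scales. A minimal sketch with Python's standard `decimal` module illustrates this. It says nothing about why the two Databricks surfaces choose different displays; it only shows that the results are numerically equal.

```python
from decimal import Decimal

raw = Decimal("0.02875")
# Rescale to seven fractional digits, as a decimal(7,7) column would show it.
scaled = raw.quantize(Decimal("0.0000001"))

print(raw, scaled, raw == scaled)
# normalize() drops the trailing zeros again.
print(scaled.normalize())
```

If a consistent display matters, casting to an explicit precision and scale in the query is one option to try.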

2 More Replies
nkrom456
by New Contributor III
  • 1066 Views
  • 1 replies
  • 1 kudos

Materialized View to External Delta Table using sink API

Hi Team, While executing the code below, I am able to create the sink, and my data is written into Delta tables from the materialized view.
import dlt
@dlt.table(name = "employee_bronze3")
def create_table():
    df = spark.read.table("dev.default.employee...

Latest Reply
Brahmareddy
Esteemed Contributor II
  • 1 kudos

Hi nkrom456, How are you doing today? As per my understanding, when you use dlt.read_stream() inside the same DLT pipeline, Databricks allows it to stream from that materialized view because everything is being managed within one pipeline; it underst...

pop_smoke
by New Contributor III
  • 5853 Views
  • 8 replies
  • 7 kudos

Resolved! write file as csv format

Is there any simple PySpark syntax to write data in CSV format into a file, or anywhere, in the Free Edition of Databricks? In the Community Edition, it was so easy.

Latest Reply
BS_THE_ANALYST
Databricks Partner
  • 7 kudos

@pop_smoke no worries! My background is with Alteryx (ETL tool). I too am learning Databricks. I look forward to seeing you in the forum ☺️. Please share any cool things you find or any projects you do.
All the best,
BS

7 More Replies
PabloCSD
by Valued Contributor II
  • 1210 Views
  • 1 replies
  • 0 kudos

How to configure a Job-Compute for Unity Catalog Access? (Q/A)

If you need to access tables that are in a volume of Unity Catalog (UC), the following configuration will work:
targets:
  dev:
    mode: development
    default: true
    workspace:
      host: https://<workspace>.azuredatabricks.net/
    run_as...

Latest Reply
Khaja_Zaffer
Esteemed Contributor
  • 0 kudos

Hello @PabloCSD, good day! Are you asking a question, or what are your expectations? In addition to this: You cannot create or register tables (managed or external) with locations pointing to volumes, as this is explicitly not supported; tables must use tabular st...

Espenol1
by New Contributor II
  • 16163 Views
  • 5 replies
  • 2 kudos

Resolved! Using managed identities to access SQL server - how?

Hello! My company wants us to only use managed identities for authentication. We have set up Databricks using Terraform, got Unity Catalog and everything, but we're a very small team and I'm struggling to control permissions outside of Unity Catalog....

Latest Reply
vr
Valued Contributor
  • 2 kudos

As of today, you can use https://learn.microsoft.com/en-us/azure/databricks/connect/unity-catalog/cloud-services/service-credentials

4 More Replies
santhiya
by Databricks Partner
  • 2811 Views
  • 2 replies
  • 0 kudos

CPU usage and idle time metrics from system tables

I need to get my compute metrics, but not from the UI. The system tables do not have much information: node_timeline has per-minute metric records, so it's difficult to calculate each compute's CPU usage per day. Is there any way we can get the CPU usage, CPU idle time, M...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

To calculate CPU usage, CPU idle time, and memory usage per cluster per day, you can use the system.compute.node_timeline system table. However, since the data in this table is recorded at per-minute granularity, it’s necessary to aggregate the data ...
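
The per-minute to per-day aggregation described in this reply can be sketched in plain Python. The rows below are made-up samples. The column names (`cluster_id`, `start_time`, `cpu_user_percent`, `cpu_system_percent`, `cpu_wait_percent`, `mem_used_percent`) are assumed to match `system.compute.node_timeline`, and treating idle time as 100 minus user, system, and wait time is also an assumption, not something stated in the reply.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical per-minute rows shaped like node_timeline records.
rows = [
    {"cluster_id": "c1", "start_time": datetime(2025, 9, 1, 10, 0),
     "cpu_user_percent": 40.0, "cpu_system_percent": 10.0,
     "cpu_wait_percent": 0.0, "mem_used_percent": 60.0},
    {"cluster_id": "c1", "start_time": datetime(2025, 9, 1, 10, 1),
     "cpu_user_percent": 20.0, "cpu_system_percent": 10.0,
     "cpu_wait_percent": 0.0, "mem_used_percent": 40.0},
    {"cluster_id": "c1", "start_time": datetime(2025, 9, 2, 8, 0),
     "cpu_user_percent": 10.0, "cpu_system_percent": 0.0,
     "cpu_wait_percent": 0.0, "mem_used_percent": 30.0},
]


def daily_usage(rows):
    """Average the per-minute samples per (cluster_id, calendar day)."""
    buckets = defaultdict(list)
    for r in rows:
        buckets[(r["cluster_id"], r["start_time"].date())].append(r)
    out = {}
    for key, group in buckets.items():
        n = len(group)
        busy = sum(r["cpu_user_percent"] + r["cpu_system_percent"]
                   + r["cpu_wait_percent"] for r in group) / n
        mem = sum(r["mem_used_percent"] for r in group) / n
        # Assumption: idle is whatever is not user, system, or wait time.
        out[key] = {"avg_cpu_percent": busy,
                    "avg_idle_percent": 100.0 - busy,
                    "avg_mem_percent": mem}
    return out


usage = daily_usage(rows)
for key in sorted(usage):
    print(key, usage[key])
```

On Databricks the same grouping would more naturally be a SQL `GROUP BY cluster_id, DATE(start_time)` with `AVG(...)` over the table. The Python version only makes the arithmetic explicit.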

1 More Replies
fly_high_five
by Contributor
  • 2671 Views
  • 5 replies
  • 1 kudos

Resolved! Unable to retrieve all rows of delta table using SQL endpoint of Interactive Cluster

Hi, I am trying to query a table using the JDBC endpoint of an Interactive Cluster. I am connected to the JDBC endpoint using DBeaver. When I export a small subset of data (2000-8000 rows), it works fine and exports the data. However, when I try to export all rows ...

Latest Reply
WiliamRosa
Databricks Partner
  • 1 kudos

Hi @fly_high_five, I found these references about this situation; see if they help you: increase the SocketTimeout in JDBC (Databricks KB “Best practices when using JDBC with Databricks SQL” – https://kb.databricks.com/dbsql/job-timeout-when-connectin...

4 More Replies