Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Forum Posts

rakeshsekar2025
by New Contributor III
  • 879 Views
  • 2 replies
  • 0 kudos

Not able to read sample data in Databricks on a shared cluster, but able to on a single-user cluster

I'm not able to view sample data using a shared cluster: "Error getting sample data: socket closed". But when I use single-cluster mode, I'm able to read the data.

(two screenshots attached)
Latest Reply
rakeshsekar2025
New Contributor III
  • 0 kudos

I've enabled outbound traffic on port 8443, but it's still not working. Please help me out here.

1 More Replies
pjruhnke
by New Contributor
  • 1216 Views
  • 2 replies
  • 0 kudos

Newest version of dbx-workspace always returns NoneType

I just updated the `databricks-sdk` library to the newest version on PyPI, and for some reason I am almost always getting this error: File "/home/site/wwwroot/.python_packages/lib/site-packages/databricks/sdk/credentials_provider.py", line 283, in to...

Latest Reply
nayan_wylde
Esteemed Contributor
  • 0 kudos

It seems your issue is in getting the AAD token; you are using an SPN to authenticate. You can try updating the Azure packages too:
azure-identity>=1.21.0
azure-core>=1.32.0
azure-mgmt-core>=1.6.0
databricks-sdk>=0.57.0

1 More Replies
laus
by New Contributor III
  • 42846 Views
  • 4 replies
  • 2 kudos

Resolved! How to solve Py4JJavaError: An error occurred while calling o5082.csv. : org.apache.spark.SparkException: Job aborted. when writing to csv

Hi, I get the error "Py4JJavaError: An error occurred while calling o5082.csv. : org.apache.spark.SparkException: Job aborted." when writing to CSV. Screenshot below with the detailed error. Any idea how to solve it? Thanks!

(screenshot attached: 2022-03-31 at 17.33.26)
Latest Reply
Noopur_Nigam
Databricks Employee
  • 2 kudos

Please try `output.coalesce(1).write.option("header","true").format("csv").save("path")`. It seems to be the same as https://community.databricks.com/s/topic/0TO3f000000CjVqGAK/py4jjavaerror

3 More Replies
JMartins777
by New Contributor
  • 2584 Views
  • 2 replies
  • 3 kudos

Resolved! Azure Databricks Power Platform Connector - Doubts

Hello, regarding the recently released Azure Databricks connector: I want to connect it to a Power App, but I have 2 main questions: 1 - If the Databricks URL is in a private network, how does it work and how can I achieve this conn...

Latest Reply
nayan_wylde
Esteemed Contributor
  • 3 kudos

1. Private Network Connectivity
For Databricks workspaces in private networks, you have a couple of options:
Option A: On-premises Data Gateway
- Install the Microsoft On-premises Data Gateway in your private network
- The gateway acts as a bridge between Po...

1 More Replies
DanielaHello
by New Contributor
  • 1589 Views
  • 1 replies
  • 0 kudos

Free edition and serverless edition are not loading and are really slow

Good morning, since last week I have been trying to access two workspaces that I have (one on the Free Edition, the other on the paid serverless edition). Neither workspace is loading; when they do load they are very, very slow, and I cannot see ...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @DanielaHello, that's weird. According to the status page, there is no outage in any region currently. Could you try a different browser, or try to log in in incognito mode? For instance, Databricks Free Edition is currently available only in one regi...

devyani_k
by New Contributor
  • 2553 Views
  • 1 replies
  • 1 kudos

Resolved! Extracting cost by user (run_by) for All-purpose clusters and SQL warehouse usage

Hi, I'm trying to extract usage cost per user (run_by) for workloads that use all-purpose clusters and SQL warehouses. I've been exploring the system.billing.usage table but noticed some challenges: 1. For records related to all-purpose clusters an...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Attribution of compute usage to individual users for all-purpose clusters and SQL warehouses is only partially supported. Job compute (including serverless jobs) and workflows are reliably attributable to the job owner/service principal. For interact...

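For the interactive usage that is attributable, the `identity_metadata.run_as` and `billing_origin_product` columns of `system.billing.usage` are the usual starting point for per-user cost slicing. The query below is a sketch only, assembled as a plain Python string; the column names are assumptions to verify against your workspace's system-tables schema, and `run_as` can be NULL for compute that is not attributable to a user:

```python
# Hypothetical attribution query -- verify column availability in your
# workspace before relying on it; run_as is NULL for non-attributable usage.
query = """
SELECT
  identity_metadata.run_as  AS run_by,
  usage_metadata.cluster_id AS cluster_id,
  SUM(usage_quantity)       AS dbus
FROM system.billing.usage
WHERE billing_origin_product IN ('ALL_PURPOSE', 'SQL')
GROUP BY ALL
"""
print(query)
```

On Databricks you would pass this string to `spark.sql(query)` and join against a pricing table to turn DBUs into cost.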
varni
by New Contributor III
  • 2009 Views
  • 2 replies
  • 6 kudos

Resolved! Unity Catalog blocks DML (UPDATE, DELETE) on static Delta tables — unable to use spark.sql

Hello, we've started migrating from Azure Databricks (Hive Metastore) to AWS Databricks with Unity Catalog. Our entire codebase was deliberately designed around spark.sql('...') using DML operations (UPDATE, DELETE, MERGE) for two reasons: In many case...

Latest Reply
varni
New Contributor III
  • 6 kudos

[RESOLVED] The issue was caused by the source tables being in Parquet format. After rewriting them as Delta tables, everything worked fine — including DML operations like UPDATE via DataFrame logic. Thanks!

1 More Replies
alsetr
by New Contributor III
  • 2403 Views
  • 4 replies
  • 0 kudos

Executor OOM Error with AQE enabled

We have a Databricks Spark job. After migrating from Databricks Runtime 10.4 to 15.4, one of our jobs that uses a broadcast hint started to fail with this error:
```
ERROR Executor: Exception in task 2.0 in stage 371.0 (TID 16912)
org.apache.spark.memory.S...
```

Latest Reply
alsetr
New Contributor III
  • 0 kudos

I found a similar issue: https://kb.databricks.com/python/job-fails-with-not-enough-memory-to-build-the-hash-map-error. It looks like the cause of the error is a bug in a new Databricks feature called executor-side broadcast (ebj, executor broadcast join)...

3 More Replies
wilsmith
by New Contributor
  • 693 Views
  • 1 replies
  • 0 kudos

COPY INTO maintaining row order

I have a CSV file in S3 and loading the rows in the order they appear in the file is necessary for parsing it out later. When using COPY INTO will it maintain that order so the bronze layer is in exactly the same order as the source file?

Latest Reply
Isi
Honored Contributor III
  • 0 kudos

Hey @wilsmith, COPY INTO does not guarantee the order of rows because it processes files in parallel using Spark's distributed architecture. This means the ingestion engine reads different parts of the file simultaneously, potentially splitting a...

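Since COPY INTO does not promise to preserve file order, a common workaround is to attach an explicit ordering key taken from the source and sort on it downstream. The pure-Python sketch below (not the Spark API; all names are illustrative) shows the idea: tag each row with its position in the file before any parallel work happens, then restore the original order by sorting on that key:

```python
import csv
import io
import random

# Source CSV whose physical row order matters downstream.
raw = "id,city\n1,Berlin\n2,Munich\n3,Hamburg\n4,Cologne\n"

# Tag every row with its position in the file *before* any parallel work.
rows = [
    {"_source_line": i, **row}
    for i, row in enumerate(csv.DictReader(io.StringIO(raw)))
]

# Simulate a distributed reader handing chunks back in arbitrary order.
random.shuffle(rows)

# Downstream, file order is recoverable by sorting on the captured key.
restored = sorted(rows, key=lambda r: r["_source_line"])
print([r["city"] for r in restored])  # ['Berlin', 'Munich', 'Hamburg', 'Cologne']
```

On Databricks the analogous move is to have the producer embed a sequence column in the file itself, since relying on ingestion-time ordering is not safe.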
briancuster63
by New Contributor II
  • 2888 Views
  • 4 replies
  • 0 kudos

Asset Bundle .py files being converted to notebooks when deployed to Databricks

Hi everyone, I'm hitting a particularly frustrating issue whenever I try to run some Python code from an asset bundle in my workspace. The code and notebooks deploy fine, but once deployed, the code files get converted to notebooks and I'm no longer abl...

Latest Reply
olivier-soucy
Contributor
  • 0 kudos

I came here looking for a solution to the opposite problem: I was hoping my .py files would be available as notebooks (without adding extra headers). Unfortunately, this does not seem to be possible with DABs. @facebiranhari if you have not solved your ...

3 More Replies
Sen
by New Contributor
  • 17097 Views
  • 10 replies
  • 2 kudos

Resolved! Performance enhancement while writing dataframes into Parquet tables

Hi, I am trying to write the contents of a dataframe into a parquet table using the command below: `df.write.mode("overwrite").format("parquet").saveAsTable("sample_parquet_table")`. The dataframe contains an extract from one of our source systems, which h...

9 More Replies
shubham7
by New Contributor II
  • 926 Views
  • 2 replies
  • 0 kudos

Reading XML files with multiple rowTags

I have multiple XML files in a folder that I am reading into a dataframe in a Databricks cell. Each has one rootTag and multiple rowTags. Can I read all the rowTags into a single Spark dataframe (PySpark)? Any reference or approach would greatly a...

Latest Reply
shubham7
New Contributor II
  • 0 kudos

You are correct, but I have N different rowTags. How do I read them into one dataframe?

1 More Replies
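Since the Spark XML reader accepts one rowTag per read, the usual pattern for N rowTags is one read per tag followed by a union (assuming the schemas are compatible, or adding a discriminator column first). The stdlib-only sketch below illustrates the same loop-and-collect idea without a Spark cluster; the tag and field names are illustrative:

```python
import xml.etree.ElementTree as ET

doc = """<root>
  <person><name>Ana</name></person>
  <person><name>Bo</name></person>
  <city><name>Oslo</name></city>
</root>"""

# The stdlib analogue of "read with rowTag=X, then union the DataFrames":
# one pass per rowTag, tagging each record with its source tag.
row_tags = ["person", "city"]
root = ET.fromstring(doc)

records = []
for tag in row_tags:
    for el in root.iter(tag):
        records.append({"row_tag": tag, "name": el.findtext("name")})

print(records)
```

In PySpark the equivalent loop would build one DataFrame per rowTag and combine them with unionByName, optionally with allowMissingColumns when schemas differ.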
jordan72
by New Contributor III
  • 1964 Views
  • 8 replies
  • 2 kudos

Resolved! German Umlauts wrong via JDBC

Hi, I have the issue that German umlauts are not retrieved correctly via the JDBC driver. It shows M�nchen instead of München. I load the driver in my Java app via: <groupId>com.databricks</groupId><artifactId>databricks-jdbc</artifactId><version...

Latest Reply
jordan72
New Contributor III
  • 2 kudos

OK, so it seems it has something to do with the newly introduced native.encoding system property. In NetBeans you have to pass -Dstdout.encoding=utf-8 to the VM if you are using JDK 21.

7 More Replies
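The M�nchen symptom is what you get when UTF-8 bytes are decoded with a charset that cannot represent ü, which is exactly the mismatch the -Dstdout.encoding=utf-8 flag corrects. A small illustration of the mechanism (Python is used here only to demonstrate the byte-level behavior, not the JDBC driver itself):

```python
# "ü" in UTF-8 occupies two bytes (0xC3 0xBC).
data = "München".encode("utf-8")

# Decoding with the matching charset round-trips cleanly.
assert data.decode("utf-8") == "München"

# Decoding the same bytes with a charset that lacks "ü" replaces each
# undecodable byte with U+FFFD -- the "�" seen in the JDBC output.
garbled = data.decode("ascii", errors="replace")
print(garbled)  # M��nchen
```

The Java side behaves analogously: if the VM's default stdout charset cannot represent the character, the replacement glyph appears even though the driver delivered correct UTF-8 bytes.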
JCooke
by New Contributor II
  • 1904 Views
  • 3 replies
  • 1 kudos

Deploying Metastore with Terraform

My goal is to enable Unity Catalog on a clean Azure deployment of Databricks with absolutely no history of Databricks. I know I need to create a metastore for the Azure region, and to do this I know I need Account Admin from the accounts p...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @JCooke, the first assignment of the Databricks Account Admin role is a bit of a special case. There is always a manual step required to assign the first Account Admin in a new Databricks account on Azure. This step cannot be fully automated via T...

2 More Replies
pooja_bhumandla
by New Contributor III
  • 1485 Views
  • 2 replies
  • 1 kudos

Why is Merge with Deletion Vectors Slower Than Full File Rewrite on the Same Table?

I've run two MERGE INTO operations on the same Delta table: one with Deletion Vectors enabled (Case 1), and one without (Case 2).
In Case 1 (with Deletion Vectors):
- executionTimeMs: 106,708
- materializeSourceTimeMs: 24,344
- numTargetRowsUpdated: 22
- nu...

Latest Reply
saurabh18cs
Honored Contributor II
  • 1 kudos

Hi Pooja, let's understand DV first. It avoids rewriting entire files by marking rows as deleted/updated via a bitmap (the deletion vector), which should, in theory, be faster for small updates. But DV introduces new overhead:
1) Writing and updating t...

1 More Replies