Data Engineering

Forum Posts

by jeremy98 • Honored Contributor

12-04-2024 11:41:15 AM

2882 Views
6 replies
0 kudos

Move Databricks service to another resource group

Hello, is it possible to move the Databricks service to another resource group without any problems? I have a resource group containing two workspaces, the prod and staging environments. I created another resource group to maintain only the databrick...

Latest Reply

nickv
New Contributor II

02-18-2025 2:52:13 AM

0 kudos

I'm running into the same problem. What's the procedure for creating a feature request for this? It seems to me that when Databricks is running in Azure, I should be able to move it to a different resource group.

5 More Replies

by FilipezAR • New Contributor

05-13-2024 2:25:20 PM

15701 Views
3 replies
1 kudos

Failed to create new KafkaAdminClient

I want to create connections to Kafka with spark.readStream using the following parameters: kafkaParams = { "kafka.sasl.jaas.config": f'org.apache.kafka.common.security.plain.PlainLoginModule required username="{kafkaUsername}" password="{kafkaPa...

Latest Reply

Marcin
Databricks Employee

02-18-2025 2:02:52 AM

1 kudos

If you are using Confluent with Schema Registry, you can use the code below. No additional libraries need to be installed. From Databricks Runtime 16.0, it supports schema references and recursive references: from pyspark.sql.functions import col, lit f...

2 More Replies

by mattmunz • New Contributor III

07-29-2022 7:15:25 PM

6878 Views
2 replies
4 kudos

JDBC Error: Error occured while deserializing arrow data

I am getting the following error in my Java application: java.sql.SQLException: [Databricks][DatabricksJDBCDriver](500618) Error occured while deserializing arrow data: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available. I beli...

Latest Reply

cvcore
New Contributor II

02-18-2025 12:27:45 AM

4 kudos

For anyone encountering this issue in 2025, I was able to solve it by using the --add-opens=jdk.unsupported/sun.misc=ALL-UNNAMED option in combination with the latest JDBC driver (v2.7.1). I was using the driver in DBeaver, but I assume the issue coul...

1 More Replies

by Shivap • New Contributor III

02-17-2025 4:47:51 PM

1433 Views
2 replies
0 kudos

Resolved! Writing back from notebook to blob storage as single file with UC configured databricks

I want to write a file from a notebook to blob storage. We have configured Unity Catalog. When it writes, it creates a folder named after the file name I provided and writes multiple files inside it, as shown below. Can someone suggest me on ...

Latest Reply

Stefan-Koch
Databricks Partner

02-17-2025 9:55:32 PM

0 kudos

Hi Shivap, if you want to save a dataframe as a single file, you could consider converting the PySpark dataframe to a pandas dataframe and then saving it as a file. path_single_file = '/Volumes/demo/raw/test/single' # create sample dataframe df = spark.cr...
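
The reply's code is cut off in this preview, so here is only a minimal sketch of the idea it describes: convert the Spark dataframe to pandas and let pandas write one file. The helper name write_single_csv and the output file name are hypothetical. Note that toPandas() collects all rows onto the driver, so this is only reasonable for data that fits in driver memory.

```python
def write_single_csv(spark_df, path):
    """Collect a Spark DataFrame to the driver and write it as one CSV file.

    `spark_df` is expected to be a pyspark.sql.DataFrame; `path` is a
    driver-visible location such as a Unity Catalog volume path.
    """
    # toPandas() pulls every row to the driver, so keep the data small.
    pandas_df = spark_df.toPandas()
    # pandas writes exactly one file at `path`, not a folder of part files.
    pandas_df.to_csv(path, index=False)
    return path


# Hypothetical usage in a notebook, reusing the volume path from the reply
# (the file name "output.csv" is an assumption):
# write_single_csv(df, "/Volumes/demo/raw/test/single/output.csv")
```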

1 More Replies

by lrodcon • New Contributor III

12-29-2022 11:27:17 AM

14853 Views
6 replies
4 kudos

Read external iceberg table in a spark dataframe within databricks

I am trying to read an external Iceberg database from an S3 location using the following command: df_source = (spark.read.format("iceberg") .load(source_s3_path) .drop(*source_drop_columns) .filter(f"{date_column}<='{date_filter}'") ) B...

Latest Reply

dynofu
New Contributor II

06-10-2023 11:00:48 AM

4 kudos

https://issues.apache.org/jira/browse/SPARK-41344

5 More Replies

by kasuskasus1 • Databricks Partner

02-12-2025 9:25:55 PM

876 Views
1 reply
0 kudos

Resolved! How to use GLOW in Databricks Premium on AWS?

Hi! I have connected the workspace to AWS, but when I execute the following in a new notebook: %python %pip install glow.py import glow from pyspark.sql import SparkSession # Create a Spark session spark = (SparkSession.builder .appName("Genomics Analysis") ...

Latest Reply

kasuskasus1
Databricks Partner

02-17-2025 3:17:27 PM

0 kudos

Solved this with the help of colleagues at last. First of all, it won't work in Serverless mode, so a cluster is required. Once the cluster is created in the Compute section, add those 2 libraries on the Library tab: Then running: import glow from pyspark.sq...

by lozik • Databricks Partner

07-22-2024 12:21:19 PM

2033 Views
2 replies
0 kudos

Python callback functions fail to trigger

How can I get sys.excepthook and the atexit module to trigger a callback function on exit of a Python notebook? These fail to work when an unhandled exception is encountered (excepthook) or the program exits (atexit).

Latest Reply

Pieter
New Contributor II

02-17-2025 5:22:28 AM

0 kudos

Hey Lozik, I ran into this myself as well. The reason this doesn't work is that Databricks uses IPython under the hood. The following code snippet creates an exception hook for all exceptions (using the general Exception); it's also possible to spe...
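
The snippet itself is truncated in this preview. As a hedged sketch of what such a hook can look like, the code below uses IPython's set_custom_exc, which is one way to intercept exceptions in an IPython-based notebook; it is not necessarily the code from the reply. The names make_handler and install_exception_hook are made up for illustration.

```python
def make_handler(callback):
    """Build an IPython custom-exception handler that calls `callback` first."""
    def handler(shell, etype, value, tb, tb_offset=None):
        # Run the user's callback, then show the normal traceback so the
        # notebook output still contains the error.
        callback(etype, value, tb)
        shell.showtraceback((etype, value, tb), tb_offset=tb_offset)
    return handler


def install_exception_hook(callback):
    """Register `callback` for all Exception subclasses; return True on success."""
    try:
        from IPython import get_ipython
    except ImportError:
        return False  # IPython is not installed in this environment
    shell = get_ipython()
    if shell is None:
        return False  # not running inside an IPython shell or notebook
    # Passing a narrower tuple, e.g. (ValueError,), hooks only those types.
    shell.set_custom_exc((Exception,), make_handler(callback))
    return True


# Hypothetical usage in a notebook cell:
# install_exception_hook(lambda etype, value, tb: print("failed:", value))
```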

1 More Replies

by mjedy78 • New Contributor II

02-17-2025 3:11:12 AM

1147 Views
1 reply
0 kudos

Databricks read CDF by partitions for better performance?

I’m working with a large dataframe in Databricks, processing it in a streaming-batch fashion (I’m reading as a stream, but using .trigger(availableNow=True) for batch-like processing). I’m fetching around 40GB of CDF updates daily and performing some ...

Latest Reply

cherry54wilder
New Contributor II

02-17-2025 4:02:02 AM

0 kudos

You can indeed leverage your partitioned column to read and process Change Data Feed (CDF) changes in partitions. This approach can help you manage the processing load and improve performance. Here's a general outline of how you can achieve this: 1. *...

by pra18 • New Contributor II

02-14-2025 5:51:28 AM

1922 Views
2 replies
0 kudos

Handling Binary Files Larger than 2GB in Apache Spark

I'm trying to process large binary files (>2GB) in Apache Spark, but I'm running into the following error. The file format is .mf4 (Measurement Data Format). org.apache.spark.SparkException: The length of ... is 14749763360, which exceeds the max length ...

Latest Reply

Alberto_Umana
Databricks Employee

02-16-2025 1:54:17 PM

0 kudos

Hi @pra18, you can split the binary files with the split command and then load the pieces, like this: ret = os.system("split -b 4020000 -a 4 -d large_data.dat large_data.dat_split_")
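
If shelling out to split is not an option, roughly the same result can be had in plain Python. This is a sketch under assumptions, not part of the reply: split_file is a hypothetical helper, and its defaults simply mirror the flags above (4,020,000-byte pieces, 4-digit numeric suffixes). Each piece is read into memory in one go, so pick a chunk size that fits comfortably in RAM and stays below Spark's per-file limit.

```python
def split_file(path, chunk_bytes=4_020_000, suffix_digits=4):
    """Split `path` into numbered pieces of at most `chunk_bytes` bytes.

    Pieces are written next to the source as <path>_split_0000,
    <path>_split_0001, ... and their paths are returned in order.
    """
    part_paths = []
    with open(path, "rb") as src:
        index = 0
        while True:
            # Reads up to chunk_bytes; returns b"" once the file is exhausted.
            chunk = src.read(chunk_bytes)
            if not chunk:
                break
            part_path = f"{path}_split_{index:0{suffix_digits}d}"
            with open(part_path, "wb") as dst:
                dst.write(chunk)
            part_paths.append(part_path)
            index += 1
    return part_paths


# Hypothetical usage, with the file name taken from the reply:
# parts = split_file("large_data.dat")
```

Keep in mind that a format such as .mf4 may not be readable piece by piece, so the pieces might need to be reassembled or parsed with a format-aware reader downstream.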

1 More Replies

by shan-databricks • Databricks Partner

02-13-2025 4:18:59 AM

1558 Views
2 replies
0 kudos

Databricks Workflow Orchestration

I have 50 tables and will increase gradually, so I want to create a single workflow to orchestrate the job and run it table-wise. Is there an option to do this in Databricks workflow?

Latest Reply

Edthehead
Contributor III

02-14-2025 12:06:08 AM

0 kudos

Break up these 50 tables logically or functionally and place them in their own workflows. A good strategy would be to group dependent tables in the same workflow. Then use a master workflow to trigger each child workflow. So it will be like a...
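
To make the master/child idea concrete, here is a rough sketch that builds a job definition in which each task triggers one child workflow. It assumes the Databricks Jobs API accepts a run_job_task entry with a job_id for this purpose; please check the current API reference before relying on the field names. The helper build_master_job, the group names, and the job IDs are all hypothetical.

```python
def build_master_job(name, child_job_ids):
    """Return a job definition whose tasks each trigger one child workflow.

    `child_job_ids` maps a logical group name (e.g. "sales_tables") to the
    ID of the child job that loads that group of tables.
    """
    tasks = []
    for group, job_id in child_job_ids.items():
        tasks.append({
            "task_key": f"run_{group}",
            # Assumed field name for "run another job" tasks.
            "run_job_task": {"job_id": job_id},
        })
    return {"name": name, "tasks": tasks}


# Hypothetical usage: two child workflows, each owning a group of tables.
master = build_master_job(
    "master_workflow",
    {"sales_tables": 101, "finance_tables": 102},
)
```

Adding a new group of tables then means adding one child job and one entry to the mapping, rather than editing a single large workflow.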

1 More Replies

by subhas_1729 • New Contributor II

02-16-2025 4:46:29 PM

1825 Views
1 reply
0 kudos

Dashboard

Hi, I want to design a dashboard that will show some variables from the Spark UI. Is it possible to access Spark UI variables from my Spark program?

Latest Reply

Alberto_Umana
Databricks Employee

02-16-2025 6:11:29 PM

0 kudos

Hi @subhas_1729, You can achieve this by leveraging Spark's monitoring and instrumentation APIs. Spark provides metrics that can be accessed through the SparkListener interface as well as the REST API. The SparkListener interface allows you to receiv...
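
As a rough illustration of the REST API route, the sketch below builds the stages endpoint of Spark's monitoring API and summarizes the JSON it returns. The helper names are hypothetical, and it assumes the UI address is reachable from where the code runs, which may not hold on every managed platform.

```python
import json
from urllib.request import urlopen


def stages_url(ui_url, app_id):
    """Build the monitoring REST endpoint that lists an application's stages."""
    return f"{ui_url.rstrip('/')}/api/v1/applications/{app_id}/stages"


def summarize_stages(stages):
    """Count stages and completed tasks per stage status."""
    summary = {}
    for stage in stages:
        status = stage.get("status", "UNKNOWN")
        entry = summary.setdefault(status, {"stages": 0, "tasks": 0})
        entry["stages"] += 1
        entry["tasks"] += stage.get("numCompleteTasks", 0)
    return summary


def fetch_stage_summary(ui_url, app_id):
    """Query the REST API and return the per-status summary."""
    with urlopen(stages_url(ui_url, app_id)) as resp:
        return summarize_stages(json.load(resp))


# Hypothetical usage from a Spark program:
# sc = spark.sparkContext
# print(fetch_stage_summary(sc.uiWebUrl, sc.applicationId))
```

The resulting dictionary can be written to a table or pushed to whatever dashboarding tool is in use.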

by dbhavesh • Databricks Partner

02-10-2025 9:49:56 PM

1699 Views
3 replies
1 kudos

How to Apply row_num in DLT

Hi all, how can I use row_num in DLT, or what is the alternative to the row_num function in DLT? We are looking for the same functionality that row_num provides. Thanks in advance.

Latest Reply

Takuya-Omi
Valued Contributor III

02-15-2025 9:57:19 AM

1 kudos

@dbhavesh I apologize for the lack of explanation. The ROW_NUMBER function requires ordering over the entire dataset, making it a non-time-based window function. When applied to streaming data, it results in the "NON_TIME_WINDOW_NOT_SUPPORTED_IN_STREA...

2 More Replies

by sachin_kanchan • New Contributor III

02-07-2025 5:24:44 AM

2146 Views
6 replies
0 kudos

Unable to log in to Community Edition

So I just registered for the Databricks Community Edition and received a verification email. When I click the link, I'm redirected to this website (image attached) where I am asked to enter my email. When I do that, it sends me a verification c...

Latest Reply

sachin_kanchan
New Contributor III

02-15-2025 3:33:03 AM

0 kudos

What a disappointment this has been

5 More Replies

by davben93 • New Contributor II

04-25-2023 5:04:53 PM

8273 Views
4 replies
1 kudos

Is Spark Connect available in Java?

Latest Reply

Davben1993
New Contributor II

02-15-2025 12:04:42 AM

1 kudos

Are there any updates??

3 More Replies

by prasidataengine • New Contributor II

02-14-2025 8:03:33 AM

1772 Views
2 replies
0 kudos

Issue when connecting with Databricks cluster 15.4 without unity catalog using databricks connect

Hi, I have a shared cluster created on Databricks which uses the 15.4 runtime. I don't want to enable Unity Catalog for this cluster. Previously I used Python 3.9.13 to connect to an 11.3 cluster using Databricks Connect 11.3. Now my company has restr...

Tags: Databricks, databricks-connect

Latest Reply

Alberto_Umana
Databricks Employee

02-14-2025 8:11:32 AM

0 kudos

Hi @prasidataengine, for Databricks Runtime 13.3 LTS and above, you must have Unity Catalog enabled to use databricks-connect. The requirements include a Databricks account and workspace that have Unity Catalog enabled. See Set up and manage Unity Catalog and Enable a wo...

1 More Replies
