Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Dominos
by New Contributor II
  • 987 Views
  • 4 replies
  • 0 kudos

Does DBR 14.3 not support Describe history command?

Hello, We have recently updated our DBR version from 9.1 LTS to 14.3 LTS and observed that DESCRIBE HISTORY is not supported in 14.3 LTS. Could you please suggest an alternative for viewing table history?
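For illustration, a minimal sketch of one possible alternative: the Delta Lake Python API exposes the same commit history, assuming the table is a Delta table registered in Unity Catalog (the table name below is hypothetical).

from delta.tables import DeltaTable

# Retrieve the table's commit history as a DataFrame (same information as DESCRIBE HISTORY)
history_df = DeltaTable.forName(spark, "main.default.my_table").history()

# Keep only the most commonly inspected columns
display(history_df.select("version", "timestamp", "operation", "operationParameters"))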

Latest Reply
holly
Databricks Employee
  • 0 kudos

Hi, I'm still not able to recreate this issue with Standard_DS3_v2. I'm not sure if this is relevant, but do you also see this issue on an old High Concurrency cluster with custom access mode using Standard_DS3_v2?

3 More Replies
Faizan_khan8171
by New Contributor
  • 573 Views
  • 1 reply
  • 1 kudos

External Location Naming Issue & Impact of Renaming in Unity Catalog

Hey, I created an external location in my test environment using a mount point. Now, when I try to create the same external location in prod, it doesn't allow me to use the same name. Is there any specific reason for this restriction in Unity Catalog...

Latest Reply
Alberto_Umana
Databricks Employee
  • 1 kudos

Hello @Faizan_khan8171, Thanks for your question. In Unity Catalog, external location names must be unique within the metastore. This restriction prevents naming conflicts and ensures that every external location is easily identifiable and manageable...
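As a purely illustrative sketch (the location name, storage credential, and path below are hypothetical), one common workaround is to make the name environment-specific while pointing at the environment-specific path:

# Create an environment-specific external location; names must be unique per metastore
spark.sql("""
  CREATE EXTERNAL LOCATION IF NOT EXISTS sales_data_prod
  URL 'abfss://data@prodstorageaccount.dfs.core.windows.net/sales'
  WITH (STORAGE CREDENTIAL prod_storage_credential)
""")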

jeremy98
by Honored Contributor
  • 2394 Views
  • 6 replies
  • 0 kudos

Move Databricks service to another resource group

Hello, is it possible to move the Databricks service to another resource group without any problems? I have a resource group containing two workspaces, the prod and staging environments, and I created another resource group to maintain only the databrick...

Latest Reply
nickv
New Contributor II
  • 0 kudos

I'm running into the same problem. What's the procedure for creating a feature request for this? It seems to me that when Databricks is running in Azure, I should be able to move it to a different resource group.

5 More Replies
FilipezAR
by New Contributor
  • 13979 Views
  • 3 replies
  • 1 kudos

Failed to create new KafkaAdminClient

I want to create connections to Kafka with spark.readStream using the following parameters: kafkaParams = { "kafka.sasl.jaas.config": f'org.apache.kafka.common.security.plain.PlainLoginModule required username="{kafkaUsername}" password="{kafkaPa...
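For context, a minimal sketch of how these options are typically passed to spark.readStream (the broker address, topic name, and secret scope/keys below are placeholders):

kafkaUsername = dbutils.secrets.get("kafka", "username")   # hypothetical secret scope and keys
kafkaPassword = dbutils.secrets.get("kafka", "password")

df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "my_topic")
      .option("kafka.security.protocol", "SASL_SSL")
      .option("kafka.sasl.mechanism", "PLAIN")
      .option("kafka.sasl.jaas.config",
              f'org.apache.kafka.common.security.plain.PlainLoginModule required '
              f'username="{kafkaUsername}" password="{kafkaPassword}";')
      .option("startingOffsets", "earliest")
      .load())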

Latest Reply
Marcin
Databricks Employee
  • 1 kudos

If you are using Confluent with Schema Registry, you can use the code below. No additional libraries need to be installed. From Databricks Runtime 16.0 it supports schema references and recursive references: from pyspark.sql.functions import col, lit f...
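The reply's snippet is truncated above. As an alternative, library-free sketch (the Schema Registry endpoint, subject, and topic are placeholders, df is assumed to be the raw Kafka stream from the question, and Confluent's standard 5-byte wire-format header is assumed), the schema can be fetched over the Schema Registry REST API and decoded with the open-source from_avro:

import requests
from pyspark.sql.functions import expr
from pyspark.sql.avro.functions import from_avro

# Fetch the latest value schema for the subject from the Schema Registry REST API
schema_json = requests.get(
    "https://schema-registry.example.com/subjects/my_topic-value/versions/latest"
).json()["schema"]

decoded = (df
           # Skip the 5-byte Confluent header (magic byte + schema id) before the Avro payload
           .withColumn("avro_payload", expr("substring(value, 6, length(value) - 5)"))
           .withColumn("record", from_avro("avro_payload", schema_json))
           .select("record.*"))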

2 More Replies
mattmunz
by New Contributor III
  • 6493 Views
  • 2 replies
  • 4 kudos

JDBC Error: Error occured while deserializing arrow data

I am getting the following error in my Java application: java.sql.SQLException: [Databricks][DatabricksJDBCDriver](500618) Error occured while deserializing arrow data: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available. I beli...

Latest Reply
cvcore
New Contributor II
  • 4 kudos

For anyone encountering this issue in 2025, I was able to solve it by using the --add-opens=jdk.unsupported/sun.misc=ALL-UNNAMED option in combination with the latest JDBC driver (v2.7.1). I was using the driver in DBeaver, but I assume the issue coul...
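For illustration only (the jar and class names below are hypothetical), the same flag can be passed when launching a standalone Java application that uses the driver:

java --add-opens=jdk.unsupported/sun.misc=ALL-UNNAMED -cp app.jar:DatabricksJDBC42.jar com.example.Main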

1 More Replies
Shivap
by New Contributor III
  • 1119 Views
  • 2 replies
  • 0 kudos

Resolved! Writing back from notebook to blob storage as single file with UC configured databricks

I want to write a file from a notebook to blob storage. We have configured Unity Catalog. When it writes, it creates a folder named after the file name that I provided, and inside that it writes multiple files, as shown below. Can someone suggest me on ...

Latest Reply
Stefan-Koch
Valued Contributor II
  • 0 kudos

Hi Shivap, if you want to save a dataframe as a single file, you could consider converting the PySpark dataframe to a pandas dataframe and then saving it as a file. path_single_file = '/Volumes/demo/raw/test/single' # create sample dataframe df = spark.cr...
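The snippet in the reply is cut off. A minimal end-to-end sketch of the same idea, assuming the Volume path from the reply exists and the dataframe fits comfortably in driver memory:

import os

path_single_file = '/Volumes/demo/raw/test/single'

# Create a small sample dataframe
df = spark.createDataFrame([(1, "a"), (2, "b"), (3, "c")], ["id", "value"])

# Collect to the driver as pandas and write exactly one CSV file into the Volume
os.makedirs(path_single_file, exist_ok=True)
df.toPandas().to_csv(os.path.join(path_single_file, "output.csv"), index=False)

Note that this routes all data through the driver, so it only suits modest data sizes.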

1 More Replies
lrodcon
by New Contributor III
  • 13930 Views
  • 6 replies
  • 4 kudos

Read external iceberg table in a spark dataframe within databricks

I am trying to read an external Iceberg table from an S3 location using the following command: df_source = (spark.read.format("iceberg")   .load(source_s3_path)   .drop(*source_drop_columns)   .filter(f"{date_column}<='{date_filter}'")   )B...

Latest Reply
dynofu
New Contributor II
  • 4 kudos

https://issues.apache.org/jira/browse/SPARK-41344

5 More Replies
kasuskasus1
by New Contributor III
  • 719 Views
  • 1 reply
  • 0 kudos

Resolved! How to use GLOW in Databricks Premium on AWS?

Hi! I have connected the workspace to AWS, but when I execute the following in a new notebook:  %python %pip install glow.py import glow from pyspark.sql import SparkSession # Create a Spark session spark = (SparkSession.builder .appName("Genomics Analysis") ...

Latest Reply
kasuskasus1
New Contributor III
  • 0 kudos

Solved this with the help of colleagues at last. First of all, it won't work with Serverless mode, so a cluster is required. Once the cluster is created in the Compute section, add those 2 libraries on the Library tab. Then run: import glow from pyspark.sq...
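For reference, once the glow.py PyPI package and the matching Glow Maven artifact are installed on the cluster, a sketch following Glow's documented registration pattern (the VCF path is hypothetical, and exact library versions depend on your runtime):

import glow
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("Genomics Analysis").getOrCreate()

# Register Glow's SQL functions and data sources on the session
spark = glow.register(spark)

# Example: read a VCF file with the Glow-provided data source
df = spark.read.format("vcf").load("/Volumes/genomics/raw/sample.vcf")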

lozik
by New Contributor II
  • 1644 Views
  • 2 replies
  • 0 kudos

Python callback functions fail to trigger

How can I get sys.excepthook and the atexit module to trigger a callback function on exit of a Python notebook? These fail to work when an unhandled exception is encountered (excepthook) or when the program exits (atexit).

Latest Reply
Pieter
New Contributor II
  • 0 kudos

Hey Lozik, I ran into this myself as well. The reason this doesn't work is that Databricks is using IPython under the hood. The following code snippet creates an exception hook for all exceptions (using the general Exception); it's also possible to spe...
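The snippet in the reply is cut off. A minimal sketch of the IPython mechanism it describes (the handler body here is illustrative):

from IPython import get_ipython

def notebook_exception_handler(shell, etype, value, tb, tb_offset=None):
    # Runs for any unhandled exception raised in a notebook cell
    print(f"Unhandled exception: {etype.__name__}: {value}")
    # Fall back to IPython's normal traceback rendering
    shell.showtraceback((etype, value, tb), tb_offset=tb_offset)

# Register the handler for all exceptions derived from Exception
get_ipython().set_custom_exc((Exception,), notebook_exception_handler)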

1 More Replies
mjedy78
by New Contributor II
  • 947 Views
  • 1 reply
  • 0 kudos

Databricks read CDF by partitions for better performance?

I'm working with a large dataframe in Databricks, processing it in a streaming-batch fashion (I'm reading as a stream, but using .trigger(availableNow=True) for batch-like processing). I'm fetching around 40GB of CDF updates daily and performing some ...

Latest Reply
cherry54wilder
New Contributor II
  • 0 kudos

You can indeed leverage your partitioned column to read and process Change Data Feed (CDF) changes in partitions. This approach can help you manage the processing load and improve performance. Here's a general outline of how you can achieve this: 1. *...
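A rough sketch of that outline, assuming a Delta source table with Change Data Feed enabled and a partition column named event_date (the table, column, and path names are placeholders):

from pyspark.sql.functions import col

# Read the Change Data Feed as a stream, but prune down to one partition per run
cdf = (spark.readStream
       .option("readChangeFeed", "true")
       .table("main.default.big_table")
       .filter(col("event_date") == "2024-06-01"))

(cdf.writeStream
    .trigger(availableNow=True)                       # batch-like processing, as in the question
    .option("checkpointLocation", "/Volumes/demo/chk/big_table_cdf")
    .toTable("main.default.big_table_processed"))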

pra18
by New Contributor II
  • 1478 Views
  • 2 replies
  • 0 kudos

Handling Binary Files Larger than 2GB in Apache Spark

I'm trying to process large binary files (>2GB) in Apache Spark, but I'm running into the following error. The file format is .mf4 (Measurement Data Format): org.apache.spark.SparkException: The length of ... is 14749763360, which exceeds the max length ...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @pra18, You can split and load the binary files using the split command, like this: ret = os.system("split -b 4020000 -a 4 -d large_data.dat large_data.dat_split_")
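As a follow-on sketch (the path is hypothetical), the resulting chunks can then be read back with the binaryFile source, since each part stays under Spark's 2 GB column limit:

# Each part file produced by `split` is now small enough to load as a binary column
parts = (spark.read.format("binaryFile")
         .load("/Volumes/demo/raw/mf4/large_data.dat_split_*")
         .select("path", "length", "content"))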

1 More Replies
kivaniutenko
by New Contributor
  • 450 Views
  • 0 replies
  • 0 kudos

HTML Formatting Issue in Databricks Alerts

Hello everyone, I have recently encountered an issue with HTML formatting in custom templates for Databricks Alerts. Previously, the formatting worked correctly, but now the alerts display raw HTML instead of properly rendered content. For example, an ...

shan-databricks
by New Contributor III
  • 1226 Views
  • 2 replies
  • 0 kudos

Databricks Workflow Orchestration

I have 50 tables, and the number will increase gradually, so I want to create a single workflow to orchestrate the job and run it table-wise. Is there an option to do this in Databricks Workflows?

Latest Reply
Edthehead
Contributor III
  • 0 kudos

Break up these 50 tables logically or functionally and place them in their own workflows. A good strategy would be to group tables that are dependent on each other in the same workflow. Then use a master workflow to trigger each child workflow. So it will be like a...
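A minimal sketch of that parent/child pattern using the Databricks Python SDK (job IDs, names, and task keys are placeholders, and this assumes the child jobs already exist):

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

# Parent job that triggers two existing child jobs, one after the other
w.jobs.create(
    name="master-orchestration",
    tasks=[
        jobs.Task(task_key="load_dim_tables",
                  run_job_task=jobs.RunJobTask(job_id=111)),
        jobs.Task(task_key="load_fact_tables",
                  depends_on=[jobs.TaskDependency(task_key="load_dim_tables")],
                  run_job_task=jobs.RunJobTask(job_id=222)),
    ],
)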

1 More Replies
subhas_1729
by New Contributor II
  • 1068 Views
  • 1 reply
  • 0 kudos

Dashboard

Hi, I want to design a dashboard that will show some Spark UI variables. Is it possible to access Spark UI variables from my Spark program?

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @subhas_1729, You can achieve this by leveraging Spark's monitoring and instrumentation APIs. Spark provides metrics that can be accessed through the SparkListener interface as well as the REST API. The SparkListener interface allows you to receiv...
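One way to pull a few of those numbers from inside the program itself is PySpark's status tracker; a small sketch (what you surface on the dashboard is up to you):

# Inspect running jobs and stages from within the Spark application
tracker = spark.sparkContext.statusTracker()

for job_id in tracker.getActiveJobsIds():
    job = tracker.getJobInfo(job_id)
    print(f"Job {job_id}: status={job.status}, stages={list(job.stageIds)}")

for stage_id in tracker.getActiveStageIds():
    stage = tracker.getStageInfo(stage_id)
    print(f"Stage {stage_id}: {stage.numCompletedTasks}/{stage.numTasks} tasks complete")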

dbhavesh
by New Contributor II
  • 1350 Views
  • 3 replies
  • 1 kudos

How to Apply row_num in DLT

Hi all, how do we use row_number in DLT, or what is the alternative to the row_number function in DLT? We are looking for the same functionality that row_number provides. Thanks in advance.

Latest Reply
Takuya-Omi
Valued Contributor III
  • 1 kudos

@dbhavesh I apologize for the lack of explanation. The ROW_NUMBER function requires ordering over the entire dataset, making it a non-time-based window function. When applied to streaming data, it results in the "NON_TIME_WINDOW_NOT_SUPPORTED_IN_STREA...
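To make the workaround concrete, a small sketch that keeps ROW_NUMBER out of the streaming path by computing it in a batch (non-streaming) DLT table instead (the table and column names are placeholders):

import dlt
from pyspark.sql import functions as F
from pyspark.sql.window import Window

@dlt.table(name="orders_deduped")
def orders_deduped():
    # Batch read, so a non-time-based window function like row_number() is allowed
    w = Window.partitionBy("order_id").orderBy(F.col("updated_at").desc())
    return (spark.read.table("main.default.orders_raw")
            .withColumn("rn", F.row_number().over(w))
            .filter("rn = 1")
            .drop("rn"))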

2 More Replies
