Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ila-de
by New Contributor III
  • 4436 Views
  • 7 replies
  • 1 kudos

Resolved! databricks workspace import_dir not working without any failure message

Morning everyone! I'm trying to copy all the notebooks from the repo into the Databricks workspace. I'm using the command: databricks workspace import_dir . /Shared/Notebooks, but it just prints all the info regarding the Workspace API. If I launch dat...

Latest Reply
ila-de
New Contributor III
  • 1 kudos

Hi all, I've uninstalled and reinstalled databricks-cli and now it works. It's not a real solution, but it finally worked after one week...

6 More Replies
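Since the CLI failed here without any error message, a thin wrapper that captures stdout/stderr and raises on a non-zero exit code can surface what went wrong. This is a hypothetical sketch, not from the thread; it assumes databricks-cli is installed and configured.

```python
import subprocess

def run_checked(cmd):
    """Run a command, capture its output, and raise on failure
    instead of failing silently."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"{' '.join(cmd)} failed ({result.returncode}):\n"
            f"{result.stderr or result.stdout}"
        )
    return result.stdout

def import_notebooks(source_dir, workspace_path):
    """Wrap `databricks workspace import_dir` so errors are visible."""
    return run_checked(
        ["databricks", "workspace", "import_dir", source_dir, workspace_path]
    )
```

Usage would mirror the original command, e.g. import_notebooks(".", "/Shared/Notebooks").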
p_romm
by New Contributor III
  • 670 Views
  • 1 reply
  • 0 kudos

Autoloader is not able to infer schema from json

Hi, I have JSON files containing a JSON array with only one object (payload below). I have set inferSchema to true in Autoloader, but it throws: "Failed to infer schema for format json from existing files ..." I have also checked the option to ...

Latest Reply
p_romm
New Contributor III
  • 0 kudos

Yep, my mistake: the JSON file was corrupted.

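Since the root cause turned out to be a corrupt file, a quick stdlib pre-check can pinpoint the bad record before Auto Loader ever runs. A minimal sketch, assuming newline-delimited JSON; the path is a placeholder.

```python
import json

def first_invalid_json_line(path):
    """Return (line_number, error_message) for the first unparsable line
    in a newline-delimited JSON file, or None if every line parses."""
    with open(path) as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # skip blank lines
            try:
                json.loads(line)
            except json.JSONDecodeError as err:
                return lineno, str(err)
    return None
```

Running this over the source directory before enabling inference would have flagged the corrupted file directly instead of surfacing as a schema-inference failure.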
cmunteanu
by Contributor
  • 1267 Views
  • 2 replies
  • 0 kudos

External connection to Azure ADLS Gen2 storage

Hello, I have a problem trying to make an external connection to a blob storage account configured as ADLS Gen2 with hierarchical namespace (HNS) enabled. I have set up the storage account with a container with HNS enabled as in the image attached. Next I hav...

Latest Reply
hao_hu
New Contributor II
  • 0 kudos

Hi, would it work if you try to remove "landing" at the end? Seems the error is complaining that the external location should be a directory.   

1 More Replies
Splush_
by New Contributor III
  • 1418 Views
  • 1 reply
  • 0 kudos

Resolved! Hostname not resolving using Spark JDBC

Hey guys, I've run into a weird error this morning. Last week I was testing a new Oracle connector and it ran smoothly the whole week! This morning at 7 it ran again and showed a "SQLRecoverableException: IO Error: Unknown h...

Latest Reply
Splush_
New Contributor III
  • 0 kudos

I even cloned the cluster and it worked on the new one. But after turning the cluster off overnight, it started working again the next morning. This is really weird.

Dominos
by New Contributor II
  • 1065 Views
  • 4 replies
  • 0 kudos

Does DBR 14.3 not support Describe history command?

Hello, We have recently updated DBR version from 9.1 LTS to 14.3 LTS and observed that DESCRIBE HISTORY is not supported in 14.3 LTS. Could you please suggest any alternative to be used for table history? 

Latest Reply
holly
Databricks Employee
  • 0 kudos

Hi, I'm still not able to recreate this issue with Standard_DS3_v2.  I'm not sure if this is relevant, but do you also have this issue on an old High Concurrency cluster with custom access mode for the Standard_DS3_v2 cluster too? 

3 More Replies
Faizan_khan8171
by New Contributor
  • 627 Views
  • 1 reply
  • 1 kudos

External Location Naming Issue & Impact of Renaming in Unity Catalog

Hey, I created an external location in my test environment using a mount point. Now, when I try to create the same external location in prod, it doesn't allow me to use the same name. Is there any specific reason for this restriction in Unity Catalog...

Latest Reply
Alberto_Umana
Databricks Employee
  • 1 kudos

Hello @Faizan_khan8171, Thanks for your question. In Unity Catalog, external location names must be unique within the metastore. This restriction prevents naming conflicts and ensures that every external location is easily identifiable and manageable...

jeremy98
by Honored Contributor
  • 2569 Views
  • 6 replies
  • 0 kudos

Move Databricks service to another resource group

Hello, is it possible to move the Databricks service to another resource group without any problems? I have a resource group containing two workspaces, the prod and staging environments. I created another resource group to maintain only the databric...

Latest Reply
nickv
New Contributor II
  • 0 kudos

I'm running into the same problem, what's the procedure to create a feature request for this? It seems to me that when DB is running in Azure that I should be able to move it to a different resource group. 

5 More Replies
FilipezAR
by New Contributor
  • 14620 Views
  • 3 replies
  • 1 kudos

Failed to create new KafkaAdminClient

I want to create connections to kafka with spark.readStream using the following parameters: kafkaParams = { "kafka.sasl.jaas.config": f'org.apache.kafka.common.security.plain.PlainLoginModule required username="{kafkaUsername}" password="{kafkaPa...

Latest Reply
Marcin
Databricks Employee
  • 1 kudos

If you are using Confluent with Schema Registry, you can use the code below; no additional libraries need to be installed. From Databricks Runtime 16.0 it supports schema references and recursive references: from pyspark.sql.functions import col, lit f...

2 More Replies
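The question's kafkaParams can be assembled with a small helper. A hedged sketch with placeholder broker, username, and password; it assumes the SASL_SSL/PLAIN scheme that Confluent Cloud uses, as in the original snippet.

```python
def confluent_kafka_options(bootstrap_servers, username, password):
    """Build Structured Streaming options for SASL_SSL/PLAIN auth."""
    jaas = (
        'org.apache.kafka.common.security.plain.PlainLoginModule required '
        f'username="{username}" password="{password}";'
    )
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "kafka.security.protocol": "SASL_SSL",
        "kafka.sasl.mechanism": "PLAIN",
        "kafka.sasl.jaas.config": jaas,
    }

def read_kafka_stream(spark, topic, options):
    """Start the stream; requires a cluster with the Kafka connector."""
    reader = spark.readStream.format("kafka").option("subscribe", topic)
    for key, value in options.items():
        reader = reader.option(key, value)
    return reader.load()
```

Note the trailing semicolon inside the JAAS string: omitting it is a common cause of KafkaAdminClient creation failures.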
mattmunz
by New Contributor III
  • 6632 Views
  • 2 replies
  • 4 kudos

JDBC Error: Error occured while deserializing arrow data

I am getting the following error in my Java application.java.sql.SQLException: [Databricks][DatabricksJDBCDriver](500618) Error occured while deserializing arrow data: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not availableI beli...

Latest Reply
cvcore
New Contributor II
  • 4 kudos

For anyone encountering this issue in 2025, I was able to solve it by using the --add-opens=jdk.unsupported/sun.misc=ALL-UNNAMED option in combination with the latest JDBC driver (v2.7.1). I was using the driver in DBeaver, but I assume the issue coul...

1 More Replies
Shivap
by New Contributor III
  • 1259 Views
  • 2 replies
  • 0 kudos

Resolved! Writing back from notebook to blob storage as single file with UC configured databricks

I want to write a file from a notebook to blob storage. We have Unity Catalog configured. When it writes, it creates a folder named after the file name I provided, and inside it writes multiple files, as shown below. Can someone suggest me on ...

Latest Reply
Stefan-Koch
Valued Contributor II
  • 0 kudos

Hi Shivap, if you want to save a dataframe as a single file, you could consider converting the PySpark dataframe to a pandas dataframe and then saving it as a file. path_single_file = '/Volumes/demo/raw/test/single' # create sample dataframe df = spark.cr...

1 More Replies
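The accepted approach collapses the data to the driver and writes one file (the reply uses toPandas().to_csv). A dependency-free sketch of the same idea using only the stdlib; the target path is a placeholder, and in a UC workspace it would be a volume path such as /Volumes/<catalog>/<schema>/<volume>/out.csv.

```python
import csv

def write_single_csv(rows, header, path):
    """Write rows as exactly one CSV file, avoiding Spark's
    part-file directory layout. Only suitable for data that fits
    in driver memory; collect to the driver first, e.g.
    rows = [tuple(r) for r in spark_df.collect()]."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return path
```

The trade-off is the same as with toPandas(): everything passes through the driver, so this is for small result sets, not large distributed writes.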
lrodcon
by New Contributor III
  • 14245 Views
  • 6 replies
  • 4 kudos

Read external iceberg table in a spark dataframe within databricks

I am trying to read an external Iceberg table from an S3 location using the following command: df_source = (spark.read.format("iceberg")   .load(source_s3_path)   .drop(*source_drop_columns)   .filter(f"{date_column}<='{date_filter}'")   )B...

Latest Reply
dynofu
New Contributor II
  • 4 kudos

https://issues.apache.org/jira/browse/SPARK-41344

5 More Replies
kasuskasus1
by New Contributor III
  • 781 Views
  • 1 reply
  • 0 kudos

Resolved! How to use GLOW in Databricks Premium on AWS?

Hi! I have connected the workspace to AWS, but when I execute in a new notebook: %python %pip install glow.py import glow from pyspark.sql import SparkSession # Create a Spark session spark = (SparkSession.builder .appName("Genomics Analysis") ...

Latest Reply
kasuskasus1
New Contributor III
  • 0 kudos

Solved this with the help of colleagues at last. First of all, it won't work in Serverless mode, so a cluster is required. Once the cluster is created in the Compute section, add those 2 libraries on the Library tab. Then run: import glow from pyspark.sq...

lozik
by New Contributor II
  • 1768 Views
  • 2 replies
  • 0 kudos

Python callback functions fail to trigger

How can I get sys.excepthook and the atexit module to trigger a callback function on exit of a Python notebook? These fail to work when an unhandled exception is encountered (excepthook) or when the program exits (atexit).

Latest Reply
Pieter
New Contributor II
  • 0 kudos

Hey Lozik, I ran into this myself as well. This doesn't work because Databricks uses IPython under the hood. The following code snippet creates an exception hook for all exceptions (using the general Exception); it's also possible to spe...

1 More Replies
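A sketch of the idea in the reply: since IPython bypasses sys.excepthook, register a custom IPython exception handler instead, falling back to a plain hook outside a notebook. The callback signature here is an assumption, not the reply's exact code.

```python
import sys

def install_notebook_excepthook(callback):
    """Fire callback(exc_type, exc_value) on any unhandled exception.
    Uses IPython's set_custom_exc inside a notebook, plain
    sys.excepthook elsewhere."""
    try:
        from IPython import get_ipython
        shell = get_ipython()
    except ImportError:
        shell = None
    if shell is not None:
        # Notebook path: IPython swallows sys.excepthook, so hook its
        # custom-exception mechanism instead.
        def handler(shell, etype, value, tb, tb_offset=None):
            callback(etype, value)
            shell.showtraceback((etype, value, tb), tb_offset=tb_offset)
        shell.set_custom_exc((Exception,), handler)
    else:
        # Plain-Python fallback: wrap the default hook.
        def hook(etype, value, tb):
            callback(etype, value)
            sys.__excepthook__(etype, value, tb)
        sys.excepthook = hook
```

atexit remains a separate problem: notebook detach can kill the interpreter before atexit handlers run, so callbacks that must run should be invoked explicitly at the end of the notebook.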
mjedy78
by New Contributor II
  • 1018 Views
  • 1 reply
  • 0 kudos

Databricks read CDF by partitions for better performance?

I’m working with a large dataframe in Databricks, processing it in a streaming-batch fashion (I’m reading as a stream, but using .trigger(availableNow=True) for batch-like processing).I’m fetching around 40GB of CDF updates daily and performing some ...

Latest Reply
cherry54wilder
New Contributor II
  • 0 kudos

You can indeed leverage your partitioned column to read and process Change Data Feed (CDF) changes in partitions. This approach can help you manage the processing load and improve performance. Here's a general outline of how you can achieve this:1. *...

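The outline in the reply can be sketched as follows, with hypothetical table and column names: read the Change Data Feed from a starting version, then filter on the partition column so each run processes one slice.

```python
def cdf_options(starting_version):
    """Delta Change Data Feed reader options."""
    return {
        "readChangeFeed": "true",
        "startingVersion": str(starting_version),
    }

def read_cdf_for_partition(spark, table_name, starting_version,
                           partition_col, partition_value):
    """Stream CDF changes for a single partition value; combine with
    .trigger(availableNow=True) for batch-like processing, as in
    the question."""
    reader = spark.readStream.format("delta")
    for key, value in cdf_options(starting_version).items():
        reader = reader.option(key, value)
    return (reader.table(table_name)
                  .where(f"{partition_col} = '{partition_value}'"))
```

Because the filter is on the table's partition column, Delta can prune files per partition; the predicate is applied on top of the change feed rather than limiting what the feed records.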
pra18
by New Contributor II
  • 1605 Views
  • 2 replies
  • 0 kudos

Handling Binary Files Larger than 2GB in Apache Spark

I'm trying to process large binary files (>2GB) in Apache Spark, but I'm running into the following error. The file format is .mf4 (Measurement Data Format): org.apache.spark.SparkException: The length of ... is 14749763360, which exceeds the max length ...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @pra18, you can split and load the binary files using the split command, like this: ret = os.system("split -b 4020000 -a 4 -d large_data.dat large_data.dat_split_")

1 More Replies
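The reply shells out to the Unix split utility; an equivalent pure-Python sketch that streams the file in fixed-size chunks (the chunk size and prefix are placeholders, mirroring split -b):

```python
def split_binary(path, chunk_bytes, out_prefix):
    """Split a large binary file into chunk_bytes-sized pieces,
    like `split -b`; returns the list of written part paths."""
    parts = []
    with open(path, "rb") as src:
        index = 0
        while True:
            chunk = src.read(chunk_bytes)
            if not chunk:
                break  # end of file
            part = f"{out_prefix}{index:04d}"
            with open(part, "wb") as dst:
                dst.write(chunk)
            parts.append(part)
            index += 1
    return parts
```

Each resulting part then stays under Spark's per-file binary limit and can be loaded with the binaryFile source and reassembled downstream.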