Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
I am getting the following error in my Java application.java.sql.SQLException: [Databricks][DatabricksJDBCDriver](500618) Error occured while deserializing arrow data: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not availableI beli...
For anyone encountering this issue in 2025, I was able to solve it by using the --add-opens=jdk.unsupported/sun.misc=ALL-UNNAMEDoption in combination with the latest jdbc driver (v2.7.1). I was using the driver in dbeaver, but I assume the issue coul...
Hi @Sweetnesh Dholariya,Does @Debayan Mukherjee's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly?Thanks!
I create a function based on Java MaskFormatter function in Databricks/Scala.But when I call it from sparksql, I received error messageError in SQL statement: AnalysisException: Undefined function: formatAccount. This function is neither a built-in/t...
@Tim zhang :The issue is that the formatAccount function is defined as a Scala function, but SparkSQL is looking for a SQL function. You need to register the Scala function as a SQL function so that it can be called from SparkSQL. You can register t...
Hi,We're using Databricks Runtime version 11.3LTS and executing a Spark Java Job using a Job Cluster. To automate the execution of this job, we need to define (source in from bash config files) some environment variables through an init script (clust...
Hi @Rahul K Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you.Thanks!
Hi,We're using Databricks Runtime version 11.3LTS and executing a Spark Java Job using a Job Cluster. To automate the execution of this job, we need to define (source in from bash config files) some environment variables through an init script (clust...
Hi @Rahul K Thank you for posting your question in our community! We are happy to assist you.To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your ...
I am trying to do a streaming merge between delta tables using this guide - https://docs.delta.io/latest/delta-update.html#upsert-from-streaming-queries-using-foreachbatchOur Code Sample (Java): Dataset<Row> sourceDf = sparkSession
...
I could able to parse .doc extension files using Java programming with the help of POI libraries but when trying to convert Java code into Scala i expect it has to work with same java libraries with Scala programming but it is showing with below erro...
Hi @Ramesh Bathini In pyspark, we have a docx module. I found that to be working perfectly fine. Can you try using that ?Documentation and stuff could be found online. Cheers...
Hi All, We are developing a new Scala/Java program which needs to read & process the raw data stored in source ADLS (which is a Databricks Environment) in parallel as the volume of the source data is very high (in GBs & TBs). What kind of connection ...
hello experts. any advise on this question ?? tagging some folks from whom I have received answers before. Please help on this requirement or tag someone who can help on this@Kaniz Fatma , @Vartika Nain , @Bilal Aslam
Hi, right now I am trying to run a pyflink script that can connect to a kafka server. When I run that script, I got an error "An error occurred while trying to connect to the Java server 127.0.0.1:35529". Do I need to install a extra jdk for that? er...
Hi all,We're trying to attach java libraries which are compiled/packaged using Java 11.After doing some research it looks like even the most recent runtimes use Java 8 which can't run the Java 11 code ("wrong version 55.0, should be 52.0" errors)Is t...
I have tried setting JNAME=zulu11-ca-amd64 under Cluster > Advanced options > Spark > Environment variables but it doesn't seem to work. I still get errors indicating Java 8 is the JRE and in the Spark UI under "Environment" I still see:Java Home: /u...
What version of Spark, Python, Scala, R are included in each Databricks Runtime? What libraries are pre-installed?You can find this info at the Databricks runtime releases page (AWS | Azure | GCP).Let us know if you have any additional questions on t...
I am able to connect to the cluster, browse its hive catalog, see tables/views and columns/datatypesRunning a simple select statement from a view on a parquet file produces this error and no other results:"SQL Error [500540] [HY000]: [Databricks][Dat...
Update. I have tried SQL Workbench/J and encountered exactly the same error(s) as with Dbeaver. I have also tried JetBrains DataGrip and it worked flawlessly. Able to connect, browse the databases and query tables/views. https://docs.microsoft.com/en...
Hi @Kaniz Fatma , I got the solution based on the @Hubert Dudek Answer .Thanks @Hubert Dudek . Another Doubt:How do i Automate the Azure Synapse Concept . Please help me ..Thanks
When I try to start pyspark session in pycharm. It throws me this error "RuntimeError("Java gateway process exited before sending its port number"). Could anyone help me to solve this?