Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Sweetnesh
by New Contributor
  • 1773 Views
  • 2 replies
  • 0 kudos

Not able to read S3 object through AssumedRoleCredentialProvider

SparkSession spark = SparkSession.builder()
        .appName("SparkS3Example")
        .master("local[1]")
        .getOrCreate();
spark.sparkContext().hadoopConfiguration().set("fs.s3a.access.key", S3_ACCOUNT_KEY);
spark.sparkContext().hadoopConf...

Latest Reply
Vartika
Databricks Employee
  • 0 kudos

Hi @Sweetnesh Dholariya, does @Debayan Mukherjee's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? Thanks!

1 More Replies
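For reference, a minimal sketch of the Hadoop S3A settings that select AssumedRoleCredentialProvider, assuming hadoop-aws 3.x. The key names follow the Hadoop S3A assumed-role documentation, so check them against the hadoop-aws version on your classpath; the role ARN, the key placeholders, and the helper name `assumedRoleConf` are hypothetical. The map is plain Java so it can be inspected on its own; in a Spark job each entry would be applied with `spark.sparkContext().hadoopConfiguration().set(key, value)`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class S3AssumedRoleConf {

    /** Builds S3A properties for the assumed-role provider (hypothetical helper). */
    static Map<String, String> assumedRoleConf(String roleArn, String accessKey, String secretKey) {
        Map<String, String> conf = new LinkedHashMap<>();
        // Route S3A authentication through STS AssumeRole.
        conf.put("fs.s3a.aws.credentials.provider",
                "org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider");
        // The IAM role to assume.
        conf.put("fs.s3a.assumed.role.arn", roleArn);
        // Provider for the long-lived keys used to call STS itself.
        conf.put("fs.s3a.assumed.role.credentials.provider",
                "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider");
        conf.put("fs.s3a.access.key", accessKey);
        conf.put("fs.s3a.secret.key", secretKey);
        return conf;
    }

    public static void main(String[] args) {
        Map<String, String> conf = assumedRoleConf(
                "arn:aws:iam::123456789012:role/example-role", // hypothetical ARN
                "ACCESS_KEY_PLACEHOLDER",
                "SECRET_KEY_PLACEHOLDER");
        // In Spark: conf.forEach(spark.sparkContext().hadoopConfiguration()::set);
        conf.forEach((k, v) ->
                System.out.println(k + "=" + (k.endsWith("secret.key") ? "****" : v)));
    }
}
```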
Databrickguy
by New Contributor II
  • 1148 Views
  • 1 replies
  • 0 kudos

How to use Java MaskFormatter in Spark SQL?

I created a function based on the Java MaskFormatter class in Databricks/Scala. But when I call it from Spark SQL, I receive the error message: Error in SQL statement: AnalysisException: Undefined function: formatAccount. This function is neither a built-in/t...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Tim zhang: The issue is that the formatAccount function is defined as a Scala function, but Spark SQL is looking for a SQL function. You need to register the Scala function as a SQL function so that it can be called from Spark SQL. You can register t...
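The reply's point can be sketched with Spark's Java API (the question uses Scala): keep the formatting logic in a plain static method, then register it under the name SQL will use. The mask `###-###-####` and the method body are hypothetical, since the original function is not shown; only `javax.swing.text.MaskFormatter` comes from the question. The registration call is left as a comment because it needs a SparkSession and spark-sql on the classpath.

```java
import java.text.ParseException;
import javax.swing.text.MaskFormatter;

class FormatAccount {

    // Hypothetical mask: ten digits rendered as ###-###-####.
    private static final String MASK = "###-###-####";

    /** Applies the mask; returns null for null input or when a character does not fit the mask. */
    static String format(String raw) {
        if (raw == null) {
            return null; // SQL NULL in, SQL NULL out
        }
        try {
            MaskFormatter formatter = new MaskFormatter(MASK);
            // The input holds only the digits, not the '-' literals.
            formatter.setValueContainsLiteralCharacters(false);
            return formatter.valueToString(raw);
        } catch (ParseException e) {
            return null; // e.g. a letter where the mask expects a digit
        }
    }

    public static void main(String[] args) {
        System.out.println(format("1234567890")); // prints 123-456-7890
        // Registration so that Spark SQL can resolve the name:
        // spark.udf().register("formatAccount",
        //     (UDF1<String, String>) FormatAccount::format, DataTypes.StringType);
        // spark.sql("SELECT formatAccount(account_no) FROM accounts");
    }
}
```

Registering with an explicit return type keeps the SQL-visible signature stable; the same idea in Scala is `spark.udf.register("formatAccount", ...)` with the Scala function.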

Rahul2025
by New Contributor III
  • 3592 Views
  • 4 replies
  • 4 kudos

Make environment variables defined in init script available to Spark JVM job?

Hi, we're using Databricks Runtime version 11.3 LTS and executing a Spark Java job on a job cluster. To automate the execution of this job, we need to define (source in from bash config files) some environment variables through an init script (clust...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hi @Rahul K, hope all is well! Just wanted to check in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!

3 More Replies
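On the JVM side, the job can read such a variable with `System.getenv`, and failing fast with a message that names the variable makes it obvious when the init script's exports never reached the driver process. This is a diagnostic sketch, not a fix: whether an init-script export is visible to the Spark JVM depends on how the cluster launches that JVM, and the cluster's Spark environment-variables setting is the more direct route. The variable name `MY_JOB_CONFIG` and the helper `requireEnv` are hypothetical. The code avoids `String.isBlank` so it also runs on the Java 8 JVM that Runtime 11.3 LTS uses by default.

```java
import java.util.Map;
import java.util.Optional;

class EnvConfig {

    /** Returns env.get(name), or throws if the variable is missing or blank. */
    static String requireEnv(Map<String, String> env, String name) {
        return Optional.ofNullable(env.get(name))
                .filter(v -> !v.trim().isEmpty()) // Java 8 friendly
                .orElseThrow(() -> new IllegalStateException(
                        "Environment variable " + name + " is not set for this JVM"));
    }

    public static void main(String[] args) {
        try {
            // MY_JOB_CONFIG is a hypothetical variable name.
            System.out.println(requireEnv(System.getenv(), "MY_JOB_CONFIG"));
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

Passing the environment map in, rather than calling `System.getenv()` inside the helper, keeps the check testable outside a cluster.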
Rahul2025
by New Contributor III
  • 5188 Views
  • 11 replies
  • 1 kudos

Limitation on size of init script

Hi, we're using Databricks Runtime version 11.3 LTS and executing a Spark Java job on a job cluster. To automate the execution of this job, we need to define (source in from bash config files) some environment variables through an init script (clust...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Rahul K, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your ...

10 More Replies
gauthamchettiar
by New Contributor II
  • 1757 Views
  • 0 replies
  • 1 kudos

Spark always performs broadcasts regardless of spark.sql.autoBroadcastJoinThreshold during a streaming merge operation with DeltaTable.

I am trying to do a streaming merge between Delta tables using this guide: https://docs.delta.io/latest/delta-update.html#upsert-from-streaming-queries-using-foreachbatch
Our code sample (Java): Dataset<Row> sourceDf = sparkSession ...

rammy
by Contributor III
  • 1663 Views
  • 1 replies
  • 5 kudos

Not able to parse .doc extension file using Scala in Databricks notebook?

I was able to parse .doc files in Java with the help of the POI libraries. When converting the Java code into Scala, I expected it to work with the same Java libraries, but it is showing the below erro...

Latest Reply
UmaMahesh1
Honored Contributor III
  • 5 kudos

Hi @Ramesh Bathini, in PySpark we have a docx module, and I found it to work perfectly fine. Can you try using that? Documentation can be found online. Cheers...

mattmunz
by New Contributor III
  • 3446 Views
  • 1 replies
  • 4 kudos

JDBC Error: Error occured while deserializing arrow data

I am getting the following error in my Java application: java.sql.SQLException: [Databricks][DatabricksJDBCDriver](500618) Error occured while deserializing arrow data: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available. I beli...

Latest Reply
User16753725469
Contributor II
  • 4 kudos

Please try adding the --add-opens flag below to the java command line of your JVM call:
% javac -classpath SparkJDBC42Example.jar:. jdbc_example.java
% java --add-opens=java.base/java.nio=ALL-UNNAMED -classpath SparkJDBC42Example.jar...
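To confirm that the flag actually reached the JVM, the application can ask the module system whether `java.base` opens `java.nio` to classpath code, which is what `--add-opens=java.base/java.nio=ALL-UNNAMED` grants. A minimal sketch, assuming Java 9 or later (Java 8 has no module system, so the `Module` API and the flag do not apply there); the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

class NioOpenCheck {

    static final String FLAG = "--add-opens=java.base/java.nio=ALL-UNNAMED";

    /** True if java.base opens java.nio to this class's module (the unnamed module on the classpath). */
    static boolean nioOpenToCaller() {
        Module javaBase = ByteBuffer.class.getModule();
        return javaBase.isOpen("java.nio", NioOpenCheck.class.getModule());
    }

    /** Human-readable advice for the check result. */
    static String advice(boolean open) {
        return open
                ? "java.nio is open to classpath code; the flag appears to be in effect."
                : "java.nio is not open; restart the JVM with " + FLAG;
    }

    public static void main(String[] args) {
        System.out.println("Java " + Runtime.version() + ": " + advice(nioOpenToCaller()));
    }
}
```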

BkP
by Contributor
  • 2223 Views
  • 2 replies
  • 3 kudos

Scala Connectivity to Databricks Bronze Layer Raw Data from a Non-Databricks Spark environment

Hi all, we are developing a new Scala/Java program that needs to read and process, in parallel, the raw data stored in the source ADLS (which is a Databricks environment), because the volume of the source data is very high (GBs to TBs). What kind of connection ...

Latest Reply
BkP
Contributor
  • 3 kudos

Hello experts, any advice on this question? I'm tagging some folks from whom I have received answers before. Please help with this requirement or tag someone who can help: @Kaniz Fatma, @Vartika Nain, @Bilal Aslam

1 More Replies
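One option for this kind of requirement is to bypass Databricks compute and point the external Spark cluster directly at the underlying ADLS Gen2 storage through Hadoop's ABFS driver (hadoop-azure), which reads files in parallel like any other Hadoop filesystem. That assumes the bronze data is stored as open files (for example Parquet or Delta) and that the external cluster has credentials for the storage account; reading Delta also needs the Delta Lake library on that cluster. A minimal sketch of building the `abfss://` path; the container, account, and directory names are hypothetical.

```java
import java.net.URI;

class AdlsPath {

    /** Builds abfss://<container>@<account>.dfs.core.windows.net/<path>. */
    static URI abfss(String container, String account, String path) {
        String cleanPath = path.startsWith("/") ? path : "/" + path;
        return URI.create("abfss://" + container + "@" + account + ".dfs.core.windows.net" + cleanPath);
    }

    public static void main(String[] args) {
        // Hypothetical names for illustration.
        URI bronze = abfss("raw", "mystorageaccount", "bronze/events");
        System.out.println(bronze);
        // In the external Spark job (Delta Lake library on the classpath):
        // Dataset<Row> df = spark.read().format("delta").load(bronze.toString());
    }
}
```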
witnessthee
by New Contributor II
  • 6610 Views
  • 3 replies
  • 2 kudos

Resolved! Error when using PyFlink on Databricks: An error occurred while trying to connect to the Java server

Hi, I am trying to run a PyFlink script that connects to a Kafka server. When I run the script, I get the error "An error occurred while trying to connect to the Java server 127.0.0.1:35529". Do I need to install an extra JDK for that? er...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

Did you get Flink running on the Databricks cluster? That seems to be the issue here.

2 More Replies
data_serf
by New Contributor
  • 3955 Views
  • 3 replies
  • 1 kudos

Resolved! How to integrate Java 11 code in Databricks

Hi all, we're trying to attach Java libraries that are compiled and packaged using Java 11. After doing some research, it looks like even the most recent runtimes use Java 8, which can't run the Java 11 code ("wrong version 55.0, should be 52.0" errors). Is t...

Latest Reply
matthewrj
New Contributor II
  • 1 kudos

I have tried setting JNAME=zulu11-ca-amd64 under Cluster > Advanced options > Spark > Environment variables, but it doesn't seem to work. I still get errors indicating Java 8 is the JRE, and in the Spark UI under "Environment" I still see: Java Home: /u...

2 More Replies
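The "wrong version 55.0, should be 52.0" message refers to the class-file major version: 52 is Java 8 and 55 is Java 11 (for Java 5 and later, the major version minus 44 gives the Java release). A small stdlib-only sketch for checking which release a JAR targets before attaching it to a cluster; the class and method names are hypothetical, and it inspects only the first `.class` entry, which may not be representative for multi-release JARs.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

class ClassVersion {

    /** Reads the class-file header and returns the major version (52 = Java 8, 55 = Java 11). */
    static int majorVersion(InputStream classBytes) throws IOException {
        DataInputStream in = new DataInputStream(classBytes);
        if (in.readInt() != 0xCAFEBABE) {
            throw new IOException("Not a class file");
        }
        in.readUnsignedShort();        // minor version, ignored here
        return in.readUnsignedShort(); // major version
    }

    /** Maps a major version to its Java release (valid for Java 5 and later). */
    static int javaRelease(int major) {
        return major - 44;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java ClassVersion <file.jar>");
            return;
        }
        try (JarFile jar = new JarFile(args[0])) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry e = entries.nextElement();
                if (e.getName().endsWith(".class")) {
                    try (InputStream is = jar.getInputStream(e)) {
                        int major = majorVersion(is);
                        System.out.println(e.getName() + ": major " + major
                                + " (Java " + javaRelease(major) + ")");
                    }
                    break; // first class entry only
                }
            }
        }
    }
}
```

Usage: `java ClassVersion.java mylib.jar` (single-file launch needs JDK 11+; on Java 8, compile with `javac` first). If it reports Java 11, the library needs either a Java 11 runtime on the cluster, which is what the `JNAME` setting above attempts, or a rebuild targeting Java 8, for example with `javac --release 8`.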
isaac_gritz
by Databricks Employee
  • 1764 Views
  • 1 replies
  • 6 kudos

Versions of Spark, Python, Scala, R in each Databricks Runtime

What versions of Spark, Python, Scala, and R are included in each Databricks Runtime? What libraries are pre-installed? You can find this info on the Databricks runtime releases page (AWS | Azure | GCP). Let us know if you have any additional questions on t...

Latest Reply
maxdata
Databricks Employee
  • 6 kudos

Wow! Thanks for the help, @Isaac Gritz!

sage5616
by Valued Contributor
  • 8336 Views
  • 5 replies
  • 7 kudos

Resolved! SQL Error when querying any tables/views on a Databricks cluster via DBeaver.

I am able to connect to the cluster, browse its Hive catalog, and see tables/views and columns/datatypes. Running a simple SELECT statement from a view on a Parquet file produces this error and no other results: "SQL Error [500540] [HY000]: [Databricks][Dat...

Latest Reply
sage5616
Valued Contributor
  • 7 kudos

Update: I have tried SQL Workbench/J and encountered exactly the same error(s) as with DBeaver. I have also tried JetBrains DataGrip, and it worked flawlessly: I was able to connect, browse the databases, and query tables/views. https://docs.microsoft.com/en...

4 More Replies
codevisionz
by New Contributor
  • 527 Views
  • 0 replies
  • 0 kudos

Our Python Code Examples covers basic concepts, control structures, functions, lists, classes, objects, inheritance, polymorphism, file operations, da...

Our Python Code Examples covers basic concepts, control structures, functions, lists, classes, objects, inheritance, polymorphism, file operations, data structures, sorting algorithms, mathematical functions, mathematical sequences, threads, exceptio...

mani238
by New Contributor III
  • 4674 Views
  • 4 replies
  • 4 kudos
Latest Reply
mani238
New Contributor III
  • 4 kudos

Hi @Kaniz Fatma, I got the solution based on @Hubert Dudek's answer. Thanks, @Hubert Dudek. Another question: how do I automate the Azure Synapse concept? Please help me. Thanks

3 More Replies
_r_vind1199
by New Contributor II
  • 3492 Views
  • 3 replies
  • 3 kudos

Resolved! PySpark installation issue

When I try to start a PySpark session in PyCharm, it throws this error: RuntimeError("Java gateway process exited before sending its port number"). Could anyone help me solve this?

Latest Reply
_r_vind1199
New Contributor II
  • 3 kudos

@Aashita Ramteke, PySpark version 3.2.1

2 More Replies