cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

Sweetnesh
by New Contributor
  • 1073 Views
  • 2 replies
  • 0 kudos

Not able to read S3 object through AssumedRoleCredentialProvider

SparkSession spark = SparkSession.builder() .appName("SparkS3Example") .master("local[1]") .getOrCreate(); spark.sparkContext().hadoopConfiguration().set("fs.s3a.access.key", S3_ACCOUNT_KEY); spark.sparkContext().hadoopConf...

  • 1073 Views
  • 2 replies
  • 0 kudos
Latest Reply
Vartika
Moderator
  • 0 kudos

Hi @Sweetnesh Dholariya​,Does @Debayan Mukherjee​'s response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly?Thanks!

  • 0 kudos
1 More Replies
Databrickguy
by New Contributor II
  • 664 Views
  • 1 replies
  • 0 kudos

How to use Java MaskFormatter in sparksql?

I create a function based on Java MaskFormatter function in Databricks/Scala.But when I call it from sparksql, I received error messageError in SQL statement: AnalysisException: Undefined function: formatAccount. This function is neither a built-in/t...

  • 664 Views
  • 1 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Tim zhang​ :The issue is that the formatAccount function is defined as a Scala function, but SparkSQL is looking for a SQL function. You need to register the Scala function as a SQL function so that it can be called from SparkSQL. You can register t...

  • 0 kudos
Rahul2025
by New Contributor III
  • 2164 Views
  • 4 replies
  • 4 kudos

Make environment variables defined in init script available to Spark JVM job?

Hi,We're using Databricks Runtime version 11.3LTS and executing a Spark Java Job using a Job Cluster. To automate the execution of this job, we need to define (source in from bash config files) some environment variables through an init script (clust...

  • 2164 Views
  • 4 replies
  • 4 kudos
Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hi @Rahul K​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you.Thanks!

  • 4 kudos
3 More Replies
Rahul2025
by New Contributor III
  • 2664 Views
  • 11 replies
  • 1 kudos

Limitation on size of init script

Hi,We're using Databricks Runtime version 11.3LTS and executing a Spark Java Job using a Job Cluster. To automate the execution of this job, we need to define (source in from bash config files) some environment variables through an init script (clust...

  • 2664 Views
  • 11 replies
  • 1 kudos
Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Rahul K​ Thank you for posting your question in our community! We are happy to assist you.To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your ...

  • 1 kudos
10 More Replies
gauthamchettiar
by New Contributor II
  • 932 Views
  • 0 replies
  • 1 kudos

Spark always performing broad casts irrespective of spark.sql.autoBroadcastJoinThreshold during streaming merge operation with DeltaTable.

I am trying to do a streaming merge between delta tables using this guide - https://docs.delta.io/latest/delta-update.html#upsert-from-streaming-queries-using-foreachbatchOur Code Sample (Java): Dataset<Row> sourceDf = sparkSession ...

BroadCastJoin 1M
  • 932 Views
  • 0 replies
  • 1 kudos
rammy
by Contributor III
  • 935 Views
  • 1 replies
  • 5 kudos

Not able to parse .doc extension file using scala in databricks notebook?

I could able to parse .doc extension files using Java programming with the help of POI libraries but when trying to convert Java code into Scala i expect it has to work with same java libraries with Scala programming but it is showing with below erro...

error screenshot Jar dependencies
  • 935 Views
  • 1 replies
  • 5 kudos
Latest Reply
UmaMahesh1
Honored Contributor III
  • 5 kudos

Hi @Ramesh Bathini​ In pyspark, we have a docx module. I found that to be working perfectly fine. Can you try using that ?Documentation and stuff could be found online. Cheers...

  • 5 kudos
mattmunz
by New Contributor III
  • 2069 Views
  • 1 replies
  • 4 kudos

JDBC Error: Error occured while deserializing arrow data

I am getting the following error in my Java application.java.sql.SQLException: [Databricks][DatabricksJDBCDriver](500618) Error occured while deserializing arrow data: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not availableI beli...

  • 2069 Views
  • 1 replies
  • 4 kudos
Latest Reply
User16753725469
Contributor II
  • 4 kudos

Please try adding the below(--add-opens flag) java command line flags in your jvm call:% javac -classpath SparkJDBC42Example.jar:. jdbc_example.java                   % java --add-opens=java.base/java.nio=ALL-UNNAMED -classpath SparkJDBC42Example.jar...

  • 4 kudos
BkP
by Contributor
  • 1353 Views
  • 3 replies
  • 3 kudos

Scala Connectivity to Databricks Bronze Layer Raw Data from a Non-Databricks Spark environment

Hi All, We are developing a new Scala/Java program which needs to read & process the raw data stored in source ADLS (which is a Databricks Environment) in parallel as the volume of the source data is very high (in GBs & TBs). What kind of connection ...

requirement
  • 1353 Views
  • 3 replies
  • 3 kudos
Latest Reply
BkP
Contributor
  • 3 kudos

hello experts. any advise on this question ?? tagging some folks from whom I have received answers before. Please help on this requirement or tag someone who can help on this@Kaniz Fatma​ , @Vartika Nain​ , @Bilal Aslam​ 

  • 3 kudos
2 More Replies
witnessthee
by New Contributor II
  • 5000 Views
  • 4 replies
  • 2 kudos

Resolved! Error when using pyflink on databricks, An error occurred while trying to connect to the Java server

Hi, right now I am trying to run a pyflink script that can connect to a kafka server. When I run that script, I got an error "An error occurred while trying to connect to the Java server 127.0.0.1:35529". Do I need to install a extra jdk for that? er...

  • 5000 Views
  • 4 replies
  • 2 kudos
Latest Reply
Kaniz
Community Manager
  • 2 kudos

Hi @Tam daad​, We haven’t heard from you since the last response from @Werner Stinckens​, and I was checking back to see if you have a resolution yet. If you have any solution, please share it with the community as it can be helpful to others. Otherw...

  • 2 kudos
3 More Replies
data_serf
by New Contributor
  • 2125 Views
  • 3 replies
  • 1 kudos

Resolved! How to integrate java 11 code in Databricks

Hi all,We're trying to attach java libraries which are compiled/packaged using Java 11.After doing some research it looks like even the most recent runtimes use Java 8 which can't run the Java 11 code ("wrong version 55.0, should be 52.0" errors)Is t...

  • 2125 Views
  • 3 replies
  • 1 kudos
Latest Reply
matthewrj
New Contributor II
  • 1 kudos

I have tried setting JNAME=zulu11-ca-amd64 under Cluster > Advanced options > Spark > Environment variables but it doesn't seem to work. I still get errors indicating Java 8 is the JRE and in the Spark UI under "Environment" I still see:Java Home: /u...

  • 1 kudos
2 More Replies
isaac_gritz
by Valued Contributor II
  • 1020 Views
  • 2 replies
  • 7 kudos

Versions of Spark, Python, Scala, R in each Databricks Runtime

What version of Spark, Python, Scala, R are included in each Databricks Runtime? What libraries are pre-installed?You can find this info at the Databricks runtime releases page (AWS | Azure | GCP).Let us know if you have any additional questions on t...

  • 1020 Views
  • 2 replies
  • 7 kudos
Latest Reply
maxdata
New Contributor II
  • 7 kudos

Wow! Thanks for the help @Isaac Gritz​ !

  • 7 kudos
1 More Replies
sage5616
by Valued Contributor
  • 5208 Views
  • 5 replies
  • 7 kudos

Resolved! SQL Error when querying any tables/views on a Databricks cluster via Dbeaver.

I am able to connect to the cluster, browse its hive catalog, see tables/views and columns/datatypesRunning a simple select statement from a view on a parquet file produces this error and no other results:"SQL Error [500540] [HY000]: [Databricks][Dat...

  • 5208 Views
  • 5 replies
  • 7 kudos
Latest Reply
sage5616
Valued Contributor
  • 7 kudos

Update. I have tried SQL Workbench/J and encountered exactly the same error(s) as with Dbeaver. I have also tried JetBrains DataGrip and it worked flawlessly. Able to connect, browse the databases and query tables/views. https://docs.microsoft.com/en...

  • 7 kudos
4 More Replies
codevisionz
by New Contributor
  • 263 Views
  • 0 replies
  • 0 kudos

Our Python Code Examples covers basic concepts, control structures, functions, lists, classes, objects, inheritance, polymorphism, file operations, da...

Our Python Code Examples covers basic concepts, control structures, functions, lists, classes, objects, inheritance, polymorphism, file operations, data structures, sorting algorithms, mathematical functions, mathematical sequences, threads, exceptio...

  • 263 Views
  • 0 replies
  • 0 kudos
mani238
by New Contributor III
  • 2947 Views
  • 7 replies
  • 4 kudos
  • 2947 Views
  • 7 replies
  • 4 kudos
Latest Reply
Kaniz
Community Manager
  • 4 kudos

Hi @manivannan p​ ​ , Just a friendly follow-up. Do you still need help, or do the above responses help you find the solution? Please let us know.

  • 4 kudos
6 More Replies
_r_vind1199
by New Contributor II
  • 1904 Views
  • 4 replies
  • 3 kudos

Resolved! Pyspark installation issue

When I try to start pyspark session in pycharm. It throws me this error "RuntimeError("Java gateway process exited before sending its port number"). Could anyone help me to solve this?

  • 1904 Views
  • 4 replies
  • 3 kudos
Latest Reply
Kaniz
Community Manager
  • 3 kudos

Hi @Aravind A​ , Did you try all the steps as suggested by @Aashita Ramteke​ ?

  • 3 kudos
3 More Replies
Labels