03-07-2024 11:13 PM
I am trying to execute a Scala JAR in a notebook. When I execute it explicitly, I am able to run the JAR like this:
But when I try to run the notebook through a Databricks workflow, I get the below error: The spark context has stopped and the driver is restarting. Your notebook will be automatically reattached.
Steps I have taken so far:
Tried increasing the Spark driver memory like this:
03-08-2024 03:13 AM
Hi @Sachin_, to run a Scala JAR within a Databricks notebook, consider the following approaches:
Using the Scala launcher: Instead of directly using -jar, try running your Scala JAR with the scala launcher. In your notebook, execute the following command:
scala -classpath target/scaaaaaaaaala-1.0.jar scaaalaaa.App Hello World!
Replace target/scaaaaaaaaala-1.0.jar with the actual path to your JAR file, and adjust the class name and arguments to match your application.
Using a Databricks job cluster: If you're running the notebook as part of a Databricks workflow, consider creating a Databricks job and specifying the JAR as a dependency in the job configuration. You can use the following command:
scala -cp <Your Jar> <Main Class> <arguments>
Replace <Your Jar> with the actual path to your JAR file, <Main Class> with the main class name, and <arguments> with any arguments your application expects.
Additional Considerations:
Remember that Databricks provides a powerful platform for big data processing, but sometimes specific adjustments are needed to ensure smooth execution. Feel free to provide more details or ask for further assistance if needed!
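If you go the job route, a spark_jar_task definition for the Databricks Jobs API (2.1) might look roughly like the sketch below. The cluster ID and JAR path are placeholders, and the main class name is taken from your example, so adjust all three to your setup:

```python
import json

# Sketch of a Jobs API 2.1 job definition that runs the JAR directly as a
# spark_jar_task instead of launching it from inside a notebook.
job_spec = {
    "name": "run-scala-jar",
    "tasks": [
        {
            "task_key": "jar_task",
            "existing_cluster_id": "<cluster-id>",  # placeholder
            "spark_jar_task": {
                "main_class_name": "scaaalaaa.App",  # main class from your example
                "parameters": ["Hello", "World!"],
            },
            "libraries": [
                {"jar": "dbfs:/path/to/scaaaaaaaaala-1.0.jar"}  # placeholder path
            ],
        }
    ],
}

print(json.dumps(job_spec, indent=2))
```

This way the JAR runs in its own task with its own Spark context, rather than competing with the notebook's driver.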
03-17-2024 10:59 PM
Hello @Kani ! Thanks for your time. In our organisation, the standard practice is to trigger the JAR using py4j. I also managed to capture the logs, which I was not getting earlier. The error is as below:
'{"error_code": 1, "error_message": "py4j does not exist in the JVM -- org.apache.spark.SparkException: Trying to putInheritedProperty with no active spark context\n\tat org.apache.spark.credentials.CredentialContext$.$anonfun$putInheritedProperty$2(CredentialContext.scala:188)\n\tat scala.Option.getOrElse(Option.scala:189)\n\tat org.apache.spark.credentials.CredentialContext$.$anonfun$putInheritedProperty$1(CredentialContext.scala:188)\n\tat scala.Option.getOrElse(Option.scala:189)\n\tat org.apache.spark.credentials.CredentialContext$.putInheritedProperty(CredentialContext.scala:187)\n\tat com.databricks.backend.daemon.driver.SparkThreadLocalUtils$$anon$1.$anonfun$run$2(SparkThreadLocalUtils.scala:56)\n\tat com.databricks.backend.daemon.driver.SparkThreadLocalUtils$$anon$1.$anonfun$run$2$adapted(SparkThreadLocalUtils.scala:56)\n\tat scala.Option.foreach(Option.scala:407)\n\tat com.databricks.backend.daemon.driver.SparkThreadLocalUtils$$anon$1.run(SparkThreadLocalUtils.scala:56)\n\tat java.lang.Iterable.forEach(Iterable.java:75)\n\tat py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:194)\n\tat py4j.ClientServerConnection.run(ClientServerConnection.java:115)\n\tat java.lang.Thread.run(Thread.java:750)\n"}', -- NEW Internal data field
'1', -- NEW error code field
NULL -- NEW Write ID field
-- Forward compatibility padding
)
2024-03-15 10:29:49,643:offer_toolbox.retry:DEBUG:Last attempt for 'robust_execute'
2024-03-15 10:29:49,644:offer_metadata.v1.store:DEBUG:Insert event from Step v1
2024-03-15 10:29:49,645:offer_metadata.sql.engine:DEBUG:End cursor self._connection_count=1 CALLED:keep_alive=False FIXED:self.keep_alive=True
2024-03-15 10:29:49,646:offer_metadata.v1.operational:INFO:[AAM_Demo][demo_action] END Step
2024-03-15 10:29:50,711:offer_companion.companion:WARNING:Intercept IPython error py4j does not exist in the JVM -- org.apache.spark.SparkException: Trying to putInheritedProperty with no active spark context
at org.apache.spark.credentials.CredentialContext$.$anonfun$putInheritedProperty$2(CredentialContext.scala:188)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.credentials.CredentialContext$.$anonfun$putInheritedProperty$1(CredentialContext.scala:188)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.credentials.CredentialContext$.putInheritedProperty(CredentialContext.scala:187)
at com.databricks.backend.daemon.driver.SparkThreadLocalUtils$$anon$1.$anonfun$run$2(SparkThreadLocalUtils.scala:56)
at com.databricks.backend.daemon.driver.SparkThreadLocalUtils$$anon$1.$anonfun$run$2$adapted(SparkThreadLocalUtils.scala:56)
at scala.Option.foreach(Option.scala:407)
at com.databricks.backend.daemon.driver.SparkThreadLocalUtils$$anon$1.run(SparkThreadLocalUtils.scala:56)
at java.lang.Iterable.forEach(Iterable.java:75)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:194)
at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
at java.lang.Thread.run(Thread.java:750)
Any idea on how I can initialize the cluster with py4j?
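For context, the way we reach the JVM from the notebook is roughly like the sketch below. The class path and helper name are placeholders, not our actual code. Note that py4j does not auto-convert a Python list to a Java String[], so the argument array is built explicitly:

```python
# Sketch, assuming the JAR is already installed on the cluster: invoke a JAR's
# main class from a notebook through the py4j JVM view that PySpark exposes.
# "scaaalaaa.App" below is a placeholder class path, not our real entry point.
def call_jar_main(spark, main_class, args):
    """Walk the py4j JVM view to `main_class` and invoke its main(String[])."""
    gateway = spark.sparkContext._gateway
    jvm = gateway.jvm
    # py4j does not auto-convert Python lists to String[], so build the array
    jargs = gateway.new_array(jvm.java.lang.String, len(args))
    for i, arg in enumerate(args):
        jargs[i] = arg
    target = jvm
    for part in main_class.split("."):
        target = getattr(target, part)
    target.main(jargs)

# In a Databricks notebook this would be called with the injected `spark`:
# call_jar_main(spark, "scaaalaaa.App", ["Hello", "World!"])
```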
03-20-2024 02:32 PM
Could you share the code you have in your JAR file? How are you creating your Spark context there?
04-04-2024 01:51 AM
Hello @jose_gonzalez and @Kani ! Apologies for the late reply. This is how we initialize the Spark session in our ETL: