Pass Typesafe config file to the Spark Submit Job

Praveen
New Contributor II

Hello everyone!

I am trying to pass a Typesafe config file to the spark-submit task and print the details from the config file.

Code: 

import org.slf4j.{Logger, LoggerFactory}
import com.typesafe.config.{Config, ConfigFactory}
import org.apache.spark.sql.SparkSession

object Bootstrap extends MyLogging {

  val spark: SparkSession = SparkSession.builder.enableHiveSupport().getOrCreate()

  val config: Config = ConfigFactory.load("application.conf")

  def main(args: Array[String]): Unit = {
    val url: String = config.getString("db.url")
    val user: String = config.getString("db.user")
    println(url)
    println(user)
  }
}

application.conf file:

db {
  url = "jdbc:postgresql://localhost:5432/test"
  user = "test"
}

I have uploaded the file to DBFS and am using that path to create the job.

Spark submit job JSON:

{
  "new_cluster": {
    "spark_version": "6.4.x-esr-scala2.11",
    "azure_attributes": {
      "availability": "ON_DEMAND_AZURE",
      "first_on_demand": 1,
      "spot_bid_max_price": -1
    },
    "node_type_id": "Standard_DS3_v2",
    "enable_elastic_disk": true,
    "num_workers": 1
  },
  "spark_submit_task": {
    "parameters": [
      "--class",
      "Bootstrap",
      "--conf",
      "spark.driver.extraClassPath=dbfs:/tmp/",
      "--conf",
      "spark.executor.extraClassPath=dbfs:/tmp/",
      "--files",
      "dbfs:/tmp/application.conf",
      "dbfs:/tmp/code-assembly-0.1.0.jar"
    ]
  },
  "email_notifications": {},
  "name": "application-conf-test",
  "max_concurrent_runs": 1
}

I have used the above JSON to create the spark-submit job and tried to run it using the Databricks CLI commands.

Error:

Exception in thread "main" com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'db'
	at com.typesafe.config.impl.SimpleConfig.findKey(SimpleConfig.java:124)
	at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:147)
	at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:159)
	at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:164)
	at com.typesafe.config.impl.SimpleConfig.getString(SimpleConfig.java:206)
	at Bootstrap$.main(Test.scala:16)
	at Bootstrap.main(Test.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)

I can see the lines below in the logs, but the file is not getting loaded.

21/09/22 07:21:43 INFO SparkContext: Added file dbfs:/tmp/application.conf at dbfs:/tmp/application.conf with timestamp 1632295303654
21/09/22 07:21:43 INFO Utils: Fetching dbfs:/tmp/application.conf to /local_disk0/spark-20456b30-fddd-42d7-9b23-9e4c0d3c91cd/userFiles-ee199161-6f48-4c47-b1c7-763ce7c0895f/fetchFileTemp4713981355306806616.tmp

Please help me pass this Typesafe config file to the spark-submit job using the appropriate spark-submit job parameters.

1 ACCEPTED SOLUTION

jose_gonzalez
Moderator

Hi @Praveen Kumar Bachu​ ,

There are several limitations for spark-submit tasks:

  • You can run spark-submit tasks only on new clusters.
  • Spark-submit does not support cluster autoscaling. To learn more about autoscaling, see Cluster autoscaling.
  • Spark-submit does not support Databricks Utilities. To use Databricks Utilities, use JAR tasks instead.
  • For more information on which parameters may be passed to a spark-submit task, see SparkSubmitTask.

Please check the docs for more information https://docs.databricks.com/dev-tools/api/latest/jobs.html#jobssparksubmittask


10 REPLIES

Kaniz
Community Manager

Hi @Praveen! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question first. Otherwise, I will follow up with my team and get back to you soon. Thanks.

Praveen
New Contributor II

Thank you so much @Kaniz Fatma, I'm looking forward to the answer!

Kaniz
Community Manager

Hi @Praveen Kumar Bachu,

I've relayed the issue to my team.

My team will get back to you as soon as possible.

Thank you for your patience 😀.


Hi @Jose Gonzalez,

Thanks for the reply. Yes, I have gone through all the docs and steps regarding the spark-submit task; my question above is specifically about passing the config file in the spark-submit task.

Please re-check the steps above and let me know if that helps. If not, we will write up more details about what we are trying to do, and you can tell us whether it is possible in Databricks or not.

Hi @Praveen Kumar Bachu,

The error shows that the job was not able to read your configuration. That means the only way to pass your configuration is through the submit parameters.
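
One way to act on this (a minimal sketch, not taken from the thread) is to pass each setting as a Spark conf entry on spark-submit and read it back from the SparkConf instead of loading application.conf. The spark.myapp.* key names below are hypothetical placeholders:

import org.apache.spark.sql.SparkSession

// Sketch only: assumes the job was submitted with parameters such as
//   "--conf", "spark.myapp.db.url=jdbc:postgresql://localhost:5432/test",
//   "--conf", "spark.myapp.db.user=test",
object BootstrapFromSparkConf {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    // SparkConf.get throws NoSuchElementException if the key is missing;
    // a second argument can supply a default instead.
    val url: String = spark.sparkContext.getConf.get("spark.myapp.db.url")
    val user: String = spark.sparkContext.getConf.get("spark.myapp.db.user")

    println(url)
    println(user)
  }
}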

Praveen
New Contributor II

Hi @Jose Gonzalez,

Please see the spark-submit JSON below and a few more examples we have tried with spark-submit parameters.

spark-submit JSON:

{
  "new_cluster": {
    "spark_version": "6.4.x-esr-scala2.11",
    "azure_attributes": {
      "availability": "ON_DEMAND_AZURE",
      "first_on_demand": 1,
      "spot_bid_max_price": -1
    },
    "node_type_id": "Standard_DS3_v2",
    "enable_elastic_disk": true,
    "num_workers": 1
  },
  "spark_submit_task": {
    "parameters": [
      "--class",
      "Bootstrap",
      "--conf",
      "spark.driver.extraClassPath=dbfs:/tmp/",
      "--conf",
      "spark.executor.extraClassPath=dbfs:/tmp/",
      "--files",
      "dbfs:/tmp/application.conf",
      "dbfs:/tmp/code-assembly-0.1.0.jar"
    ]
  },
  "email_notifications": {},
  "name": "application-conf-test",
  "max_concurrent_runs": 1
}

We have tried the following spark_submit_task parameters in the above JSON:

[
  "--class",
  "Bootstrap",
  "--conf",
  "spark.driver.extraClassPath=/tmp/application.conf",
  "--files",
  "dbfs:/tmp/application.conf",
  "dbfs:/tmp/code-assembly-0.1.0.jar"
]

[
  "--class",
  "Bootstrap",
  "--conf",
  "spark.driver.extraClassPath=/tmp/",
  "--conf",
  "spark.executor.extraClassPath=/tmp/",
  "--files",
  "dbfs:/tmp/application.conf",
  "dbfs:/tmp/code-assembly-0.1.0.jar"
]

[
  "--class",
  "Bootstrap",
  "--conf",
  "spark.driver.extraClassPath=dbfs:/tmp/application.conf",
  "--conf",
  "spark.executor.extraClassPath=dbfs:/tmp/application.conf",
  "--files",
  "dbfs:/tmp/application.conf",
  "dbfs:/tmp/code-assembly-0.1.0.jar"
]

[
  "--class",
  "Bootstrap",
  "--conf",
  "spark.driver.extraClassPath=dbfs:/tmp/",
  "--conf",
  "spark.executor.extraClassPath=dbfs:/tmp/",
  "--files",
  "dbfs:/tmp/application.conf",
  "dbfs:/tmp/code-assembly-0.1.0.jar"
]

[
  "--class",
  "Bootstrap",
  "--conf",
  "spark.driver.extraClassPath=dbfs:./",
  "--conf",
  "spark.executor.extraClassPath=dbfs:./",
  "--files",
  "dbfs:/tmp/application.conf",
  "dbfs:/tmp/code-assembly-0.1.0.jar"
]

[
  "--class",
  "Bootstrap",
  "--driver-java-options",
  "-Dconfig.file=application.conf",
  "--conf",
  "spark.executor.extraJavaOptions=-Dconfig.file=application.conf",
  "--files",
  "dbfs:/tmp/application.conf",
  "dbfs:/tmp/code-assembly-0.1.0.jar"
]

[
  "--class",
  "Bootstrap",
  "--conf",
  "spark.driver.extraJavaOptions=-Dconfig.file=application.conf",
  "--conf",
  "spark.executor.extraJavaOptions=-Dconfig.file=application.conf",
  "--files",
  "dbfs:/tmp/application.conf",
  "dbfs:/tmp/code-assembly-0.1.0.jar"
]

For all of the above spark_submit_task parameters, we are facing the same error, specified below.

Exception in thread "main" com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'db'
	at com.typesafe.config.impl.SimpleConfig.findKey(SimpleConfig.java:124)
	at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:147)
	at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:159)
	at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:164)
	at com.typesafe.config.impl.SimpleConfig.getString(SimpleConfig.java:206)
	at Bootstrap.main(Test.scala:16)
	at Bootstrap.main(Test.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)

Could you please help with this soon? We need this implementation.

What's the code inside your main?

User16763506477
Contributor III

Hi @Praveen Kumar Bachu, could you please try the approach below? Let me know if this works for you.

import java.io.File

import com.typesafe.config.{Config, ConfigFactory}
import org.apache.spark.SparkFiles

// use parseFile instead of load
val config: Config = ConfigFactory.parseFile(new File(SparkFiles.get("application.conf")))

Note: you will need to pass the file using --files:

"--files",
      "dbfs:/tmp/application.conf",

source2sea
Contributor

I've experienced similar issues; please help me figure out how to get this working.

I've tried setting the path below to either a /dbfs/mnt/blah path or a dbfs:/mnt/blah path,

in either spark_submit_task or spark_jar_task (via the cluster spark_conf for Java options); no success.

spark.driver.extraJavaOptions

NOTE: Testing via a notebook, using extraJavaOptions had no problems. (But we did notice, in the notebook,

that the command below would not succeed unless we tried to ls the parent folders one by one first.)

ls /dbfs/mnt/glue-artifacts/conf-staging-env/application.conf
cat /dbfs/mnt/glue-artifacts/conf-staging-env/application.conf

See the snippet below:

spark_submit_task = {
    "parameters": [
        "--class", "com.source2sea.glue.GlueMain",
        "--conf", f"spark.driver.extraJavaOptions={java_option_d_config_file}",
        "--files", conf_path,
        jar_full_path, MY-PARAMETERS
    ]
}

In my Scala code I have something like this (using pureconfig, which is a wrapper around Typesafe Config, and I have ensured this is done: https://pureconfig.github.io/docs/faq.html#how-can-i-use-pureconfig-with-spark-210-problematic-shape...):

val source = defaultOverrides.withFallback(defaultApplication).withFallback(defaultReference)

def read(source: ConfigObjectSource): Either[Throwable, AppConfig] = {
  implicit def hint[A] = ProductHint[A](ConfigFieldMapping(CamelCase, CamelCase))

  logger.debug(s"Loading configuration ${source.config()}")

  val original: Either[ConfigReaderFailures, AppConfig] = source.load[AppConfig]
  logger.info(s"Loaded and casted configuration ${original}")

  original.leftMap[Throwable](ConfigReaderException.apply)
}

Error log:

23/04/25 13:45:49 INFO AppConfig$: Loaded and casted configuration Left(ConfigReaderFailures(ThrowableFailure(shaded.com.typesafe.config.ConfigException$IO: dbfs:/mnt/glue-artifacts/conf-staging-env/application.conf: java.io.FileNotFoundException: dbfs:/mnt/glue-artifacts/conf-staging-env/application.conf (No such file or directory),Some(ConfigOrigin(dbfs:/mnt/glue-artifacts/conf-staging-env/application.conf)))))
23/04/25 13:45:49 ERROR GlueMain$: Glue failure
pureconfig.error.ConfigReaderException: Cannot convert configuration to a scala.runtime.Nothing$. Failures are:
  - (dbfs:/mnt/glue-artifacts/conf-staging-env/application.conf) dbfs:/mnt/glue-artifacts/conf-staging-env/application.conf: java.io.FileNotFoundException: dbfs:/mnt/glue-artifacts/conf-staging-env/application.conf (No such file or directory).
 
 
or
 
 
23/04/25 12:46:10 INFO AppConfig$: Loaded and casted configuration Left(ConfigReaderFailures(ThrowableFailure(shaded.com.typesafe.config.ConfigException$IO: /dbfs/mnt/glue-artifacts/conf-staging-env/application.conf: java.io.FileNotFoundException: /dbfs/mnt/glue-artifacts/conf-staging-env/application.conf (No such file or directory),Some(ConfigOrigin(/dbfs/mnt/glue-artifacts/conf-staging-env/application.conf)))))
23/04/25 12:46:10 ERROR GlueMain$: Glue failure
pureconfig.error.ConfigReaderException: Cannot convert configuration to a scala.runtime.Nothing$. Failures are:
  - (/dbfs/mnt/glue-artifacts/conf-staging-env/application.conf) /dbfs/mnt/glue-artifacts/conf-staging-env/application.conf: java.io.FileNotFoundException: /dbfs/mnt/glue-artifacts/conf-staging-env/application.conf (No such file or directory).
 
	at com.source2sea.glue.config.AppConfig$.$anonfun$read$2(AppConfig.scala:31)
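
Given the read(source: ConfigObjectSource) signature above, one possible workaround (a minimal, untested sketch) is to ship the file with --files, parse it on the driver with Typesafe Config, and wrap the result in a ConfigObjectSource via pureconfig's ConfigSource.fromConfig. If your build shades Typesafe Config, the imports would need to use the shaded package instead:

import java.io.File

import com.typesafe.config.ConfigFactory
import org.apache.spark.SparkFiles
import pureconfig.ConfigSource

// Parse the local copy fetched via --files, then hand it to pureconfig
val parsed = ConfigFactory.parseFile(new File(SparkFiles.get("application.conf")))
val source = ConfigSource.fromConfig(parsed) // a ConfigObjectSource
// read(source) can then be called as in the snippet above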
