Pass Typesafe config file to the Spark Submit Job

Praveen
New Contributor II

Hello everyone!

I am trying to pass a Typesafe config file to the spark-submit task and print the details from the config file.

Code: 

import org.slf4j.{Logger, LoggerFactory}
import com.typesafe.config.{Config, ConfigFactory}
import org.apache.spark.sql.SparkSession

object Bootstrap extends MyLogging {

  val spark: SparkSession = SparkSession.builder.enableHiveSupport().getOrCreate()

  val config: Config = ConfigFactory.load("application.conf")

  def main(args: Array[String]): Unit = {
    val url: String = config.getString("db.url")
    val user: String = config.getString("db.user")
    println(url)
    println(user)
  }
}

application.conf file:

db {
  url = "jdbc:postgresql://localhost:5432/test"
  user = "test"
}

I have uploaded the file to DBFS and am using that path to create the job.

Spark submit job JSON:

{
  "new_cluster": {
    "spark_version": "6.4.x-esr-scala2.11",
    "azure_attributes": {
      "availability": "ON_DEMAND_AZURE",
      "first_on_demand": 1,
      "spot_bid_max_price": -1
    },
    "node_type_id": "Standard_DS3_v2",
    "enable_elastic_disk": true,
    "num_workers": 1
  },
  "spark_submit_task": {
    "parameters": [
      "--class",
      "Bootstrap",
      "--conf",
      "spark.driver.extraClassPath=dbfs:/tmp/",
      "--conf",
      "spark.executor.extraClassPath=dbfs:/tmp/",
      "--files",
      "dbfs:/tmp/application.conf",
      "dbfs:/tmp/code-assembly-0.1.0.jar"
    ]
  },
  "email_notifications": {},
  "name": "application-conf-test",
  "max_concurrent_runs": 1
}

I have used the above JSON to create the spark-submit job and tried to run it using the Databricks CLI commands.

Error:

Exception in thread "main" com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'db'
	at com.typesafe.config.impl.SimpleConfig.findKey(SimpleConfig.java:124)
	at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:147)
	at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:159)
	at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:164)
	at com.typesafe.config.impl.SimpleConfig.getString(SimpleConfig.java:206)
	at Bootstrap$.main(Test.scala:16)
	at Bootstrap.main(Test.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)

I can see the lines below in the logs, but the file is not getting loaded.

21/09/22 07:21:43 INFO SparkContext: Added file dbfs:/tmp/application.conf at dbfs:/tmp/application.conf with timestamp 1632295303654
21/09/22 07:21:43 INFO Utils: Fetching dbfs:/tmp/application.conf to /local_disk0/spark-20456b30-fddd-42d7-9b23-9e4c0d3c91cd/userFiles-ee199161-6f48-4c47-b1c7-763ce7c0895f/fetchFileTemp4713981355306806616.tmp

Please help me pass this Typesafe config file to the spark-submit job using the appropriate spark-submit job parameters.

1 ACCEPTED SOLUTION

jose_gonzalez
Moderator

Hi @Praveen Kumar Bachu​ ,

There are several limitations for spark-submit tasks:

  • You can run spark-submit tasks only on new clusters.
  • Spark-submit does not support cluster autoscaling. To learn more about autoscaling, see Cluster autoscaling.
  • Spark-submit does not support Databricks Utilities. To use Databricks Utilities, use JAR tasks instead.
  • For more information on which parameters may be passed to a spark-submit task, see SparkSubmitTask.

Please check the docs for more information https://docs.databricks.com/dev-tools/api/latest/jobs.html#jobssparksubmittask


10 REPLIES

Kaniz
Community Manager

Hi @Praveen! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question first. Otherwise, I will follow up with my team and get back to you soon. Thanks.

Praveen
New Contributor II

Thank you so much @Kaniz Fatma, I'm looking forward to the answer!

Kaniz
Community Manager

Hi @Praveen Kumar Bachu,

I've relayed the issue to my team.

My team will get back to you as soon as possible.

Thank you for your patience 😀.


Hi @Jose Gonzalez,

Thanks for the reply. Yes, I have gone through all the docs and steps regarding the spark-submit task; my question above is specifically about passing the config file in the spark-submit task.

Please re-check the steps above and let me know if that helps. If not, we will write up more details about what we are trying to do, and you can tell us whether it is possible in Databricks or not.

Hi @Praveen Kumar Bachu,

The error shows that the job was not able to read your configuration. That means the only way to pass your configuration is through the submit parameters.
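
One way to act on this (a minimal sketch, not taken from the thread) is to pass each setting as a Spark conf entry on spark-submit and read it back from the SparkConf instead of loading application.conf. The spark.myapp.* key names below are hypothetical placeholders:

import org.apache.spark.sql.SparkSession

// Sketch only: assumes the job was submitted with parameters such as
//   "--conf", "spark.myapp.db.url=jdbc:postgresql://localhost:5432/test",
//   "--conf", "spark.myapp.db.user=test",
object BootstrapFromSparkConf {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    // SparkConf.get throws NoSuchElementException if the key is missing;
    // a second argument can supply a default instead.
    val url: String = spark.sparkContext.getConf.get("spark.myapp.db.url")
    val user: String = spark.sparkContext.getConf.get("spark.myapp.db.user")

    println(url)
    println(user)
  }
}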

Praveen
New Contributor II

Hi @Jose Gonzalez,

Please see the spark-submit JSON below and a few more examples we have tried with spark-submit parameters.

spark-submit JSON:

{
  "new_cluster": {
    "spark_version": "6.4.x-esr-scala2.11",
    "azure_attributes": {
      "availability": "ON_DEMAND_AZURE",
      "first_on_demand": 1,
      "spot_bid_max_price": -1
    },
    "node_type_id": "Standard_DS3_v2",
    "enable_elastic_disk": true,
    "num_workers": 1
  },
  "spark_submit_task": {
    "parameters": [
      "--class",
      "Bootstrap",
      "--conf",
      "spark.driver.extraClassPath=dbfs:/tmp/",
      "--conf",
      "spark.executor.extraClassPath=dbfs:/tmp/",
      "--files",
      "dbfs:/tmp/application.conf",
      "dbfs:/tmp/code-assembly-0.1.0.jar"
    ]
  },
  "email_notifications": {},
  "name": "application-conf-test",
  "max_concurrent_runs": 1
}

We have tried the following spark_submit_task parameters in the above JSON:

[
  "--class",
  "Bootstrap",
  "--conf",
  "spark.driver.extraClassPath=/tmp/application.conf",
  "--files",
  "dbfs:/tmp/application.conf",
  "dbfs:/tmp/code-assembly-0.1.0.jar"
]

[
  "--class",
  "Bootstrap",
  "--conf",
  "spark.driver.extraClassPath=/tmp/",
  "--conf",
  "spark.executor.extraClassPath=/tmp/",
  "--files",
  "dbfs:/tmp/application.conf",
  "dbfs:/tmp/code-assembly-0.1.0.jar"
]

[
  "--class",
  "Bootstrap",
  "--conf",
  "spark.driver.extraClassPath=dbfs:/tmp/application.conf",
  "--conf",
  "spark.executor.extraClassPath=dbfs:/tmp/application.conf",
  "--files",
  "dbfs:/tmp/application.conf",
  "dbfs:/tmp/code-assembly-0.1.0.jar"
]

[
  "--class",
  "Bootstrap",
  "--conf",
  "spark.driver.extraClassPath=dbfs:/tmp/",
  "--conf",
  "spark.executor.extraClassPath=dbfs:/tmp/",
  "--files",
  "dbfs:/tmp/application.conf",
  "dbfs:/tmp/code-assembly-0.1.0.jar"
]

[
  "--class",
  "Bootstrap",
  "--conf",
  "spark.driver.extraClassPath=dbfs:./",
  "--conf",
  "spark.executor.extraClassPath=dbfs:./",
  "--files",
  "dbfs:/tmp/application.conf",
  "dbfs:/tmp/code-assembly-0.1.0.jar"
]

[
  "--class",
  "Bootstrap",
  "--driver-java-options",
  "-Dconfig.file=application.conf",
  "--conf",
  "spark.executor.extraJavaOptions=-Dconfig.file=application.conf",
  "--files",
  "dbfs:/tmp/application.conf",
  "dbfs:/tmp/code-assembly-0.1.0.jar"
]

[
  "--class",
  "Bootstrap",
  "--conf",
  "spark.driver.extraJavaOptions=-Dconfig.file=application.conf",
  "--conf",
  "spark.executor.extraJavaOptions=-Dconfig.file=application.conf",
  "--files",
  "dbfs:/tmp/application.conf",
  "dbfs:/tmp/code-assembly-0.1.0.jar"
]

For all of the above spark_submit_task parameters, we are facing the same error, specified below.

Exception in thread "main" com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'db'
	at com.typesafe.config.impl.SimpleConfig.findKey(SimpleConfig.java:124)
	at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:147)
	at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:159)
	at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:164)
	at com.typesafe.config.impl.SimpleConfig.getString(SimpleConfig.java:206)
	at Bootstrap.main(Test.scala:16)
	at Bootstrap.main(Test.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)

Could you please help with this soon? We need this implementation.

What's the code inside your main?

User16763506477
Contributor III

Hi @Praveen Kumar Bachu, could you please try the approach below? Let me know if this works for you.

import java.io.File

import com.typesafe.config.{Config, ConfigFactory}
import org.apache.spark.SparkFiles

// use parseFile instead of load
val config: Config = ConfigFactory.parseFile(new File(SparkFiles.get("application.conf")))

Note: you will need to pass the file using --files:

"--files",
      "dbfs:/tmp/application.conf",

source2sea
Contributor

I've experienced similar issues; please help me figure out how to get this working.

I've tried setting the path below to either a /dbfs/mnt/blah path or a dbfs:/mnt/blah path,

in either spark_submit_task or spark_jar_task (via the cluster spark_conf for Java options); no success.

spark.driver.extraJavaOptions

NOTE: Testing via a notebook, using extraJavaOptions had no problems. (But we did notice, in the notebook,

that the command below would not succeed unless we tried to ls the parent folders one by one first.)

ls /dbfs/mnt/glue-artifacts/conf-staging-env/application.conf
cat /dbfs/mnt/glue-artifacts/conf-staging-env/application.conf

See the snippet below:

spark_submit_task = {
    "parameters": [
        "--class", "com.source2sea.glue.GlueMain",
        "--conf", f"spark.driver.extraJavaOptions={java_option_d_config_file}",
        "--files", conf_path,
        jar_full_path, MY-PARAMETERS
    ]
}

In my Scala code I have something like this (using pureconfig, which is a wrapper around Typesafe Config, and I have ensured this is done: https://pureconfig.github.io/docs/faq.html#how-can-i-use-pureconfig-with-spark-210-problematic-shape...):

val source = defaultOverrides.withFallback(defaultApplication).withFallback(defaultReference)

def read(source: ConfigObjectSource): Either[Throwable, AppConfig] = {
  implicit def hint[A] = ProductHint[A](ConfigFieldMapping(CamelCase, CamelCase))

  logger.debug(s"Loading configuration ${source.config()}")

  val original: Either[ConfigReaderFailures, AppConfig] = source.load[AppConfig]
  logger.info(s"Loaded and casted configuration ${original}")

  original.leftMap[Throwable](ConfigReaderException.apply)
}

Error log:

23/04/25 13:45:49 INFO AppConfig$: Loaded and casted configuration Left(ConfigReaderFailures(ThrowableFailure(shaded.com.typesafe.config.ConfigException$IO: dbfs:/mnt/glue-artifacts/conf-staging-env/application.conf: java.io.FileNotFoundException: dbfs:/mnt/glue-artifacts/conf-staging-env/application.conf (No such file or directory),Some(ConfigOrigin(dbfs:/mnt/glue-artifacts/conf-staging-env/application.conf)))))
23/04/25 13:45:49 ERROR GlueMain$: Glue failure
pureconfig.error.ConfigReaderException: Cannot convert configuration to a scala.runtime.Nothing$. Failures are:
  - (dbfs:/mnt/glue-artifacts/conf-staging-env/application.conf) dbfs:/mnt/glue-artifacts/conf-staging-env/application.conf: java.io.FileNotFoundException: dbfs:/mnt/glue-artifacts/conf-staging-env/application.conf (No such file or directory).
 
 
or
 
 
23/04/25 12:46:10 INFO AppConfig$: Loaded and casted configuration Left(ConfigReaderFailures(ThrowableFailure(shaded.com.typesafe.config.ConfigException$IO: /dbfs/mnt/glue-artifacts/conf-staging-env/application.conf: java.io.FileNotFoundException: /dbfs/mnt/glue-artifacts/conf-staging-env/application.conf (No such file or directory),Some(ConfigOrigin(/dbfs/mnt/glue-artifacts/conf-staging-env/application.conf)))))
23/04/25 12:46:10 ERROR GlueMain$: Glue failure
pureconfig.error.ConfigReaderException: Cannot convert configuration to a scala.runtime.Nothing$. Failures are:
  - (/dbfs/mnt/glue-artifacts/conf-staging-env/application.conf) /dbfs/mnt/glue-artifacts/conf-staging-env/application.conf: java.io.FileNotFoundException: /dbfs/mnt/glue-artifacts/conf-staging-env/application.conf (No such file or directory).
 
	at com.source2sea.glue.config.AppConfig$.$anonfun$read$2(AppConfig.scala:31)
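
Given the read(source: ConfigObjectSource) signature above, one possible workaround (a minimal, untested sketch) is to ship the file with --files, parse it on the driver with Typesafe Config, and wrap the result in a ConfigObjectSource via pureconfig's ConfigSource.fromConfig. If your build shades Typesafe Config, the imports would need to use the shaded package instead:

import java.io.File

import com.typesafe.config.ConfigFactory
import org.apache.spark.SparkFiles
import pureconfig.ConfigSource

// Parse the local copy fetched via --files, then hand it to pureconfig
val parsed = ConfigFactory.parseFile(new File(SparkFiles.get("application.conf")))
val source = ConfigSource.fromConfig(parsed) // a ConfigObjectSource
// read(source) can then be called as in the snippet above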
