09-27-2021 01:22 AM
Hello everyone!
I am trying to pass a Typesafe config file to a spark-submit task and print the values from the config file.
Code:
import org.slf4j.{Logger, LoggerFactory}
import com.typesafe.config.{Config, ConfigFactory}
import org.apache.spark.sql.SparkSession

object Bootstrap extends MyLogging {

  val spark: SparkSession = SparkSession.builder.enableHiveSupport().getOrCreate()
  val config: Config = ConfigFactory.load("application.conf")

  def main(args: Array[String]): Unit = {
    val url: String = config.getString("db.url")
    val user: String = config.getString("db.user")
    println(url)
    println(user)
  }
}
application.conf file:

db {
  url = "jdbc:postgresql://localhost:5432/test"
  user = "test"
}
I have uploaded the file to DBFS and am using that path to create the job.
Spark-submit job JSON:
{
  "new_cluster": {
    "spark_version": "6.4.x-esr-scala2.11",
    "azure_attributes": {
      "availability": "ON_DEMAND_AZURE",
      "first_on_demand": 1,
      "spot_bid_max_price": -1
    },
    "node_type_id": "Standard_DS3_v2",
    "enable_elastic_disk": true,
    "num_workers": 1
  },
  "spark_submit_task": {
    "parameters": [
      "--class",
      "Bootstrap",
      "--conf",
      "spark.driver.extraClassPath=dbfs:/tmp/",
      "--conf",
      "spark.executor.extraClassPath=dbfs:/tmp/",
      "--files",
      "dbfs:/tmp/application.conf",
      "dbfs:/tmp/code-assembly-0.1.0.jar"
    ]
  },
  "email_notifications": {},
  "name": "application-conf-test",
  "max_concurrent_runs": 1
}
I used the above JSON to create the spark-submit job and tried to run it using Databricks CLI commands.
Error:
Exception in thread "main" com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'db'
  at com.typesafe.config.impl.SimpleConfig.findKey(SimpleConfig.java:124)
  at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:147)
  at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:159)
  at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:164)
  at com.typesafe.config.impl.SimpleConfig.getString(SimpleConfig.java:206)
  at Bootstrap$.main(Test.scala:16)
  at Bootstrap.main(Test.scala)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
  at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
  at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
  at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
  at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
  at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
  at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
I can see the lines below in the logs, but the file is not getting loaded:
21/09/22 07:21:43 INFO SparkContext: Added file dbfs:/tmp/application.conf at dbfs:/tmp/application.conf with timestamp 1632295303654
21/09/22 07:21:43 INFO Utils: Fetching dbfs:/tmp/application.conf to /local_disk0/spark-20456b30-fddd-42d7-9b23-9e4c0d3c91cd/userFiles-ee199161-6f48-4c47-b1c7-763ce7c0895f/fetchFileTemp4713981355306806616.tmp
Please help me pass this Typesafe config file to the spark-submit job using the appropriate spark-submit task parameters.
Accepted Solutions
09-28-2021 10:42 AM
Hi @Praveen Kumar Bachu ,
There are several limitations for spark-submit tasks:
- You can run spark-submit tasks only on new clusters.
- Spark-submit does not support cluster autoscaling. To learn more about autoscaling, see Cluster autoscaling.
- Spark-submit does not support Databricks Utilities. To use Databricks Utilities, use JAR tasks instead.
- For more information on which parameters may be passed to a spark-submit task, see SparkSubmitTask.
Please check the docs for more information: https://docs.databricks.com/dev-tools/api/latest/jobs.html#jobssparksubmittask
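For reference, a minimal sketch of the same job expressed as a JAR task instead of a spark-submit task (the jar path, main class, and cluster settings are taken from the question; spark_jar_task and libraries are the standard Jobs API fields):

{
  "new_cluster": {
    "spark_version": "6.4.x-esr-scala2.11",
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 1
  },
  "libraries": [
    { "jar": "dbfs:/tmp/code-assembly-0.1.0.jar" }
  ],
  "spark_jar_task": {
    "main_class_name": "Bootstrap"
  },
  "name": "application-conf-test",
  "max_concurrent_runs": 1
}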
09-28-2021 05:58 AM
Thank you so much @Kaniz Fatma, I'm looking forward to the answer!
09-28-2021 10:53 PM
Hi @Jose Gonzalez,
Thanks for the reply. Yes, I have gone through all the docs and steps regarding the spark-submit task; my question above is specifically about passing a config file to the spark-submit task.
Please re-check the steps above and let me know if they help. If not, we will write up more details about what we are trying to do, and you can tell us whether it is possible in Databricks or not.
10-01-2021 12:49 PM
Hi @Praveen Kumar Bachu
The error shows that the job was not able to read your configuration. It means that the only way to pass your configuration is through submit parameters.
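For example, a minimal sketch of that approach, assuming the db.url and db.user values from the question's application.conf are passed as plain application arguments after the jar path (a hypothetical rework, not the original code):

"spark_submit_task": {
  "parameters": [
    "--class",
    "Bootstrap",
    "dbfs:/tmp/code-assembly-0.1.0.jar",
    "jdbc:postgresql://localhost:5432/test",
    "test"
  ]
}

The main method would then read the values positionally instead of from a config file:

// Hypothetical sketch: values arrive as program arguments, not a config file.
def main(args: Array[String]): Unit = {
  val url: String = args(0)  // db.url
  val user: String = args(1) // db.user
  println(url)
  println(user)
}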
10-05-2021 07:58 AM
Hi @Jose Gonzalez,
Please see the spark-submit JSON below and a few more examples we have tried with spark-submit parameters.
spark-submit JSON:
{
  "new_cluster": {
    "spark_version": "6.4.x-esr-scala2.11",
    "azure_attributes": {
      "availability": "ON_DEMAND_AZURE",
      "first_on_demand": 1,
      "spot_bid_max_price": -1
    },
    "node_type_id": "Standard_DS3_v2",
    "enable_elastic_disk": true,
    "num_workers": 1
  },
  "spark_submit_task": {
    "parameters": [
      "--class",
      "Bootstrap",
      "--conf",
      "spark.driver.extraClassPath=dbfs:/tmp/",
      "--conf",
      "spark.executor.extraClassPath=dbfs:/tmp/",
      "--files",
      "dbfs:/tmp/application.conf",
      "dbfs:/tmp/code-assembly-0.1.0.jar"
    ]
  },
  "email_notifications": {},
  "name": "application-conf-test",
  "max_concurrent_runs": 1
}
We have tried the below spark_submit_task parameters in the above JSON:
[
  "--class",
  "Bootstrap",
  "--conf",
  "spark.driver.extraClassPath=/tmp/application.conf",
  "--files",
  "dbfs:/tmp/application.conf",
  "dbfs:/tmp/code-assembly-0.1.0.jar"
]

[
  "--class",
  "Bootstrap",
  "--conf",
  "spark.driver.extraClassPath=/tmp/",
  "--conf",
  "spark.executor.extraClassPath=/tmp/",
  "--files",
  "dbfs:/tmp/application.conf",
  "dbfs:/tmp/code-assembly-0.1.0.jar"
]

[
  "--class",
  "Bootstrap",
  "--conf",
  "spark.driver.extraClassPath=dbfs:/tmp/application.conf",
  "--conf",
  "spark.executor.extraClassPath=dbfs:/tmp/application.conf",
  "--files",
  "dbfs:/tmp/application.conf",
  "dbfs:/tmp/code-assembly-0.1.0.jar"
]

[
  "--class",
  "Bootstrap",
  "--conf",
  "spark.driver.extraClassPath=dbfs:/tmp/",
  "--conf",
  "spark.executor.extraClassPath=dbfs:/tmp/",
  "--files",
  "dbfs:/tmp/application.conf",
  "dbfs:/tmp/code-assembly-0.1.0.jar"
]

[
  "--class",
  "Bootstrap",
  "--conf",
  "spark.driver.extraClassPath=dbfs:./",
  "--conf",
  "spark.executor.extraClassPath=dbfs:./",
  "--files",
  "dbfs:/tmp/application.conf",
  "dbfs:/tmp/code-assembly-0.1.0.jar"
]

[
  "--class",
  "Bootstrap",
  "--driver-java-options",
  "-Dconfig.file=application.conf",
  "--conf",
  "spark.executor.extraJavaOptions=-Dconfig.file=application.conf",
  "--files",
  "dbfs:/tmp/application.conf",
  "dbfs:/tmp/code-assembly-0.1.0.jar"
]

[
  "--class",
  "Bootstrap",
  "--conf",
  "spark.driver.extraJavaOptions=-Dconfig.file=application.conf",
  "--conf",
  "spark.executor.extraJavaOptions=-Dconfig.file=application.conf",
  "--files",
  "dbfs:/tmp/application.conf",
  "dbfs:/tmp/code-assembly-0.1.0.jar"
]
For all of the above spark_submit_task parameters, we face the same error:
Exception in thread "main" com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'db'
  at com.typesafe.config.impl.SimpleConfig.findKey(SimpleConfig.java:124)
  at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:147)
  at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:159)
  at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:164)
  at com.typesafe.config.impl.SimpleConfig.getString(SimpleConfig.java:206)
  at Bootstrap.main(Test.scala:16)
  at Bootstrap.main(Test.scala)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
  at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
  at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
  at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
  at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
  at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
  at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
Can you please help with this quickly, as we need this implementation.
03-14-2022 05:38 PM
What's the code inside your main?
03-15-2022 01:21 AM
Hi @Praveen Kumar Bachu, could you please try the approach below? Let me know if this works for you.

import java.io.File
import org.apache.spark.SparkFiles

// use parseFile instead of load
val config: Config = ConfigFactory.parseFile(new File(SparkFiles.get("application.conf")))

Note: you will need to pass the file using --files:

"--files",
"dbfs:/tmp/application.conf",
04-25-2023 06:54 AM
I've experienced similar issues; please help answer how to get this working.
I've tried setting the option below to either the /dbfs/mnt/blah path or the dbfs:/mnt/blah path, in either spark_submit_task or spark_jar_task (via the cluster spark_conf for Java options); no success.

spark.driver.extraJavaOptions

NOTE: testing via a notebook, using extraJavaOptions had no problems (but we did notice that, in the notebook, the commands below would not succeed unless we ls the parent folders one by one first):

ls /dbfs/mnt/glue-artifacts/conf-staging-env/application.conf
cat /dbfs/mnt/glue-artifacts/conf-staging-env/application.conf
See the snippet below:

spark_submit_task = {
  "parameters": [
    "--class", "com.source2sea.glue.GlueMain",
    "--conf", f"spark.driver.extraJavaOptions={java_option_d_config_file}",
    "--files", conf_path,
    jar_full_path, MY-PARAMETERS
  ]
}
In my Scala code I have something like this (using PureConfig, which is a wrapper around Typesafe Config; I ensured this is done: https://pureconfig.github.io/docs/faq.html#how-can-i-use-pureconfig-with-spark-210-problematic-shape...):
val source = defaultOverrides.withFallback(defaultApplication).withFallback(defaultReference)

def read(source: ConfigObjectSource): Either[Throwable, AppConfig] = {
  implicit def hint[A] = ProductHint[A](ConfigFieldMapping(CamelCase, CamelCase))
  logger.debug(s"Loading configuration ${source.config()}")
  val original: Either[ConfigReaderFailures, AppConfig] = source.load[AppConfig]
  logger.info(s"Loaded and casted configuration ${original}")
  original.leftMap[Throwable](ConfigReaderException.apply)
}
Error log:
23/04/25 13:45:49 INFO AppConfig$: Loaded and casted configuration Left(ConfigReaderFailures(ThrowableFailure(shaded.com.typesafe.config.ConfigException$IO: dbfs:/mnt/glue-artifacts/conf-staging-env/application.conf: java.io.FileNotFoundException: dbfs:/mnt/glue-artifacts/conf-staging-env/application.conf (No such file or directory),Some(ConfigOrigin(dbfs:/mnt/glue-artifacts/conf-staging-env/application.conf)))))
23/04/25 13:45:49 ERROR GlueMain$: Glue failure
pureconfig.error.ConfigReaderException: Cannot convert configuration to a scala.runtime.Nothing$. Failures are:
- (dbfs:/mnt/glue-artifacts/conf-staging-env/application.conf) dbfs:/mnt/glue-artifacts/conf-staging-env/application.conf: java.io.FileNotFoundException: dbfs:/mnt/glue-artifacts/conf-staging-env/application.conf (No such file or directory).
or
23/04/25 12:46:10 INFO AppConfig$: Loaded and casted configuration Left(ConfigReaderFailures(ThrowableFailure(shaded.com.typesafe.config.ConfigException$IO: /dbfs/mnt/glue-artifacts/conf-staging-env/application.conf: java.io.FileNotFoundException: /dbfs/mnt/glue-artifacts/conf-staging-env/application.conf (No such file or directory),Some(ConfigOrigin(/dbfs/mnt/glue-artifacts/conf-staging-env/application.conf)))))
23/04/25 12:46:10 ERROR GlueMain$: Glue failure
pureconfig.error.ConfigReaderException: Cannot convert configuration to a scala.runtime.Nothing$. Failures are:
- (/dbfs/mnt/glue-artifacts/conf-staging-env/application.conf) /dbfs/mnt/glue-artifacts/conf-staging-env/application.conf: java.io.FileNotFoundException: /dbfs/mnt/glue-artifacts/conf-staging-env/application.conf (No such file or directory).
at com.source2sea.glue.config.AppConfig$.$anonfun$read$2(AppConfig.scala:31)
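In case it helps, a hedged sketch combining the SparkFiles approach from the earlier answer with PureConfig; this is an assumption on my part, not something confirmed in this thread. It presumes application.conf is shipped via "--files", conf_path as in the snippet above, and that ConfigSource.file replaces defaultApplication in the fallback chain:

import java.io.File

import org.apache.spark.SparkFiles
import pureconfig.ConfigSource

// Assumed rework: resolve the local copy that --files placed in the Spark
// working directory, instead of a dbfs:/ URI (which java.io.File cannot
// open) or a /dbfs/ mount path that, per the error above, the driver JVM
// cannot see here. SparkFiles.get must run after the SparkContext exists.
val appConf = ConfigSource.file(new File(SparkFiles.get("application.conf")))
val source = ConfigSource.defaultOverrides
  .withFallback(appConf)
  .withFallback(ConfigSource.defaultReference)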