09-27-2021 01:22 AM
Hello everyone!
I am trying to pass a Typesafe config file to the spark submit task and print the details in the config file.
Code:
import org.slf4j.{Logger, LoggerFactory}
import com.typesafe.config.{Config, ConfigFactory}
import org.apache.spark.sql.SparkSession

object Bootstrap extends MyLogging {
  val spark: SparkSession = SparkSession.builder.enableHiveSupport().getOrCreate()
  val config: Config = ConfigFactory.load("application.conf")

  def main(args: Array[String]): Unit = {
    val url: String = config.getString("db.url")
    val user: String = config.getString("db.user")
    println(url)
    println(user)
  }
}
application.conf file :
db {
url = "jdbc:postgresql://localhost:5432/test"
user = "test"
}
I have uploaded the file to DBFS and am using its path to create the job.
Spark submit job json :
{
"new_cluster": {
"spark_version": "6.4.x-esr-scala2.11",
"azure_attributes": {
"availability": "ON_DEMAND_AZURE",
"first_on_demand": 1,
"spot_bid_max_price": -1
},
"node_type_id": "Standard_DS3_v2",
"enable_elastic_disk": true,
"num_workers": 1
},
"spark_submit_task": {
"parameters": [
"--class",
"Bootstrap",
"--conf",
"spark.driver.extraClassPath=dbfs:/tmp/",
"--conf",
"spark.executor.extraClassPath=dbfs:/tmp/",
"--files",
"dbfs:/tmp/application.conf",
"dbfs:/tmp/code-assembly-0.1.0.jar"
]
},
"email_notifications": {},
"name": "application-conf-test",
"max_concurrent_runs": 1
}
I used the JSON above to create the spark-submit job and tried to run it using Databricks CLI commands.
Error :
Exception in thread "main" com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'db'
at com.typesafe.config.impl.SimpleConfig.findKey(SimpleConfig.java:124)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:147)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:159)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:164)
at com.typesafe.config.impl.SimpleConfig.getString(SimpleConfig.java:206)
at Bootstrap$.main(Test.scala:16)
at Bootstrap.main(Test.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
I can see the lines below in the logs, but the file is not being loaded.
21/09/22 07:21:43 INFO SparkContext: Added file dbfs:/tmp/application.conf at dbfs:/tmp/application.conf with timestamp 1632295303654
21/09/22 07:21:43 INFO Utils: Fetching dbfs:/tmp/application.conf to /local_disk0/spark-20456b30-fddd-42d7-9b23-9e4c0d3c91cd/userFiles-ee199161-6f48-4c47-b1c7-763ce7c0895f/fetchFileTemp4713981355306806616.tmp
Please help me pass this Typesafe config file to the spark-submit job using the appropriate spark-submit parameters.
09-27-2021 01:24 AM
Hi @Praveen! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question first. Otherwise, I will follow up with my team and get back to you soon. Thanks.
09-28-2021 05:58 AM
Thank you so much @Kaniz Fatma, I'm looking forward to the answer!
09-28-2021 06:30 AM
Hi @Praveen Kumar Bachu ,
I've relayed the issue to my team.
My team will get back to you as soon as possible.
Thank you for your patience 😀.
09-28-2021 10:42 AM
Hi @Praveen Kumar Bachu ,
There are several limitations for spark-submit tasks:
Please check the docs for more information https://docs.databricks.com/dev-tools/api/latest/jobs.html#jobssparksubmittask
09-28-2021 10:53 PM
Hi @Jose Gonzalez ,
Thanks for the reply. Yes, I have gone through all the docs and steps for the spark-submit task; my question above is specifically about passing a config file to the spark-submit task.
Please re-check the steps above and let me know if that helps. If not, we will write up more details about what we are trying to do, and you can tell us whether it is possible in Databricks.
10-01-2021 12:49 PM
Hi @Praveen Kumar Bachu
The error shows that the job was not able to read your configuration. This means the only way to pass your configuration is through the spark-submit parameters.
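One way to confirm that the configuration was never read is a small diagnostic, sketched here under the assumption that only Typesafe Config is on the classpath. ConfigFactory.load does not fail when the named resource is absent; it falls back to system properties and any reference.conf, which is why the exception only appears at getString. The object name ConfigDiagnostics and the helper missingKeys are hypothetical names used for illustration.

```scala
import com.typesafe.config.{Config, ConfigFactory}

// Hypothetical diagnostic object; not part of the original job.
object ConfigDiagnostics {
  // Returns the keys from `required` that are absent from `config`.
  def missingKeys(config: Config, required: Seq[String]): Seq[String] =
    required.filterNot(key => config.hasPath(key))

  def main(args: Array[String]): Unit = {
    // load() does not throw if application.conf is not on the classpath.
    val config = ConfigFactory.load("application.conf")
    val missing = missingKeys(config, Seq("db.url", "db.user"))
    if (missing.nonEmpty) {
      val loader = Thread.currentThread().getContextClassLoader
      // None here means the file is not visible as a classpath resource.
      val resource = Option(loader.getResource("application.conf"))
      println(s"Missing keys: ${missing.mkString(", ")}")
      println(s"application.conf on classpath: $resource")
    }
  }
}
```

If the second line prints None, the file was shipped to the cluster but is not a classpath resource, so load cannot see it regardless of the key names.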
10-05-2021 07:58 AM
Hi @Jose Gonzalez ,
Please see the spark-submit JSON below and a few more examples we have tried with spark-submit parameters.
spark-submit JSON:
{
"new_cluster": {
"spark_version": "6.4.x-esr-scala2.11",
"azure_attributes": {
"availability": "ON_DEMAND_AZURE",
"first_on_demand": 1,
"spot_bid_max_price": -1
},
"node_type_id": "Standard_DS3_v2",
"enable_elastic_disk": true,
"num_workers": 1
},
"spark_submit_task": {
"parameters": [
"--class",
"Bootstrap",
"--conf",
"spark.driver.extraClassPath=dbfs:/tmp/",
"--conf",
"spark.executor.extraClassPath=dbfs:/tmp/",
"--files",
"dbfs:/tmp/application.conf",
"dbfs:/tmp/code-assembly-0.1.0.jar"
]
},
"email_notifications": {},
"name": "application-conf-test",
"max_concurrent_runs": 1
}
We have tried the following spark_submit_task parameters in the above JSON:
[
"--class",
"Bootstrap",
"--conf",
"spark.driver.extraClassPath=/tmp/application.conf",
"--files",
"dbfs:/tmp/application.conf",
"dbfs:/tmp/code-assembly-0.1.0.jar"
]
[
"--class",
"Bootstrap",
"--conf",
"spark.driver.extraClassPath=/tmp/",
"--conf",
"spark.executor.extraClassPath=/tmp/",
"--files",
"dbfs:/tmp/application.conf",
"dbfs:/tmp/code-assembly-0.1.0.jar"
]
[
"--class",
"Bootstrap",
"--conf",
"spark.driver.extraClassPath=dbfs:/tmp/application.conf",
"--conf",
"spark.executor.extraClassPath=dbfs:/tmp/application.conf",
"--files",
"dbfs:/tmp/application.conf",
"dbfs:/tmp/code-assembly-0.1.0.jar"
]
[
"--class",
"Bootstrap",
"--conf",
"spark.driver.extraClassPath=dbfs:/tmp/",
"--conf",
"spark.executor.extraClassPath=dbfs:/tmp/",
"--files",
"dbfs:/tmp/application.conf",
"dbfs:/tmp/code-assembly-0.1.0.jar"
]
[
"--class",
"Bootstrap",
"--conf",
"spark.driver.extraClassPath=dbfs:./",
"--conf",
"spark.executor.extraClassPath=dbfs:./",
"--files",
"dbfs:/tmp/application.conf",
"dbfs:/tmp/code-assembly-0.1.0.jar"
]
[
"--class",
"Bootstrap",
"--driver-java-options",
"-Dconfig.file=application.conf",
"--conf",
"spark.executor.extraJavaOptions=-Dconfig.file=application.conf",
"--files",
"dbfs:/tmp/application.conf",
"dbfs:/tmp/code-assembly-0.1.0.jar"
]
[
"--class",
"Bootstrap",
"--conf",
"spark.driver.extraJavaOptions=-Dconfig.file=application.conf",
"--conf",
"spark.executor.extraJavaOptions=-Dconfig.file=application.conf",
"--files",
"dbfs:/tmp/application.conf",
"dbfs:/tmp/code-assembly-0.1.0.jar"
]
For all of the above spark_submit_task parameters, we get the same error, shown below.
Exception in thread "main" com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'db'
at com.typesafe.config.impl.SimpleConfig.findKey(SimpleConfig.java:124)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:147)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:159)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:164)
at com.typesafe.config.impl.SimpleConfig.getString(SimpleConfig.java:206)
at Bootstrap.main(Test.scala:16)
at Bootstrap.main(Test.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
Could you please help with this quickly, as we need this implementation?
03-14-2022 05:38 PM
What's the code inside your main?
03-15-2022 01:21 AM
Hi @Praveen Kumar Bachu, could you please try the approach below? Let me know if this works for you.
import java.io.File
import com.typesafe.config.{Config, ConfigFactory}
import org.apache.spark.SparkFiles
// Use parseFile instead of load: SparkFiles.get returns the local path of a file shipped with --files
val config: Config = ConfigFactory.parseFile(new File(SparkFiles.get("application.conf")))
Note: you will need to pass the file using --files:
"--files",
"dbfs:/tmp/application.conf",
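Applied to the original Bootstrap object, the suggestion above might look like the sketch below. It assumes Spark and Typesafe Config are on the classpath and that the file was shipped with --files; the helper name loadFrom is hypothetical. Since parseFile with default options treats a missing file as an empty config, the sketch sets setAllowMissing(false) so a wrong path fails immediately instead of surfacing later as a missing key.

```scala
import java.io.File
import com.typesafe.config.{Config, ConfigFactory, ConfigParseOptions}
import org.apache.spark.SparkFiles
import org.apache.spark.sql.SparkSession

object Bootstrap {
  // Hypothetical helper: parse an explicit local file and layer it over
  // the default configuration (system properties, reference.conf).
  def loadFrom(file: File): Config = {
    // Fail fast if the file does not exist instead of returning an empty config.
    val strict = ConfigParseOptions.defaults().setAllowMissing(false)
    ConfigFactory.parseFile(file, strict).withFallback(ConfigFactory.load()).resolve()
  }

  def main(args: Array[String]): Unit = {
    // The session has to exist before SparkFiles.get is called,
    // because the lookup relies on the active Spark environment.
    val spark = SparkSession.builder.enableHiveSupport().getOrCreate()
    val config = loadFrom(new File(SparkFiles.get("application.conf")))
    println(config.getString("db.url"))
    println(config.getString("db.user"))
  }
}
```

With this shape, the extraClassPath settings are not needed for the config file, because it is read from a local path rather than from the classpath.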
04-25-2023 06:54 AM
I've experienced similar issues; please help with how to get this working.
I've tried setting the option below to either a /dbfs/mnt/blah path or a dbfs:/mnt/blah path,
in either spark_submit_task or spark_jar_task (via the cluster spark_conf for Java options); no success.
spark.driver.extraJavaOptions
NOTE: Testing via a notebook using extraJavaOptions had no problems. (But we did notice that, in the notebook,
the commands below would not succeed unless we first ran ls on the parent folders one by one.)
ls /dbfs/mnt/glue-artifacts/conf-staging-env/application.conf
cat /dbfs/mnt/glue-artifacts/conf-staging-env/application.conf
See the snippet below:
spark_submit_task= {
"parameters": [
"--class", "com.source2sea.glue.GlueMain",
"--conf", f"spark.driver.extraJavaOptions={java_option_d_config_file}",
"--files", conf_path,
jar_full_path, MY-PARAMETERS
]
}
In my Scala code I have code like this (it uses PureConfig, a wrapper around Typesafe Config; I made sure this is done: https://pureconfig.github.io/docs/faq.html#how-can-i-use-pureconfig-with-spark-210-problematic-shape...):
val source = defaultOverrides.withFallback(defaultApplication).withFallback(defaultReference)

def read(source: ConfigObjectSource): Either[Throwable, AppConfig] = {
  implicit def hint[A] = ProductHint[A](ConfigFieldMapping(CamelCase, CamelCase))
  logger.debug(s"Loading configuration ${source.config()}")
  val original: Either[ConfigReaderFailures, AppConfig] = source.load[AppConfig]
  logger.info(s"Loaded and casted configuration ${original}")
  original.leftMap[Throwable](ConfigReaderException.apply)
}
error log
23/04/25 13:45:49 INFO AppConfig$: Loaded and casted configuration Left(ConfigReaderFailures(ThrowableFailure(shaded.com.typesafe.config.ConfigException$IO: dbfs:/mnt/glue-artifacts/conf-staging-env/application.conf: java.io.FileNotFoundException: dbfs:/mnt/glue-artifacts/conf-staging-env/application.conf (No such file or directory),Some(ConfigOrigin(dbfs:/mnt/glue-artifacts/conf-staging-env/application.conf)))))
23/04/25 13:45:49 ERROR GlueMain$: Glue failure
pureconfig.error.ConfigReaderException: Cannot convert configuration to a scala.runtime.Nothing$. Failures are:
- (dbfs:/mnt/glue-artifacts/conf-staging-env/application.conf) dbfs:/mnt/glue-artifacts/conf-staging-env/application.conf: java.io.FileNotFoundException: dbfs:/mnt/glue-artifacts/conf-staging-env/application.conf (No such file or directory).
or
23/04/25 12:46:10 INFO AppConfig$: Loaded and casted configuration Left(ConfigReaderFailures(ThrowableFailure(shaded.com.typesafe.config.ConfigException$IO: /dbfs/mnt/glue-artifacts/conf-staging-env/application.conf: java.io.FileNotFoundException: /dbfs/mnt/glue-artifacts/conf-staging-env/application.conf (No such file or directory),Some(ConfigOrigin(/dbfs/mnt/glue-artifacts/conf-staging-env/application.conf)))))
23/04/25 12:46:10 ERROR GlueMain$: Glue failure
pureconfig.error.ConfigReaderException: Cannot convert configuration to a scala.runtime.Nothing$. Failures are:
- (/dbfs/mnt/glue-artifacts/conf-staging-env/application.conf) /dbfs/mnt/glue-artifacts/conf-staging-env/application.conf: java.io.FileNotFoundException: /dbfs/mnt/glue-artifacts/conf-staging-env/application.conf (No such file or directory).
at com.source2sea.glue.config.AppConfig$.$anonfun$read$2(AppConfig.scala:31)
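Both errors above are java.io.FileNotFoundException, so one diagnostic step is to ask the driver JVM which part of the path it cannot see. The sketch below uses only java.nio and is an assumption about how to narrow the problem down, not a fix; PathProbe and firstMissing are hypothetical names. A dbfs:/... value is not a URI to java.io; it is treated as a relative local path, so the probe would report a component under the working directory as missing.

```scala
import java.nio.file.{Files, Path, Paths}

// Hypothetical diagnostic object; not part of the original job.
object PathProbe {
  // Walks from the filesystem root towards `path` and returns the first
  // component that does not exist, or None if the whole path exists.
  def firstMissing(path: Path): Option[Path] = {
    val abs = path.toAbsolutePath.normalize()
    val prefixes = (1 to abs.getNameCount).map(n => abs.getRoot.resolve(abs.subpath(0, n)))
    prefixes.find(p => !Files.exists(p))
  }

  def main(args: Array[String]): Unit = {
    // config.file is the system property that Typesafe Config reads.
    val raw = sys.props.getOrElse("config.file", "application.conf")
    firstMissing(Paths.get(raw)) match {
      case Some(p) => println(s"First missing path component: $p")
      case None    => println(s"$raw is visible to the driver JVM")
    }
  }
}
```

Running this as the job's main class with the same -Dconfig.file value would show whether the /dbfs mount itself, or only a deeper folder, is invisible to the spark-submit driver.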