09-27-2021 01:22 AM
Hello everyone!
I am trying to pass a Typesafe config file to a spark-submit task and print the values from the config file.
Code:
import org.slf4j.{Logger, LoggerFactory}
import com.typesafe.config.{Config, ConfigFactory}
import org.apache.spark.sql.SparkSession

object Bootstrap extends MyLogging {
  val spark: SparkSession = SparkSession.builder.enableHiveSupport().getOrCreate()
  // ConfigFactory.load resolves application.conf from the classpath
  val config: Config = ConfigFactory.load("application.conf")

  def main(args: Array[String]): Unit = {
    val url: String = config.getString("db.url")
    val user: String = config.getString("db.user")
    println(url)
    println(user)
  }
}
application.conf file:
db {
  url = "jdbc:postgresql://localhost:5432/test"
  user = "test"
}
I have uploaded the file to DBFS and I am using that path to create the job.
Spark-submit job JSON:
{
  "new_cluster": {
    "spark_version": "6.4.x-esr-scala2.11",
    "azure_attributes": {
      "availability": "ON_DEMAND_AZURE",
      "first_on_demand": 1,
      "spot_bid_max_price": -1
    },
    "node_type_id": "Standard_DS3_v2",
    "enable_elastic_disk": true,
    "num_workers": 1
  },
  "spark_submit_task": {
    "parameters": [
      "--class",
      "Bootstrap",
      "--conf",
      "spark.driver.extraClassPath=dbfs:/tmp/",
      "--conf",
      "spark.executor.extraClassPath=dbfs:/tmp/",
      "--files",
      "dbfs:/tmp/application.conf",
      "dbfs:/tmp/code-assembly-0.1.0.jar"
    ]
  },
  "email_notifications": {},
  "name": "application-conf-test",
  "max_concurrent_runs": 1
}
I used the above JSON to create the spark-submit job and tried to run it using Databricks CLI commands.
Error:
Exception in thread "main" com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'db'
at com.typesafe.config.impl.SimpleConfig.findKey(SimpleConfig.java:124)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:147)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:159)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:164)
at com.typesafe.config.impl.SimpleConfig.getString(SimpleConfig.java:206)
at Bootstrap$.main(Test.scala:16)
at Bootstrap.main(Test.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
I can see the lines below in the logs, but the file is not getting loaded.
21/09/22 07:21:43 INFO SparkContext: Added file dbfs:/tmp/application.conf at dbfs:/tmp/application.conf with timestamp 1632295303654
21/09/22 07:21:43 INFO Utils: Fetching dbfs:/tmp/application.conf to /local_disk0/spark-20456b30-fddd-42d7-9b23-9e4c0d3c91cd/userFiles-ee199161-6f48-4c47-b1c7-763ce7c0895f/fetchFileTemp4713981355306806616.tmp
Please help me pass this Typesafe config file to the spark-submit job using the appropriate spark-submit job parameters.
09-28-2021 05:58 AM
Thank you so much @Kaniz Fatma, I'm looking forward to the answer!
09-28-2021 10:42 AM
Hi @Praveen Kumar Bachu,
There are several limitations for spark-submit tasks. Please check the docs for more information: https://docs.databricks.com/dev-tools/api/latest/jobs.html#jobssparksubmittask
09-28-2021 10:53 PM
Hi @Jose Gonzalez,
Thanks for the reply. Yes, I have gone through all the docs and steps regarding the spark-submit task; my question above is specifically about passing a config file to the spark-submit task.
Please re-check the steps above and let me know if that helps. If not, we will write up more details about what we are trying to do, and you can tell us whether it is possible in Databricks or not.
10-01-2021 12:49 PM
Hi @Praveen Kumar Bachu,
The error shows that the job was not able to read your configuration. That means the only way to pass your configuration values is through the submit parameters.
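For example, a minimal sketch of one way to do that (the argument handling here is an illustrative assumption, not code from the original job): append the values as plain program arguments after the jar path in "parameters" and read them from args.

import org.apache.spark.sql.SparkSession

// Hypothetical variant of Bootstrap that takes db.url and db.user as
// program arguments instead of reading application.conf
object Bootstrap {
  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession.builder.enableHiveSupport().getOrCreate()
    // e.g. args = Array("jdbc:postgresql://localhost:5432/test", "test")
    val Array(url, user) = args.take(2)
    println(url)
    println(user)
  }
}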
10-05-2021 07:58 AM
Hi @Jose Gonzalez,
Please see the spark-submit JSON below and a few more examples we have tried with spark-submit parameters.
spark-submit JSON:
{
  "new_cluster": {
    "spark_version": "6.4.x-esr-scala2.11",
    "azure_attributes": {
      "availability": "ON_DEMAND_AZURE",
      "first_on_demand": 1,
      "spot_bid_max_price": -1
    },
    "node_type_id": "Standard_DS3_v2",
    "enable_elastic_disk": true,
    "num_workers": 1
  },
  "spark_submit_task": {
    "parameters": [
      "--class",
      "Bootstrap",
      "--conf",
      "spark.driver.extraClassPath=dbfs:/tmp/",
      "--conf",
      "spark.executor.extraClassPath=dbfs:/tmp/",
      "--files",
      "dbfs:/tmp/application.conf",
      "dbfs:/tmp/code-assembly-0.1.0.jar"
    ]
  },
  "email_notifications": {},
  "name": "application-conf-test",
  "max_concurrent_runs": 1
}
We have tried the below spark_submit_task parameters in the above JSON:
[
  "--class",
  "Bootstrap",
  "--conf",
  "spark.driver.extraClassPath=/tmp/application.conf",
  "--files",
  "dbfs:/tmp/application.conf",
  "dbfs:/tmp/code-assembly-0.1.0.jar"
]
[
  "--class",
  "Bootstrap",
  "--conf",
  "spark.driver.extraClassPath=/tmp/",
  "--conf",
  "spark.executor.extraClassPath=/tmp/",
  "--files",
  "dbfs:/tmp/application.conf",
  "dbfs:/tmp/code-assembly-0.1.0.jar"
]
[
  "--class",
  "Bootstrap",
  "--conf",
  "spark.driver.extraClassPath=dbfs:/tmp/application.conf",
  "--conf",
  "spark.executor.extraClassPath=dbfs:/tmp/application.conf",
  "--files",
  "dbfs:/tmp/application.conf",
  "dbfs:/tmp/code-assembly-0.1.0.jar"
]
[
  "--class",
  "Bootstrap",
  "--conf",
  "spark.driver.extraClassPath=dbfs:/tmp/",
  "--conf",
  "spark.executor.extraClassPath=dbfs:/tmp/",
  "--files",
  "dbfs:/tmp/application.conf",
  "dbfs:/tmp/code-assembly-0.1.0.jar"
]
[
  "--class",
  "Bootstrap",
  "--conf",
  "spark.driver.extraClassPath=dbfs:./",
  "--conf",
  "spark.executor.extraClassPath=dbfs:./",
  "--files",
  "dbfs:/tmp/application.conf",
  "dbfs:/tmp/code-assembly-0.1.0.jar"
]
[
  "--class",
  "Bootstrap",
  "--driver-java-options",
  "-Dconfig.file=application.conf",
  "--conf",
  "spark.executor.extraJavaOptions=-Dconfig.file=application.conf",
  "--files",
  "dbfs:/tmp/application.conf",
  "dbfs:/tmp/code-assembly-0.1.0.jar"
]
[
  "--class",
  "Bootstrap",
  "--conf",
  "spark.driver.extraJavaOptions=-Dconfig.file=application.conf",
  "--conf",
  "spark.executor.extraJavaOptions=-Dconfig.file=application.conf",
  "--files",
  "dbfs:/tmp/application.conf",
  "dbfs:/tmp/code-assembly-0.1.0.jar"
]
For all of the above spark_submit_task parameters, we get the same error:
Exception in thread "main" com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'db'
at com.typesafe.config.impl.SimpleConfig.findKey(SimpleConfig.java:124)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:147)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:159)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:164)
at com.typesafe.config.impl.SimpleConfig.getString(SimpleConfig.java:206)
at Bootstrap.main(Test.scala:16)
at Bootstrap.main(Test.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
Can you please help with this quickly, as we need this implementation.
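For reference, a minimal debugging sketch on our side (not part of the job above; the object name is made up) that prints where Spark actually stages files shipped with --files:

import java.io.File
import org.apache.spark.SparkFiles
import org.apache.spark.sql.SparkSession

object FilesDebug {
  def main(args: Array[String]): Unit = {
    // SparkFiles needs an active Spark context to resolve paths
    val spark = SparkSession.builder.getOrCreate()
    // Directory where files passed via --files are staged on the driver
    println(s"SparkFiles root: ${SparkFiles.getRootDirectory()}")
    // Absolute local path of the shipped application.conf
    println(s"application.conf: ${SparkFiles.get("application.conf")}")
    spark.stop()
  }
}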
03-14-2022 05:38 PM
What's the code inside your main?
03-15-2022 01:21 AM
Hi @Praveen Kumar Bachu, could you please try the approach below? Let me know if this works for you.
import java.io.File
import org.apache.spark.SparkFiles
import com.typesafe.config.{Config, ConfigFactory}

// Use parseFile instead of load: load() resolves from the classpath,
// while parseFile reads the local copy that --files shipped
val config: Config = ConfigFactory.parseFile(new File(SparkFiles.get("application.conf")))
Note: you will need to pass the file using --files:
"--files",
"dbfs:/tmp/application.conf",
โ04-25-2023 06:54 AM
I've experenced similar issues; please help to answer how to get this working;
I've tried using below to be either /dbfs/mnt/blah path or dbfs:/mnt/blah path
in either spark_submit_task or spark_jar_task (via cluster spark_conf for java optinos); no success.
spark.driver.extraJavaOptions
NOTE: TESTING VIA NOTEBOOK using the extraJavaOptions had no problems. (but we did notice, in the notebook,
below command would not succeed unless we try to ls the parent folders 1 by 1 first.
ls /dbfs/mnt/glue-artifacts/conf-staging-env/application.conf
cat /dbfs/mnt/glue-artifacts/conf-staging-env/application.conf
see below snippet;
spark_submit_task = {
    "parameters": [
        "--class", "com.source2sea.glue.GlueMain",
        "--conf", f"spark.driver.extraJavaOptions={java_option_d_config_file}",
        "--files", conf_path,
        jar_full_path, MY-PARAMETERS
    ]
}
In my Scala code I have something like this (using PureConfig, which is a wrapper around Typesafe Config; I made sure this is done: https://pureconfig.github.io/docs/faq.html#how-can-i-use-pureconfig-with-spark-210-problematic-shape...):
val source = defaultOverrides.withFallback(defaultApplication).withFallback(defaultReference)

def read(source: ConfigObjectSource): Either[Throwable, AppConfig] = {
  implicit def hint[A] = ProductHint[A](ConfigFieldMapping(CamelCase, CamelCase))
  logger.debug(s"Loading configuration ${source.config()}")
  val original: Either[ConfigReaderFailures, AppConfig] = source.load[AppConfig]
  logger.info(s"Loaded and casted configuration ${original}")
  original.leftMap[Throwable](ConfigReaderException.apply)
}
Error log:
23/04/25 13:45:49 INFO AppConfig$: Loaded and casted configuration Left(ConfigReaderFailures(ThrowableFailure(shaded.com.typesafe.config.ConfigException$IO: dbfs:/mnt/glue-artifacts/conf-staging-env/application.conf: java.io.FileNotFoundException: dbfs:/mnt/glue-artifacts/conf-staging-env/application.conf (No such file or directory),Some(ConfigOrigin(dbfs:/mnt/glue-artifacts/conf-staging-env/application.conf)))))
23/04/25 13:45:49 ERROR GlueMain$: Glue failure
pureconfig.error.ConfigReaderException: Cannot convert configuration to a scala.runtime.Nothing$. Failures are:
- (dbfs:/mnt/glue-artifacts/conf-staging-env/application.conf) dbfs:/mnt/glue-artifacts/conf-staging-env/application.conf: java.io.FileNotFoundException: dbfs:/mnt/glue-artifacts/conf-staging-env/application.conf (No such file or directory).
or
23/04/25 12:46:10 INFO AppConfig$: Loaded and casted configuration Left(ConfigReaderFailures(ThrowableFailure(shaded.com.typesafe.config.ConfigException$IO: /dbfs/mnt/glue-artifacts/conf-staging-env/application.conf: java.io.FileNotFoundException: /dbfs/mnt/glue-artifacts/conf-staging-env/application.conf (No such file or directory),Some(ConfigOrigin(/dbfs/mnt/glue-artifacts/conf-staging-env/application.conf)))))
23/04/25 12:46:10 ERROR GlueMain$: Glue failure
pureconfig.error.ConfigReaderException: Cannot convert configuration to a scala.runtime.Nothing$. Failures are:
- (/dbfs/mnt/glue-artifacts/conf-staging-env/application.conf) /dbfs/mnt/glue-artifacts/conf-staging-env/application.conf: java.io.FileNotFoundException: /dbfs/mnt/glue-artifacts/conf-staging-env/application.conf (No such file or directory).
at com.source2sea.glue.config.AppConfig$.$anonfun$read$2(AppConfig.scala:31)
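A possible adaptation of the SparkFiles approach suggested earlier in this thread to PureConfig (a sketch under the assumption that the file is shipped with --files; the `shipped` source here replaces the defaultApplication source from the snippet above):

import java.io.File
import org.apache.spark.SparkFiles
import pureconfig.ConfigSource

// java.io cannot open dbfs:/ URIs directly, hence the FileNotFoundException above;
// resolve the local copy that --files staged and hand that path to PureConfig
val shipped = ConfigSource.file(new File(SparkFiles.get("application.conf")))
val source = ConfigSource.defaultOverrides
  .withFallback(shipped)
  .withFallback(ConfigSource.defaultReference)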