04-25-2023 07:38 AM
I copied my question from a very old question/post that I responded to, and decided to move it here:
Context:
- I have a jar (Scala) that uses scala pureconfig (a wrapper of Typesafe Config).
- I uploaded an application.conf file to a path which is mounted to the workspace.
- I've already tested the jar logic via a notebook (it works).
- I moved to a non-notebook approach (in this case, Airflow submits the API call, using either spark_submit_task or spark_jar_task); both fail. See details below.
I've tried setting the option below to either a /dbfs/mnt/blah path or a dbfs:/mnt/blah path,
in either spark_submit_task or spark_jar_task (via the cluster spark_conf for Java options), with no success:
spark.driver.extraJavaOptions
NOTE: testing via a notebook using the extraJavaOptions had no problems. (But we did notice that, in the notebook,
the commands below would not succeed unless we first ls the parent folders one by one.)
ls /dbfs/mnt/glue-artifacts/conf-staging-env/application.conf
cat /dbfs/mnt/glue-artifacts/conf-staging-env/application.conf
See the snippet below that Airflow uses:
spark_submit_task = {
    "parameters": [
        "--class", "com.source2sea.glue.GlueMain",
        "--conf", f"spark.driver.extraJavaOptions={java_option_d_config_file}",
        "--files", conf_path,
        jar_full_path, MY-PARAMETERS
    ]
}
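One avenue worth noting (a sketch, not a confirmed fix): files distributed with --files land in the Spark work directory on the driver and executors, and org.apache.spark.SparkFiles.get resolves them by bare file name. Parsing that local copy sidesteps java.io trying to open a dbfs:/ URI. The file name "application.conf" and the fallback chain here are assumptions for illustration.

```scala
import java.io.File
import org.apache.spark.SparkFiles
import com.typesafe.config.{Config, ConfigFactory}

// A file shipped via --files is copied into the Spark work directory;
// SparkFiles.get returns its local filesystem path by bare name.
val localConfPath: String = SparkFiles.get("application.conf")

// Parse the local copy directly instead of relying on
// -Dconfig.file pointing at a dbfs:/ URI, which java.io cannot open.
val fileConfig: Config = ConfigFactory
  .parseFile(new File(localConfPath))
  .withFallback(ConfigFactory.load()) // classpath application.conf / reference.conf
  .resolve()
```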
In my Scala code I have code like this (using pureconfig, which is a wrapper of Typesafe Config; I ensured this is done: https://pureconfig.github.io/docs/faq.html#how-can-i-use-pureconfig-with-spark-210-problematic-shape...):
val source = defaultOverrides.withFallback(defaultApplication).withFallback(defaultReference)

def read(source: ConfigObjectSource): Either[Throwable, AppConfig] = {
  implicit def hint[A] = ProductHint[A](ConfigFieldMapping(CamelCase, CamelCase))
  logger.debug(s"Loading configuration ${source.config()}")
  val original: Either[ConfigReaderFailures, AppConfig] = source.load[AppConfig]
  logger.info(s"Loaded and casted configuration ${original}")
  original.leftMap[Throwable](ConfigReaderException.apply)
}
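For reference, a minimal sketch of how a fallback chain like the one above can be pointed at an explicit local file with pureconfig's ConfigSource API (the path here is illustrative). ConfigSource.file reads through java.io, so it needs a local filesystem path such as the /dbfs FUSE mount, not a dbfs:/ URI:

```scala
import pureconfig.ConfigSource

// Sketch: put an explicit local file in the chain instead of relying on
// -Dconfig.file. System-property overrides still win; reference.conf
// from the classpath remains the last fallback.
val source = ConfigSource.defaultOverrides
  .withFallback(ConfigSource.file("/dbfs/mnt/glue-artifacts/conf-staging-env/application.conf"))
  .withFallback(ConfigSource.defaultReference)
```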
Error log:
23/04/25 13:45:49 INFO AppConfig$: Loaded and casted configuration Left(ConfigReaderFailures(ThrowableFailure(shaded.com.typesafe.config.ConfigException$IO: dbfs:/mnt/glue-artifacts/conf-staging-env/application.conf: java.io.FileNotFoundException: dbfs:/mnt/glue-artifacts/conf-staging-env/application.conf (No such file or directory),Some(ConfigOrigin(dbfs:/mnt/glue-artifacts/conf-staging-env/application.conf)))))
23/04/25 13:45:49 ERROR GlueMain$: Glue failure
pureconfig.error.ConfigReaderException: Cannot convert configuration to a scala.runtime.Nothing$. Failures are:
- (dbfs:/mnt/glue-artifacts/conf-staging-env/application.conf) dbfs:/mnt/glue-artifacts/conf-staging-env/application.conf: java.io.FileNotFoundException: dbfs:/mnt/glue-artifacts/conf-staging-env/application.conf (No such file or directory).
or
23/04/25 12:46:10 INFO AppConfig$: Loaded and casted configuration Left(ConfigReaderFailures(ThrowableFailure(shaded.com.typesafe.config.ConfigException$IO: /dbfs/mnt/glue-artifacts/conf-staging-env/application.conf: java.io.FileNotFoundException: /dbfs/mnt/glue-artifacts/conf-staging-env/application.conf (No such file or directory),Some(ConfigOrigin(/dbfs/mnt/glue-artifacts/conf-staging-env/application.conf)))))
23/04/25 12:46:10 ERROR GlueMain$: Glue failure
pureconfig.error.ConfigReaderException: Cannot convert configuration to a scala.runtime.Nothing$. Failures are:
- (/dbfs/mnt/glue-artifacts/conf-staging-env/application.conf) /dbfs/mnt/glue-artifacts/conf-staging-env/application.conf: java.io.FileNotFoundException: /dbfs/mnt/glue-artifacts/conf-staging-env/application.conf (No such file or directory).
at com.source2sea.glue.config.AppConfig$.$anonfun$read$2(AppConfig.scala:31)
Please help me understand how to get this working.
- Labels:
  - Databricks jobs
  - JOBS
  - Scala Code
Accepted Solutions
04-28-2023 06:29 AM
We had to put the conf in the root folder of the mounted path, and that works.
Maybe the mounted storage account being Blob instead of ADLS Gen2 is causing the issue.
04-25-2023 07:54 AM
I haven't tried this with spark-submit, but in my notebooks I use the FileStore for this:
val fileConfig = ConfigFactory.parseFile(
  new File("/dbfs/FileStore/NotebookConfig/app.conf"))
(this is Typesafe Config)
You could also add the conf file as an internal resource and pack it into the jar.
Of course that is only interesting if the conf file does not change often; otherwise you would need to build a new jar for every change.
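A sketch of that resource-based approach, assuming the file is placed under src/main/resources so it ends up on the classpath inside the jar (the per-environment resource name below is hypothetical):

```scala
import pureconfig.ConfigSource

// application.conf bundled at src/main/resources/application.conf is
// picked up from the classpath by the default chain; no filesystem path.
val source = ConfigSource.default

// Or load a specific bundled resource by name, e.g. one file per environment:
val stagingSource = ConfigSource.resources("conf-staging-env/application.conf")
  .withFallback(ConfigSource.defaultReference)
```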