Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

pass application.conf file into databricks jobs

source2sea
Contributor

I copied this question from a very old post that I had responded to, and decided to move it here:

context:

  • I have a JAR (Scala) that uses PureConfig (a wrapper around Typesafe Config).
  • I uploaded an application.conf file to a path that is mounted to the workspace.
  • I have already tested the JAR logic via a notebook (it works).
  • I am now moving to a non-notebook approach (in this case Airflow submits the API call, using either spark_submit_task or spark_jar_task). Both fail; see details below.

I tried pointing the option below at either a /dbfs/mnt/blah path or a dbfs:/mnt/blah path, in both spark_submit_task and spark_jar_task (via the cluster spark_conf for Java options), with no success:

spark.driver.extraJavaOptions

NOTE: Testing via a notebook using extraJavaOptions had no problems. However, we did notice that in the notebook the commands below would not succeed unless we first ran ls on the parent folders one by one:

ls /dbfs/mnt/glue-artifacts/conf-staging-env/application.conf
cat /dbfs/mnt/glue-artifacts/conf-staging-env/application.conf
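
The parent-by-parent listing workaround described above can be scripted as a quick diagnostic. This is only a minimal sketch, assuming it runs in a Python notebook cell (or anywhere the /dbfs FUSE mount is visible); the function name `probe_path` is hypothetical and is not part of any Databricks API.

```python
import os


def probe_path(path):
    """List each ancestor directory of `path` from the root down, then
    report whether `path` itself is an existing file.

    Hypothetical helper: it mirrors the manual "ls the parent folders one
    by one" workaround; it is not a Databricks API.
    """
    path = os.path.abspath(path)

    # Collect ancestors, nearest first, stopping at the filesystem root.
    ancestors = []
    current = os.path.dirname(path)
    while True:
        ancestors.append(current)
        parent = os.path.dirname(current)
        if parent == current:
            break
        current = parent

    # Walk from the root down, listing each directory in turn.
    for directory in reversed(ancestors):
        try:
            os.listdir(directory)
        except OSError as err:
            print(f"cannot list {directory}: {err}")
            return False

    return os.path.isfile(path)
```

Calling `probe_path("/dbfs/mnt/glue-artifacts/conf-staging-env/application.conf")` from the notebook and from the job would show whether the two environments see the mount differently.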

See the snippet below that Airflow uses:

spark_submit_task = {
    "parameters": [
        "--class", "com.source2sea.glue.GlueMain",
        "--conf", f"spark.driver.extraJavaOptions={java_option_d_config_file}",
        "--files", conf_path,
        jar_full_path, MY-PARAMETERS
    ]
}
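
For illustration, here is one way the values referenced in the snippet above could be assembled. It is a sketch under assumptions: the post does not show the value of `java_option_d_config_file`, so the `-Dconfig.file=` form (the system property Typesafe Config reads to load an external file) is assumed, and the helper names `to_fuse_path` and `build_spark_submit_task` are hypothetical. Because that property is opened as a plain local file path, the sketch rewrites a `dbfs:/` URI to its `/dbfs/` form. Whether that path is readable by the driver when the JVM starts is exactly the open question in this thread, so treat this as something to test and not as a confirmed fix.

```python
def to_fuse_path(path):
    """Rewrite 'dbfs:/mnt/x' to '/dbfs/mnt/x'; leave other paths unchanged.

    Hypothetical helper, not a Databricks API.
    """
    prefix = "dbfs:/"
    if path.startswith(prefix):
        return "/dbfs/" + path[len(prefix):].lstrip("/")
    return path


def build_spark_submit_task(conf_path, jar_full_path, main_class, app_args):
    """Build a spark_submit_task payload shaped like the one in the post.

    Assumption: the driver JVM option is -Dconfig.file=<local path>.
    """
    java_option = f"-Dconfig.file={to_fuse_path(conf_path)}"
    return {
        "parameters": [
            "--class", main_class,
            "--conf", f"spark.driver.extraJavaOptions={java_option}",
            "--files", conf_path,
            jar_full_path,
            *app_args,  # application arguments follow the JAR path
        ]
    }
```

Keeping the path conversion in one small function makes it easy to try both the `/dbfs/` and `dbfs:/` forms from Airflow without editing the payload by hand.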

In my Scala code I have something like this (it uses PureConfig, a wrapper around Typesafe Config; I made sure the steps described here were followed: https://pureconfig.github.io/docs/faq.html#how-can-i-use-pureconfig-with-spark-210-problematic-shape...):

val source = defaultOverrides.withFallback(defaultApplication).withFallback(defaultReference)

def read(source: ConfigObjectSource): Either[Throwable, AppConfig] = {
  implicit def hint[A] = ProductHint[A](ConfigFieldMapping(CamelCase, CamelCase))

  logger.debug(s"Loading configuration ${source.config()}")
  val original: Either[ConfigReaderFailures, AppConfig] = source.load[AppConfig]
  logger.info(s"Loaded and casted configuration ${original}")

  original.leftMap[Throwable](ConfigReaderException.apply)
}

error log

23/04/25 13:45:49 INFO AppConfig$: Loaded and casted configuration Left(ConfigReaderFailures(ThrowableFailure(shaded.com.typesafe.config.ConfigException$IO: dbfs:/mnt/glue-artifacts/conf-staging-env/application.conf: java.io.FileNotFoundException: dbfs:/mnt/glue-artifacts/conf-staging-env/application.conf (No such file or directory),Some(ConfigOrigin(dbfs:/mnt/glue-artifacts/conf-staging-env/application.conf)))))
23/04/25 13:45:49 ERROR GlueMain$: Glue failure
pureconfig.error.ConfigReaderException: Cannot convert configuration to a scala.runtime.Nothing$. Failures are:
  - (dbfs:/mnt/glue-artifacts/conf-staging-env/application.conf) dbfs:/mnt/glue-artifacts/conf-staging-env/application.conf: java.io.FileNotFoundException: dbfs:/mnt/glue-artifacts/conf-staging-env/application.conf (No such file or directory).
 
 
or
 
 
23/04/25 12:46:10 INFO AppConfig$: Loaded and casted configuration Left(ConfigReaderFailures(ThrowableFailure(shaded.com.typesafe.config.ConfigException$IO: /dbfs/mnt/glue-artifacts/conf-staging-env/application.conf: java.io.FileNotFoundException: /dbfs/mnt/glue-artifacts/conf-staging-env/application.conf (No such file or directory),Some(ConfigOrigin(/dbfs/mnt/glue-artifacts/conf-staging-env/application.conf)))))
23/04/25 12:46:10 ERROR GlueMain$: Glue failure
pureconfig.error.ConfigReaderException: Cannot convert configuration to a scala.runtime.Nothing$. Failures are:
  - (/dbfs/mnt/glue-artifacts/conf-staging-env/application.conf) /dbfs/mnt/glue-artifacts/conf-staging-env/application.conf: java.io.FileNotFoundException: /dbfs/mnt/glue-artifacts/conf-staging-env/application.conf (No such file or directory).
 
	at com.source2sea.glue.config.AppConfig$.$anonfun$read$2(AppConfig.scala:31)

Please help me get this working.

1 ACCEPTED SOLUTION

Accepted Solutions

source2sea
Contributor

We had to put the conf file in the root folder of the mounted path, and that works.

Maybe the issue is caused by the mounted storage account being Blob Storage instead of ADLS Gen2.


2 REPLIES 2

-werners-
Esteemed Contributor III

I haven't tried this with spark-submit, but in my notebooks I use the FileStore for this:

val fileConfig = ConfigFactory.parseFile(new File("/dbfs/FileStore/NotebookConfig/app.conf"))

(This uses Typesafe Config.)

You could also add the conf file as an internal resource and package it with the JAR.

Of course, that is only worthwhile if the conf file does not change often; otherwise you would need to build a new JAR for every change.

