Compilation Failing with Scala SBT build to be used in Databricks

Naveenkumar1811
New Contributor

Hi,

We have a Scala JAR, built with sbt, that our Databricks jobs use to readStream data from Kafka.

We are enhancing the from_avro function as shown below:

 

import scala.collection.JavaConverters._

import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient
import org.apache.spark.SparkContext
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.avro.functions.from_avro
import org.apache.spark.sql.functions.col

def deserializeAvro(
    topic: String,
    client: CachedSchemaRegistryClient,
    sc: SparkContext,
    avroOpts: Map[String, String] = Map("mode" -> "FAILFAST"))
    (df: DataFrame): DataFrame = {
  val useDynamic = avroOpts.get("useDynamicSchema").contains("true")
  val mode = avroOpts.getOrElse("mode", "FAILFAST")
  if (useDynamic) {
    // Dynamic path: let from_avro resolve the schema by subject from the Schema Registry.
    val schemaRegistryAddr = avroOpts("schemaRegistryUrl")
    val schemaRegistryOptions = Map(
      "mode" -> mode,
      "confluent.schema.registry.basic.auth.credentials.source" -> "USER_INFO",
      "confluent.schema.registry.basic.auth.user.info" ->
        s"${avroOpts("schemaRegistryUser")}:${avroOpts("schemaRegistrySecretKey")}"
    )
    df.withColumn(
      "parsedValue",
      from_avro( // <- this is the call that fails to compile
        data = col("fullValue"),
        subject = s"$topic-value",
        schemaRegistryAddress = schemaRegistryAddr,
        options = schemaRegistryOptions.asJava
      )
    ).drop("fullValue")
  } else {
    // Static path: fetch the latest schema JSON once and broadcast it.
    val schema = sc.broadcast(client.getLatestSchemaMetadata(s"$topic-value").getSchema())
    df.withColumn(
      "parsedValue",
      from_avro(col("fixedValue"), schema.value, Map("mode" -> mode).asJava)
    )
  }
}
 
Below is the error:

[error] C:\Users\njagana\gitcoderepo\mortar\src\main\scala\com\ses\mortar\transformers\BronzeTransformers.scala:84:9: overloaded method value from_avro with alternatives:
[error] (data: org.apache.spark.sql.Column,jsonFormatSchema: String,options: java.util.Map[String,String])org.apache.spark.sql.Column <and>
[error] (data: org.apache.spark.sql.Column,jsonFormatSchema: String)org.apache.spark.sql.Column
[error] cannot be applied to (data: org.apache.spark.sql.Column, subject: String, schemaRegistryAddress: String, options: java.util.Map[String,String])
[error] from_avro(
[error] ^
[error] one error found
[error] (Compile / compileIncremental) Compilation failed
[error] Total time: 83 s (01:23), completed 07-Nov-2025, 2:56:48 pm
PS C:\Users\njagana\gitcoderepo\mortar>

Louis_Frolio
Databricks Employee

Greetings @Naveenkumar1811,

The compilation fails because the Scala API you compile against exposes only from_avro(Column, jsonFormatSchema[, options]). The Databricks-only overload that takes subject and schemaRegistryAddress is not in the open-source spark-avro artifact, so your call does not match any signature available to the sbt build.

Why it fails
In open-source Spark’s Scala API, org.apache.spark.sql.avro.functions.from_avro has two overloads: from_avro(Column, jsonFormatSchema: String) and from_avro(Column, jsonFormatSchema: String, options: java.util.Map[String, String]). Neither accepts a subject or a Schema Registry URL, which is why the compiler reports that the method “cannot be applied to (data, subject, schemaRegistryAddress, options)”.
Databricks clusters add an extra Scala-friendly overload that lets you specify subject and schemaRegistryAddress directly for Schema Registry integration, but that overload is not present in the standard spark-avro artifact you compile against with sbt.

What works in OSS Spark
When compiling outside Databricks against org.apache.spark:spark-avro, fetch the Avro schema JSON yourself (for example via Confluent’s client), then call from_avro with the JSON schema string and optional options such as mode; this is exactly what your else branch does.
A typical flow is to use the Schema Registry client to look up the latest schema for the subject “<topic>-value”, take getSchema() as a JSON string, optionally broadcast it, and pass it into from_avro(col, jsonSchemaStr[, options]), as sketched below.
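For concreteness, here is a minimal sketch of that flow; the registry URL, cache capacity, and topic name are placeholders, not values from your job:

import scala.collection.JavaConverters._
import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient
import org.apache.spark.sql.avro.functions.from_avro
import org.apache.spark.sql.functions.col

// Resolve the latest schema for the subject at the driver (placeholder URL and capacity).
val client = new CachedSchemaRegistryClient("https://schema-registry:8081", 128)
val schemaJson = client.getLatestSchemaMetadata("my-topic-value").getSchema // JSON string

// Pass the JSON schema string to the OSS overload; this compiles everywhere.
val parsed = df.withColumn(
  "parsedValue",
  from_avro(col("fixedValue"), schemaJson, Map("mode" -> "FAILFAST").asJava)
)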

If you want Databricks dynamic schema
On Databricks runtimes that support Schema Registry integration, you can use the subject-based overload directly in Scala, e.g. from_avro(data = $"value", subject = "t-value", schemaRegistryAddress = "https://...") with optional auth options; this code must be compiled against, and run with, the Databricks-provided libraries present on the cluster.
If you compile the JAR locally against OSS Maven coordinates, the subject-based overload does not exist at compile time, which produces exactly your error even though the same call works interactively in a Databricks notebook.
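Spelled out, that call looks like the following sketch (the address and credentials are placeholders; as noted above, this compiles only on such a runtime, not against OSS spark-avro):

import scala.collection.JavaConverters._
import org.apache.spark.sql.avro.functions.from_avro
import org.apache.spark.sql.functions.col

val schemaRegistryAddr = "https://schema-registry:8081" // placeholder
val authOpts = Map(
  "confluent.schema.registry.basic.auth.credentials.source" -> "USER_INFO",
  "confluent.schema.registry.basic.auth.user.info" -> "user:secret" // placeholder
)

// Subject-based overload: available only on Databricks runtimes with
// Schema Registry integration; it does not exist in OSS spark-avro.
val parsed = df.withColumn(
  "parsedValue",
  from_avro(
    data = col("value"),
    subject = "t-value",
    schemaRegistryAddress = schemaRegistryAddr,
    options = authOpts.asJava
  )
)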

Workarounds when building with sbt
- Keep using the “fixed schema” path: resolve the Avro schema JSON via your CachedSchemaRegistryClient on the driver, then call from_avro(col, schemaJson, options), which compiles everywhere.
- If you must keep a single codebase but want subject-based deserialization on Databricks, keep the call site uniform and switch implementations by environment: on Databricks, call the subject-based overload from code that only compiles there (for example by isolating it in a Databricks-built module, as sketched after this list); off Databricks, fall back to fetching the schema JSON first.
- As an alternative to hand-rolling dynamic schema logic, consider libraries that handle the Confluent wire format and Schema Registry in Spark, such as ABRiS or Adobe’s spark-avro with Schema Registry support, which can avoid the need for Databricks-specific overloads.
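A minimal sketch of the environment switch from the second bullet; the trait and class names are illustrative, not an existing API:

import scala.collection.JavaConverters._
import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.avro.functions.from_avro
import org.apache.spark.sql.functions.col

// Common interface kept in the shared module.
trait AvroDeserializer {
  def parse(df: DataFrame, topic: String): DataFrame
}

// OSS-safe implementation: uses only from_avro(Column, String, Map), so it compiles everywhere.
class StaticSchemaDeserializer(client: CachedSchemaRegistryClient, mode: String)
    extends AvroDeserializer {
  override def parse(df: DataFrame, topic: String): DataFrame = {
    val schemaJson = client.getLatestSchemaMetadata(s"$topic-value").getSchema
    df.withColumn(
      "parsedValue",
      from_avro(col("fixedValue"), schemaJson, Map("mode" -> mode).asJava)
    )
  }
}

// A subject-based implementation using the Databricks-only overload would live in a
// separate sbt module compiled only against the Databricks-provided jars.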

Example fixes
- Compile-safe (works in OSS and Databricks): fetch the schema JSON and pass it to from_avro with your mode option, reusing your broadcast pattern; this keeps your if/else structure but uses only the OSS signatures (see the sketch after this list).
- Databricks-only path (subject-based): keep your named-argument call exactly as written, but make sure that code is compiled against, and runs on, a Databricks Runtime that documents the subject/schemaRegistryAddress overload, and pass auth via options such as confluent.schema.registry.basic.auth.user.info when needed.
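A sketch of the compile-safe fix, reusing the imports shown in your snippet: both branches resolve the schema JSON at the driver and use only the OSS overload. Column names follow your original code; note that OSS from_avro expects raw Avro bytes, so the column must already have any Confluent wire-format header stripped.

def deserializeAvroOss(
    topic: String,
    client: CachedSchemaRegistryClient,
    sc: SparkContext,
    avroOpts: Map[String, String] = Map("mode" -> "FAILFAST"))
    (df: DataFrame): DataFrame = {
  val mode = avroOpts.getOrElse("mode", "FAILFAST")
  // Both branches fetch the latest subject schema at the driver and broadcast it,
  // so only the OSS from_avro(Column, String, Map) overload is ever referenced.
  val schema = sc.broadcast(client.getLatestSchemaMetadata(s"$topic-value").getSchema)
  if (avroOpts.get("useDynamicSchema").contains("true")) {
    df.withColumn(
      "parsedValue",
      from_avro(col("fullValue"), schema.value, Map("mode" -> mode).asJava)
    ).drop("fullValue")
  } else {
    df.withColumn(
      "parsedValue",
      from_avro(col("fixedValue"), schema.value, Map("mode" -> mode).asJava)
    )
  }
}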

Version and dependency checklist
- If you intend to use from_avro(data, subject, schemaRegistryAddress[, options]), compile and run that code on a Databricks Runtime that documents Schema Registry integration for Scala; otherwise the overload will not be found by the compiler.
- If you compile with OSS Spark artifacts (spark-avro), the only supported signatures require a JSON schema string; use a Schema Registry client on the driver to obtain that schema before calling from_avro. A dependency sketch follows below.
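As a dependency sketch, a build.sbt fragment for the OSS route; the versions here are assumptions and should be aligned with your cluster’s Spark and Scala versions:

// Spark artifacts are provided by the Databricks runtime at execution time.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"  % "3.5.1" % Provided,
  "org.apache.spark" %% "spark-avro" % "3.5.1" % Provided,
  "io.confluent"      % "kafka-schema-registry-client" % "7.5.1"
)

// Confluent artifacts are hosted on Confluent's own Maven repository.
resolvers += "Confluent" at "https://packages.confluent.io/maven/"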

Hoping this helps, Louis.
