Compilation Failing with Scala SBT build to be used in Databricks

Naveenkumar1811
New Contributor

Hi,

We have a Scala JAR, built with sbt, that our Databricks jobs use to readStream data from Kafka.

We are enhancing the from_avro function as shown below:

 

import scala.collection.JavaConverters._

import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient
import org.apache.spark.SparkContext
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.avro.functions.from_avro
import org.apache.spark.sql.functions.col

def deserializeAvro(
    topic: String,
    client: CachedSchemaRegistryClient,
    sc: SparkContext,
    avroOpts: Map[String, String] = Map("mode" -> "FAILFAST"))
    (df: DataFrame): DataFrame = {
  val useDynamic = avroOpts.get("useDynamicSchema").contains("true")
  val mode = avroOpts.getOrElse("mode", "FAILFAST")
  if (useDynamic) {
    // Dynamic path: let from_avro resolve the schema by subject from the Schema Registry.
    val schemaRegistryAddr = avroOpts("schemaRegistryUrl")
    val schemaRegistryOptions = Map(
      "mode" -> mode,
      "confluent.schema.registry.basic.auth.credentials.source" -> "USER_INFO",
      "confluent.schema.registry.basic.auth.user.info" ->
        s"${avroOpts("schemaRegistryUser")}:${avroOpts("schemaRegistrySecretKey")}"
    )
    df.withColumn(
      "parsedValue",
      from_avro( // <- this is the call that fails to compile
        data = col("fullValue"),
        subject = s"$topic-value",
        schemaRegistryAddress = schemaRegistryAddr,
        options = schemaRegistryOptions.asJava
      )
    ).drop("fullValue")
  } else {
    // Static path: fetch the latest schema JSON once and broadcast it.
    val schema = sc.broadcast(client.getLatestSchemaMetadata(s"$topic-value").getSchema())
    df.withColumn(
      "parsedValue",
      from_avro(col("fixedValue"), schema.value, Map("mode" -> mode).asJava)
    )
  }
}
 
Below is the error:

[error] C:\Users\njagana\gitcoderepo\mortar\src\main\scala\com\ses\mortar\transformers\BronzeTransformers.scala:84:9: overloaded method value from_avro with alternatives:
[error] (data: org.apache.spark.sql.Column,jsonFormatSchema: String,options: java.util.Map[String,String])org.apache.spark.sql.Column <and>
[error] (data: org.apache.spark.sql.Column,jsonFormatSchema: String)org.apache.spark.sql.Column
[error] cannot be applied to (data: org.apache.spark.sql.Column, subject: String, schemaRegistryAddress: String, options: java.util.Map[String,String])
[error] from_avro(
[error] ^
[error] one error found
[error] (Compile / compileIncremental) Compilation failed
[error] Total time: 83 s (01:23), completed 07-Nov-2025, 2:56:48 pm
PS C:\Users\njagana\gitcoderepo\mortar>

Louis_Frolio
Databricks Employee

Greetings @Naveenkumar1811,

The compilation fails because the Scala API you compile against exposes only from_avro(Column, jsonFormatSchema[, options]). The Databricks-only overload that takes subject and schemaRegistryAddress is not in the open-source spark-avro artifact, so your call does not match any signature available to the sbt build.

Why it fails
In open-source Spark’s Scala API, org.apache.spark.sql.avro.functions.from_avro has two overloads: from_avro(Column, jsonFormatSchema: String) and from_avro(Column, jsonFormatSchema: String, options: java.util.Map[String, String]). Neither accepts a subject or a Schema Registry URL, which is why the compiler reports that the method “cannot be applied to (data, subject, schemaRegistryAddress, options)”.
Databricks clusters add an extra Scala-friendly overload that lets you specify subject and schemaRegistryAddress directly for Schema Registry integration, but that overload is not present in the standard spark-avro artifact you compile against with sbt.

What works in OSS Spark
When compiling outside Databricks against org.apache.spark:spark-avro, fetch the Avro schema JSON yourself (for example via Confluent’s client), then call from_avro with the JSON schema string and optional options such as mode; this is exactly what your else branch does.
A typical flow is to use the Schema Registry client to look up the latest schema for the subject “<topic>-value”, take getSchema() as a JSON string, optionally broadcast it, and pass it into from_avro(col, jsonSchemaStr[, options]), as sketched below.
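For concreteness, here is a minimal sketch of that flow; the registry URL, cache capacity, and topic name are placeholders, not values from your job:

import scala.collection.JavaConverters._
import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient
import org.apache.spark.sql.avro.functions.from_avro
import org.apache.spark.sql.functions.col

// Resolve the latest schema for the subject at the driver (placeholder URL and capacity).
val client = new CachedSchemaRegistryClient("https://schema-registry:8081", 128)
val schemaJson = client.getLatestSchemaMetadata("my-topic-value").getSchema // JSON string

// Pass the JSON schema string to the OSS overload; this compiles everywhere.
val parsed = df.withColumn(
  "parsedValue",
  from_avro(col("fixedValue"), schemaJson, Map("mode" -> "FAILFAST").asJava)
)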

If you want Databricks dynamic schema
On Databricks runtimes that support Schema Registry integration, you can use the subject-based overload directly in Scala, e.g. from_avro(data = $"value", subject = "t-value", schemaRegistryAddress = "https://...") with optional auth options; this code must be compiled against, and run with, the Databricks-provided libraries present on the cluster.
If you compile the JAR locally against OSS Maven coordinates, the subject-based overload does not exist at compile time, which produces exactly your error even though the same call works interactively in a Databricks notebook.
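Spelled out, that call looks like the following sketch (the address and credentials are placeholders; as noted above, this compiles only on such a runtime, not against OSS spark-avro):

import scala.collection.JavaConverters._
import org.apache.spark.sql.avro.functions.from_avro
import org.apache.spark.sql.functions.col

val schemaRegistryAddr = "https://schema-registry:8081" // placeholder
val authOpts = Map(
  "confluent.schema.registry.basic.auth.credentials.source" -> "USER_INFO",
  "confluent.schema.registry.basic.auth.user.info" -> "user:secret" // placeholder
)

// Subject-based overload: available only on Databricks runtimes with
// Schema Registry integration; it does not exist in OSS spark-avro.
val parsed = df.withColumn(
  "parsedValue",
  from_avro(
    data = col("value"),
    subject = "t-value",
    schemaRegistryAddress = schemaRegistryAddr,
    options = authOpts.asJava
  )
)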

Workarounds when building with sbt
- Keep using the “fixed schema” path: resolve the Avro schema JSON via your CachedSchemaRegistryClient on the driver, then call from_avro(col, schemaJson, options), which compiles everywhere.
- If you must keep a single codebase but want subject-based deserialization on Databricks, keep the call site uniform and switch implementations by environment: on Databricks, call the subject-based overload from code that only compiles there (for example by isolating it in a Databricks-built module, as sketched after this list); off Databricks, fall back to fetching the schema JSON first.
- As an alternative to hand-rolling dynamic schema logic, consider libraries that handle the Confluent wire format and Schema Registry in Spark, such as ABRiS or Adobe’s spark-avro with Schema Registry support, which can avoid the need for Databricks-specific overloads.
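A minimal sketch of the environment switch from the second bullet; the trait and class names are illustrative, not an existing API:

import scala.collection.JavaConverters._
import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.avro.functions.from_avro
import org.apache.spark.sql.functions.col

// Common interface kept in the shared module.
trait AvroDeserializer {
  def parse(df: DataFrame, topic: String): DataFrame
}

// OSS-safe implementation: uses only from_avro(Column, String, Map), so it compiles everywhere.
class StaticSchemaDeserializer(client: CachedSchemaRegistryClient, mode: String)
    extends AvroDeserializer {
  override def parse(df: DataFrame, topic: String): DataFrame = {
    val schemaJson = client.getLatestSchemaMetadata(s"$topic-value").getSchema
    df.withColumn(
      "parsedValue",
      from_avro(col("fixedValue"), schemaJson, Map("mode" -> mode).asJava)
    )
  }
}

// A subject-based implementation using the Databricks-only overload would live in a
// separate sbt module compiled only against the Databricks-provided jars.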

Example fixes
- Compile-safe (works in OSS and Databricks): fetch the schema JSON and pass it to from_avro with your mode option, reusing your broadcast pattern; this keeps your if/else structure but uses only the OSS signatures (see the sketch after this list).
- Databricks-only path (subject-based): keep your named-argument call exactly as written, but make sure that code is compiled against, and runs on, a Databricks Runtime that documents the subject/schemaRegistryAddress overload, and pass auth via options such as confluent.schema.registry.basic.auth.user.info when needed.
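A sketch of the compile-safe fix, reusing the imports shown in your snippet: both branches resolve the schema JSON at the driver and use only the OSS overload. Column names follow your original code; note that OSS from_avro expects raw Avro bytes, so the column must already have any Confluent wire-format header stripped.

def deserializeAvroOss(
    topic: String,
    client: CachedSchemaRegistryClient,
    sc: SparkContext,
    avroOpts: Map[String, String] = Map("mode" -> "FAILFAST"))
    (df: DataFrame): DataFrame = {
  val mode = avroOpts.getOrElse("mode", "FAILFAST")
  // Both branches fetch the latest subject schema at the driver and broadcast it,
  // so only the OSS from_avro(Column, String, Map) overload is ever referenced.
  val schema = sc.broadcast(client.getLatestSchemaMetadata(s"$topic-value").getSchema)
  if (avroOpts.get("useDynamicSchema").contains("true")) {
    df.withColumn(
      "parsedValue",
      from_avro(col("fullValue"), schema.value, Map("mode" -> mode).asJava)
    ).drop("fullValue")
  } else {
    df.withColumn(
      "parsedValue",
      from_avro(col("fixedValue"), schema.value, Map("mode" -> mode).asJava)
    )
  }
}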

Version and dependency checklist
- If you intend to use from_avro(data, subject, schemaRegistryAddress[, options]), compile and run that code on a Databricks Runtime that documents Schema Registry integration for Scala; otherwise the overload will not be found by the compiler.
- If you compile with OSS Spark artifacts (spark-avro), the only supported signatures require a JSON schema string; use a Schema Registry client on the driver to obtain that schema before calling from_avro. A dependency sketch follows below.
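As a dependency sketch, a build.sbt fragment for the OSS route; the versions here are assumptions and should be aligned with your cluster’s Spark and Scala versions:

// Spark artifacts are provided by the Databricks runtime at execution time.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"  % "3.5.1" % Provided,
  "org.apache.spark" %% "spark-avro" % "3.5.1" % Provided,
  "io.confluent"      % "kafka-schema-registry-client" % "7.5.1"
)

// Confluent artifacts are hosted on Confluent's own Maven repository.
resolvers += "Confluent" at "https://packages.confluent.io/maven/"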

Hoping this helps, Louis.
