Greetings @Naveenkumar1811,
The compilation error occurs because the Scala API you compile against only exposes from_avro(Column, jsonSchemaStr[, options]); it does not include the Databricks-only overload that takes subject and schemaRegistryAddress, so your call doesn’t match any method signature available to your sbt-built JAR.
Why it fails
In open-source Spark’s Scala API, org.apache.spark.sql.avro.functions.from_avro has two overloads: from_avro(Column, jsonFormatSchema: String) and from_avro(Column, jsonFormatSchema: String, options: java.util.Map[String, String]). Neither accepts a subject or a Schema Registry URL, which is why the compiler says it “cannot be applied to (data, subject, schemaRegistryAddress, options)”.
Databricks clusters add an extra Scala-friendly overload that lets you specify subject and schemaRegistryAddress directly for Schema Registry integration, but that overload isn’t present in the standard spark-avro artifact you compile against with sbt.
What works in OSS Spark
When compiling outside Databricks against org.apache.spark:spark-avro, fetch the Avro schema JSON yourself (for example via Confluent’s client), then call from_avro with the JSON schema string and, optionally, an options map (for example mode), which is exactly what your else branch does.
A typical flow is to call getLatestVersion on the Schema Registry client for “<topic>-value”, take getSchema() as a JSON string, optionally broadcast it, and pass it into from_avro(col, jsonSchemaStr[, options]).
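Here is a minimal sketch of that flow, assuming a SparkSession named spark, a Kafka DataFrame named df with a binary value column, the io.confluent kafka-schema-registry-client dependency on the classpath, and plain Avro payloads; the URL and topic are placeholders:

```scala
import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient
import org.apache.spark.sql.avro.functions.from_avro
import org.apache.spark.sql.functions.col

val schemaRegistryUrl = "http://schema-registry:8081" // placeholder address
val topic             = "my-topic"                    // placeholder topic name

// Resolve the latest schema for "<topic>-value" on the driver.
val registryClient     = new CachedSchemaRegistryClient(schemaRegistryUrl, 128)
val schemaJson: String = registryClient.getLatestSchemaMetadata(s"$topic-value").getSchema

// Optionally broadcast the schema string once instead of re-capturing it per task.
val schemaBc = spark.sparkContext.broadcast(schemaJson)

// OSS-compatible call: JSON schema string plus an options map.
// Note: if the payload uses the Confluent wire format (5-byte header), strip it first
// or use a library such as ABRiS that understands that format.
val options = java.util.Collections.singletonMap("mode", "PERMISSIVE")
val decoded = df.select(from_avro(col("value"), schemaBc.value, options).as("record"))
```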
If you want Databricks dynamic schema
On Databricks runtimes that support Schema Registry integration, you can use the subject-based overload directly in Scala: from_avro(data = $"value", subject = "t-value", schemaRegistryAddress = "https://...") with optional auth options, but this code must be compiled and resolved against the Databricks-provided libraries present on the cluster.
If you compile the JAR locally against OSS Maven coordinates, that subject-based overload won’t exist at compile time, leading to your exact error even though the same call would work interactively in a Databricks notebook.
Workarounds when building with sbt
- Keep using the “fixed schema” path: resolve the Avro schema JSON via your CachedSchemaRegistryClient on the driver, then call from_avro(col, schemaJson, options), which compiles everywhere.
- If you must keep a single codebase but want subject-based deserialization on Databricks, keep the call site uniform and switch implementations by environment: on Databricks, call the subject-based overload from code that only compiles there (for example by isolating it in a Databricks-built module); off Databricks, fall back to fetching the schema JSON first. A sketch of this split follows the list.
- As an alternative to hand-rolling dynamic schema logic, consider libraries that handle Confluent wire format + Schema Registry in Spark, such as ABRiS or Adobe’s spark-avro with Schema Registry support, which can avoid needing Databricks-specific overloads.
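To make the second bullet concrete, here is a minimal sketch of the split, assuming a multi-module sbt build; AvroDecoder, OssAvroDecoder, and DbrAvroDecoder are hypothetical names, not an existing API, and the Databricks implementation would live in a module compiled only against a Databricks Runtime:

```scala
// Shared module: a small interface both builds compile against.
import org.apache.spark.sql.{Column, DataFrame}

trait AvroDecoder {
  def decode(df: DataFrame, value: Column): DataFrame
}

// OSS module: resolve the schema JSON up front and use the OSS from_avro signature.
import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient
import org.apache.spark.sql.avro.functions.from_avro

class OssAvroDecoder(registryUrl: String, subject: String) extends AvroDecoder {
  override def decode(df: DataFrame, value: Column): DataFrame = {
    val schemaJson = new CachedSchemaRegistryClient(registryUrl, 128)
      .getLatestSchemaMetadata(subject).getSchema
    val options = java.util.Collections.singletonMap("mode", "PERMISSIVE")
    df.select(from_avro(value, schemaJson, options).as("record"))
  }
}

// Databricks-built module (kept commented here because it only compiles on a
// Databricks Runtime that provides the subject-based overload):
// class DbrAvroDecoder(registryUrl: String, subject: String) extends AvroDecoder {
//   override def decode(df: DataFrame, value: Column): DataFrame =
//     df.select(from_avro(value, subject, registryUrl).as("record"))
// }
```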
Example fixes
- Compile-safe (works in OSS and Databricks): fetch the schema JSON and pass it to from_avro with your mode option, reusing your broadcast pattern, as in the sketch under “What works in OSS Spark” above; this keeps your if/else but uses only the OSS signatures.
- Databricks-only path (subject-based): keep your named-argument call exactly as you wrote it, but make sure this code is compiled against and run on a Databricks Runtime that documents the subject/schemaRegistryAddress overload, and pass auth via options such as confluent.schema.registry.basic.auth.user.info when needed, as sketched below.
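A minimal sketch of that Databricks-only path, assuming a DataFrame named df with a binary value column, a placeholder registry address, and the basic-auth option keys Databricks documents for Schema Registry; verify the exact keys against your runtime’s docs:

```scala
// Databricks Runtime only: this overload is not in the OSS spark-avro artifact,
// so this code must be compiled against the Databricks-provided libraries.
import org.apache.spark.sql.avro.functions.from_avro
import org.apache.spark.sql.functions.col

val schemaRegistryAddress = "https://my-schema-registry:8081" // placeholder

// Basic-auth options; confirm the exact keys against your runtime's documentation.
val registryOptions = new java.util.HashMap[String, String]()
registryOptions.put("confluent.schema.registry.basic.auth.credentials.source", "USER_INFO")
registryOptions.put("confluent.schema.registry.basic.auth.user.info", "<api-key>:<api-secret>")

// Subject-based deserialization: the reader schema is resolved from the "t-value"
// subject at runtime, so no schema JSON string is needed in the code.
val decoded = df.select(
  from_avro(col("value"), "t-value", schemaRegistryAddress, registryOptions).as("record"))
```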
Version and dependency checklist
- If you intend to use from_avro(data, subject, schemaRegistryAddress[, options]), compile and run that code on a Databricks Runtime that documents Schema Registry integration for Scala; otherwise the overload won’t be found by the compiler.
- If you compile with OSS Spark artifacts (spark-avro), the only supported signatures require a JSON schema string; use the Schema Registry client at the driver to obtain that schema before calling from_avro.
Hoping this helps, Louis.