Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Trigger.AvailableNow on scala - compile issue

emanuele_maffeo
New Contributor III

Hi everybody,

Trigger.AvailableNow was released with the Databricks 10.1 runtime, and we would like to use this new feature with Auto Loader.

We write all our data pipelines in Scala, and our projects import Spark as a provided dependency. If we switch to Spark 3.2.0 (which Databricks 10.1 is based on), our code does not compile, since Trigger.AvailableNow is not in that release (at least in the open-source version of Spark). Looking at the GitHub repository, it seems this functionality will be released with Spark 3.3.

Do we have to wait until the Spark 3.3 release?

1 ACCEPTED SOLUTION


That's fair.

Anyway, this feature is essentially backported from Spark 3.3.0, but since Spark 3.3.0 has not been released yet, I cannot use it: my code won't compile, so my whole development process breaks.

In the meantime I've found an ugly hack (using reflection) that lets me work around the issue:

import org.apache.spark.sql.streaming.Trigger

// Look up the AvailableNow factory method reflectively: the Trigger class
// exists in open-source Spark 3.2.0, but this method does not yet
val clazz   = Class.forName("org.apache.spark.sql.streaming.Trigger")
val method  = clazz.getMethod("AvailableNow")
val trigger = method.invoke(null).asInstanceOf[Trigger]

val streamWriter = df.writeStream
  .format("delta")
  .options(config.sparkWriteOptions)
  .trigger(trigger)

Still, I think this is something that needs to be addressed somehow; in the future there may be other backported features where this workaround won't work.
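A slightly safer variant of the same reflection trick wraps the lookup in Try, so a missing class or method degrades to None instead of throwing at runtime. This is only a sketch, and resolveStatic is a hypothetical helper name; it is demonstrated here on a JDK class so it runs without Spark on the classpath:

```scala
import scala.util.Try

// Hypothetical helper: resolve a static no-arg method reflectively,
// returning None if the class or method is missing at runtime
// (e.g. when compiling against an older Spark release).
def resolveStatic[T](className: String, methodName: String): Option[T] =
  Try {
    val clazz  = Class.forName(className)
    val method = clazz.getMethod(methodName)
    method.invoke(null).asInstanceOf[T]
  }.toOption

// Demonstrated on java.time.Instant.now(), a static no-arg method:
val nowOpt = resolveStatic[java.time.Instant]("java.time.Instant", "now")
```

With Spark on the classpath, the same helper could resolve "org.apache.spark.sql.streaming.Trigger" / "AvailableNow" and fall back to another trigger when it returns None.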


5 REPLIES

Anonymous
Not applicable

You could switch to Python. Depending on what you're doing, and whether you're using UDFs, there shouldn't be any difference at all in terms of performance.

Anonymous
Not applicable

Also, it does look like it's available in Scala in 10.1, per the release notes:

https://docs.databricks.com/release-notes/runtime/10.1.html#triggeravailablenow-for-delta-source-str...

Yes, it's available in Scala if I use a Scala notebook. But what if I develop my code in an IDE and deploy it to Databricks using CD pipelines? Is there any chance of having the Databricks Runtime packaged as a JAR so that I can use it as an sbt dependency?
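For context, the setup described above (Spark as a provided dependency) typically looks something like this in build.sbt; the version shown is the open-source release closest to Databricks Runtime 10.1, which is exactly why the compile gap appears:

```scala
// build.sbt (sketch): compile against open-source Spark, but do not
// bundle it, since the Databricks runtime supplies Spark at run time.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % "3.2.0" % Provided
)
```

Because the published 3.2.0 artifacts lack Trigger.AvailableNow, any direct reference to it fails at compile time even though it works on the cluster.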

Anonymous
Not applicable

Many things don't work in an IDE, such as dbutils and some Delta Lake features.

We don't release the source code as jars because if we did that AWS would package it and sell it.

