03-17-2022 07:55 AM
Hi everybody,
Trigger.AvailableNow was released with the Databricks 10.1 runtime, and we would like to use this new feature with Auto Loader.
We write all our data pipelines in Scala, and our projects import Spark as a provided dependency. If we switch to Spark 3.2.0 (which Databricks 10.1 is based on), our code does not compile, since Trigger.AvailableNow is not in that release (at least not in the open-source version of Spark). Looking into the GitHub repository, it seems this functionality will be released with Spark 3.3.
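For context, a minimal sketch of how our builds declare Spark (the Scala and Spark versions here are just illustrative):

// build.sbt (minimal sketch; versions are illustrative)
// Spark is marked Provided: the Databricks cluster supplies it at run time,
// but compilation is pinned to the open-source 3.2.0 API, which lacks AvailableNow.
ThisBuild / scalaVersion := "2.12.15"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.2.0" % Provided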
Do we have to wait until the spark 3.3 release?
03-17-2022 11:53 AM
You can switch to Python. Depending on what you're doing, and whether you're using UDFs, there shouldn't be any difference at all in terms of performance.
03-17-2022 11:56 AM
Also, it does look like it's available in Scala in 10.1, according to the release notes.
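For instance, a minimal sketch of what should compile directly in a Scala notebook on DBR 10.1 (df, the checkpoint path, and the output path are placeholders):

import org.apache.spark.sql.streaming.Trigger

// On DBR 10.1+ the method is on the classpath, so no reflection is needed.
df.writeStream
  .format("delta")
  .trigger(Trigger.AvailableNow())
  .option("checkpointLocation", "/tmp/checkpoints/example") // placeholder path
  .start("/tmp/delta/example")                              // placeholder path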
03-17-2022 01:10 PM
Yes, it's available in Scala if I use a Scala notebook. But what if I develop my code in an IDE and deploy it to Databricks using CD pipelines? Is there any chance of getting the Databricks runtime packaged as a jar, so that I can use it as an sbt dependency?
03-17-2022 01:25 PM
Many things don't work in an IDE, such as dbutils and some Delta Lake features.
We don't release the source code as jars because, if we did, AWS would package it and sell it.
03-17-2022 01:36 PM
That's fair.
Anyway, this feature is basically backported from Spark 3.3.0, but since Spark 3.3.0 has not been released yet, I cannot use it: my code won't compile, so my whole development process breaks.
In the meantime I've found an ugly hack (using reflection) that allows me to avoid this issue:
import org.apache.spark.sql.streaming.Trigger

// Resolve Trigger.AvailableNow reflectively, so the code still compiles
// against open-source Spark 3.2.0 where the method does not exist yet.
val clazz = Class.forName("org.apache.spark.sql.streaming.Trigger")
val method = clazz.getMethod("AvailableNow")
// AvailableNow is a static method, hence the null receiver.
val trigger = method.invoke(null).asInstanceOf[Trigger]

val streamWriter = df.writeStream
  .format("delta")
  .options(config.sparkWriteOptions)
  .trigger(trigger)
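A slightly safer variant of the same hack (the Trigger.Once fallback here is just an illustrative choice on my part) degrades gracefully when AvailableNow is not on the classpath, e.g. in unit tests against open-source Spark 3.2.0:

import scala.util.Try
import org.apache.spark.sql.streaming.Trigger

// Try the reflective lookup; if AvailableNow is missing (open-source 3.2.0),
// fall back to Trigger.Once(), which processes all available data in one batch.
val availableNowOrOnce: Trigger =
  Try(classOf[Trigger].getMethod("AvailableNow").invoke(null).asInstanceOf[Trigger])
    .getOrElse(Trigger.Once())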
Anyway, I guess this is something that needs to be addressed somehow; in the future there may be other backported features where this kind of workaround won't work.