Use case
Read data from a source table using Spark Structured Streaming (running round the clock), apply transformation logic, etc., and finally merge the DataFrame into the target table. If there is any failure during the transformation or the merge, the Databricks job should not fail.
In my notebook I have the skeleton code below (please ignore the exact syntax), and I am running the notebook on a job cluster. Whenever there is a failure in the handlemicrobatch function I catch it and send an alert, but I do not rethrow the error because we don't want the Databricks job to fail. However, by doing so the checkpoint still gets written (the batch's offsets are committed) even though the data was never processed/written to the target table. Is there a way to avoid committing the checkpoint for a failed batch without throwing the exception in the catch block?
class handlerclass(/* ... */) {
  def handlemicrobatch(df: DataFrame, batchid: Long): Unit = {
    try {
      // apply transformation logic ... and finally MERGE into the target table
    } catch {
      case e: Exception =>
        // log the error
        // send alert emails
        // NOTE: the exception is NOT rethrown, so the job keeps running
    }
  }
}
val classobj = new handlerclass(/* ... */)
val checkpoint = "dbfs:/checkpoint/<sourcetable>"

def startStream(): Unit = {
  val df = spark.readStream.table("<sourcetable>")
  val streamingquery = df.writeStream
    .foreachBatch(classobj.handlemicrobatch _)
    .option("checkpointLocation", checkpoint)
    // ... other options
    .start()
}
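For reference, the only workaround I can think of so far is to rethrow from handlemicrobatch (so the failed batch's offsets are never committed) and keep the job alive by catching the query failure on the driver and restarting the stream. This is just a rough sketch with placeholder names and a hard-coded retry delay, not what we currently run:

import org.apache.spark.sql.streaming.StreamingQueryException

def startStreamWithRestart(): Unit = {
  var keepRunning = true
  while (keepRunning) {
    try {
      val df = spark.readStream.table("<sourcetable>")
      val query = df.writeStream
        .foreachBatch(classobj.handlemicrobatch _)   // handler rethrows on failure in this variant
        .option("checkpointLocation", checkpoint)
        .start()
      query.awaitTermination()                       // throws if the query fails
      keepRunning = false                            // query stopped cleanly
    } catch {
      case e: StreamingQueryException =>
        // log / send alert here, then loop to restart from the last committed offsets;
        // the failed batch is retried because its offsets were never committed
        Thread.sleep(60000)
    }
  }
}

Is something like this the recommended pattern, or is there a way to prevent the checkpoint commit directly from inside foreachBatch without letting the exception propagate?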