04-20-2023 07:36 PM
I would like to ask how to implement zero-downtime deployment of Spark Structured Streaming on Databricks job compute with Terraform.
We regularly upgrade the Spark application code version, but currently every deployment cancels the original job and creates a new one, which causes roughly a 5-minute interruption.
Given this scenario, is there a way to achieve zero downtime when deploying a new version? If you have any ideas, please share them with me. I would appreciate it, thank you.
04-25-2023 10:22 PM
@Mars Su :
Yes, you can implement zero-downtime deployment of Spark Structured Streaming in Databricks job compute using Terraform. One way to achieve this is with Databricks' job clusters feature, which lets you create a cluster specifically for running a job. At a high level: define the new version of the job as a separate Terraform resource with its own job cluster, start it alongside the existing job, verify that it is processing data correctly, and then stop and remove the old job.
By following these steps, you can achieve zero-downtime deployment of your Spark Structured Streaming job in Databricks using Terraform. Note that you should thoroughly test the new job before switching all traffic to it, to ensure that it works correctly and does not cause any issues in production.
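The job-cluster approach above can be sketched in Terraform roughly as follows. The resource name, runtime version, node type, and notebook path are illustrative assumptions, not values from this thread, and depending on your Databricks provider version the task may need to be wrapped in a `task {}` block:

```hcl
# Sketch of a Databricks job that runs a Structured Streaming application
# on its own job cluster. A new application version would be deployed by
# creating a second job like this one, validating it, and then stopping
# and removing the old job.
resource "databricks_job" "streaming_job_v2" {  # hypothetical name
  name = "structured-streaming-v2"

  new_cluster {
    spark_version = "11.3.x-scala2.12"  # example runtime version
    node_type_id  = "i3.xlarge"         # example node type
    num_workers   = 2
  }

  notebook_task {
    notebook_path = "/Repos/streaming/app"  # illustrative path
  }
}
```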
04-26-2023 12:12 AM
@Suteja Kanuri
Thanks for replying to my questions.
So based on your scenario, we would have 2 Spark jobs running at the same time, right? Like a blue/green deployment.
However, I would like to know: if we want to achieve this, do we need to split the checkpoint locations of the two Spark Structured Streaming jobs and store them independently?
04-27-2023 11:31 AM
@Mars Su :
Yes, in a blue-green deployment scenario, both the blue and green versions of the Spark Structured Streaming job would be running at the same time, with traffic gradually shifted from the blue to the green version.
Regarding the checkpoint location, it is generally recommended to use separate checkpoint locations for each version of the job in order to avoid potential conflicts or data corruption. This is because the checkpoint location stores the state of the streaming query, including the current offset, which is used to resume the query in case of failures or restarts.
To achieve this in Terraform, you can define two separate checkpoint locations for the blue and green versions of the job, and set them via the spark.sql.streaming.checkpointLocation key in the spark_conf block of each job. For example:
# Blue job
resource "databricks_job" "blue_job" {
  # ...
  new_cluster {
    # ...
    spark_conf = {
      "spark.sql.streaming.checkpointLocation" = "/blue/checkpoints"
    }
  }
  # ...
}

# Green job
resource "databricks_job" "green_job" {
  # ...
  new_cluster {
    # ...
    spark_conf = {
      "spark.sql.streaming.checkpointLocation" = "/green/checkpoints"
    }
  }
  # ...
}
In this example, the blue job would use the checkpoint location /blue/checkpoints, while the green job would use /green/checkpoints. Note that you would also need to ensure that any output or intermediate data is written to separate locations for the blue and green versions of the job, to avoid conflicts or data corruption.
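One way to keep the checkpoint and output locations consistent between the two sides is to parameterize the color as a Terraform variable and derive all version-specific paths from it, so blue and green never share state. The variable name, paths, and cluster settings below are illustrative assumptions:

```hcl
variable "deployment_color" {
  type        = string
  description = "Which side of the blue/green pair this job belongs to"
  default     = "blue"
}

locals {
  # All version-specific state lives under one color-scoped prefix,
  # so the blue and green jobs never share checkpoints or outputs.
  base_path = "/mnt/streaming/${var.deployment_color}"  # illustrative path
}

resource "databricks_job" "streaming_job" {
  name = "streaming-${var.deployment_color}"

  new_cluster {
    spark_version = "11.3.x-scala2.12"  # example runtime version
    node_type_id  = "i3.xlarge"         # example node type
    num_workers   = 2

    spark_conf = {
      "spark.sql.streaming.checkpointLocation" = "${local.base_path}/checkpoints"
    }
  }
}
```

The application would then also write its output under the same color-scoped prefix (for example "${local.base_path}/output", passed in as a job parameter), so switching colors moves checkpoints and outputs together.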
04-28-2023 12:15 AM
@Suteja Kanuri
Thank you for the reply. That's a great solution and suggestion.
This is very helpful for me.
04-28-2023 11:04 AM
@Mars Su : Great that it helped you! Please consider marking it as the best answer.