Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to run multiple Spark streaming jobs connected to one job cluster

Jin_Kim
New Contributor II

Hi,

We have a scenario where we need to deploy 15 Spark streaming applications on Databricks, all reading from Kafka, onto a single job cluster.

We tried the following approach:

1. Create job 1 with a new job cluster (C1).

2. Create job 2 pointing to C1.

...

15. Create job 15 pointing to C1.

The problem here is that if job 1 fails, it terminates all the other 14 jobs.

One option we are considering is to have a dummy Kafka topic with no messages in it and a dummy Spark streaming job reading from that topic (which should essentially never fail, 99.99%). That job would create the new job cluster (C1), and the 15 real jobs would then point to C1. We are assuming job cluster C1 will essentially never fail (99.99%).
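Roughly, the keep-alive job we have in mind would look like the sketch below (the broker, topic, and checkpoint names are just placeholders):

```python
# Sketch of the keep-alive streaming job: it subscribes to an empty Kafka topic,
# so it idles indefinitely and keeps job cluster C1 alive for the other jobs.
# Broker, topic, and checkpoint path below are placeholder names.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

keepalive = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")       # placeholder broker
    .option("subscribe", "keepalive-empty-topic")            # placeholder topic, never written to
    .load()
    .writeStream
    .format("noop")                                          # discard output; nothing ever arrives
    .option("checkpointLocation", "/checkpoints/keepalive")  # placeholder path
    .start()
)

keepalive.awaitTermination()
```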

The other solution we have is to create a separate job cluster for each job (15 clusters for 15 jobs), but that would kill our operational costs, since these are continuous streaming jobs and some of the topics have very low volume.

Could you please advise on how to address this issue?

Thanks

Jin.


4 Replies

Hubert-Dudek
Esteemed Contributor III (Accepted Solution)

@Jin Kim,

  • When you set up the tasks in a job, first add a dummy task and then add every streaming application as a separate task that depends on that first task (see the image below for how the logic looks), so there is only one job.
  • Inside every streaming task, use spark.streams.awaitAnyTermination() to monitor the query and restart it with custom logic when it fails (see the sketch after this list).
  • Redirect failure notifications to PagerDuty or a similar service so you know when the job is failing.
  • Set the maximum concurrent runs to one and schedule the job frequently, for example every 5 minutes, so it starts again automatically when something fails.

[Image: task layout with one initial task and the streaming tasks depending on it]
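A minimal sketch of one streaming task with such a restart loop might look like this (broker, topic, checkpoint path, and target table are placeholder names, not from the original post):

```python
# Hypothetical sketch of a single streaming task that restarts its query on failure.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def start_stream():
    # Read one Kafka topic and append it to a Delta table (all names are placeholders).
    return (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")     # placeholder broker
        .option("subscribe", "orders")                         # placeholder topic
        .load()
        .writeStream
        .format("delta")
        .option("checkpointLocation", "/checkpoints/orders")   # placeholder path
        .toTable("raw.orders")                                 # placeholder table
    )

query = start_stream()

# awaitAnyTermination() blocks until a query in this session stops; if it stopped
# because of an error, clear the terminated query and start it again (custom logic).
while True:
    try:
        spark.streams.awaitAnyTermination()
        break                               # the query stopped cleanly, end the task
    except Exception:
        spark.streams.resetTerminated()     # forget the failed query
        query = start_stream()              # restart the stream
```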

Jin_Kim
New Contributor II

@Hubert Dudek, thanks a lot for responding.

  1. With a setup like this, if one task fails, it will not terminate the entire job, right?
  2. Since the job runs continuously (it is a streaming app), is it possible to add a new task to the job while it is running? We have around 100 Kafka topics, and each streaming app listens to only 1 topic.

Kaniz_Fatma
Community Manager

Hi @Jin Kim, are you aware of Workflows with jobs? Please go through the doc.

Databricks manages the task orchestration, cluster management, monitoring, and error reporting for all of your jobs. You can run your jobs immediately or periodically through an easy-to-use scheduling system. 

Also:

Task dependencies

You can define the order of execution of tasks in a job using the Depends on drop-down. You can set this field to one or more tasks in the job.

Configuring task dependencies creates a Directed Acyclic Graph (DAG) of task execution, a common way of representing execution order in job schedulers. For example, consider the following job consisting of four tasks:


  • Task 1 is the root task and does not depend on any other task.
  • Task 2 and Task 3 depend on Task 1 completing first.
  • Finally, Task 4 depends on Task 2 and Task 3 completing successfully.

Databricks runs upstream tasks before running downstream tasks, running as many of them in parallel as possible. The following diagram illustrates the order of processing for these tasks:

[Diagram: processing order of the four tasks]
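For illustration, a job with that four-task DAG could be created through the Jobs API 2.1 with a payload roughly like the one below (job name, task keys, notebook paths, cluster settings, workspace URL, and token are made-up placeholders, not from the documentation):

```python
# Hypothetical Jobs API 2.1 payload expressing the four-task DAG described above.
import requests

job_spec = {
    "name": "example-dag-job",
    "max_concurrent_runs": 1,              # only one run of the job at a time
    "job_clusters": [
        {
            "job_cluster_key": "shared_cluster",
            "new_cluster": {"spark_version": "11.3.x-scala2.12", "num_workers": 2},
        }
    ],
    "tasks": [
        {
            "task_key": "task_1",          # root task, no dependencies
            "job_cluster_key": "shared_cluster",
            "notebook_task": {"notebook_path": "/Jobs/task_1"},
        },
        {
            "task_key": "task_2",
            "depends_on": [{"task_key": "task_1"}],
            "job_cluster_key": "shared_cluster",
            "notebook_task": {"notebook_path": "/Jobs/task_2"},
        },
        {
            "task_key": "task_3",
            "depends_on": [{"task_key": "task_1"}],
            "job_cluster_key": "shared_cluster",
            "notebook_task": {"notebook_path": "/Jobs/task_3"},
        },
        {
            "task_key": "task_4",          # runs only after task_2 and task_3 succeed
            "depends_on": [{"task_key": "task_2"}, {"task_key": "task_3"}],
            "job_cluster_key": "shared_cluster",
            "notebook_task": {"notebook_path": "/Jobs/task_4"},
        },
    ],
}

# Create the job (workspace URL and token are placeholders).
resp = requests.post(
    "https://<workspace>.cloud.databricks.com/api/2.1/jobs/create",
    headers={"Authorization": "Bearer <personal-access-token>"},
    json=job_spec,
)
print(resp.json())
```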

Kaniz_Fatma
Community Manager

Hi @Jin Kim, just a friendly follow-up. Do you still need help, or did the above responses help you find the solution? Please let us know.
