Parallelize Spark jobs on the same cluster?
06-29-2022 09:51 AM
What's the best way to parallelize multiple Spark jobs on the same cluster during a backfill?
06-29-2022 11:45 AM
In the past I used multi-threaded orchestration directly inside the driver application, but that was before Databricks supported multi-task jobs.
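For reference, the multi-threaded approach can be sketched roughly as below. This is a minimal illustration, not the exact code I used: `run_backfill`, `date_range`, and `process_partition` are hypothetical names, and `process_partition` is a placeholder for whatever Spark work one slice of the backfill needs. Spark accepts jobs submitted from separate threads of the same application, so each thread's actions can run concurrently on the cluster.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from datetime import date, timedelta


def date_range(start, end):
    """Return every date from start to end, inclusive."""
    days = (end - start).days
    return [start + timedelta(days=i) for i in range(days + 1)]


def run_backfill(partitions, process_partition, max_workers=4):
    """Call process_partition once per partition on a thread pool.

    Returns (results, failures): two dicts keyed by partition, so one
    failed slice does not hide the outcome of the others.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(process_partition, p): p for p in partitions}
        for future in as_completed(futures):
            partition = futures[future]
            try:
                results[partition] = future.result()
            except Exception as exc:  # record the error and keep going
                failures[partition] = exc
    return results, failures


def process_partition(day):
    # Hypothetical placeholder. In a real driver this would read, transform,
    # and write the data for `day` using the shared SparkSession.
    return f"processed {day.isoformat()}"


if __name__ == "__main__":
    days = date_range(date(2022, 6, 1), date(2022, 6, 7))
    results, failures = run_backfill(days, process_partition, max_workers=3)
    print(f"{len(results)} succeeded, {len(failures)} failed")
```

`max_workers` caps how many slices are in flight at once; the right value depends on how much of the cluster each slice consumes.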
If you create a job through the Workflows tab, you can set up multiple notebook, Python, or JAR tasks to run in parallel, and configure a dependency graph between them if desired.
Within a single job you can either give each task its own cluster or have several tasks share the resources of one or more clusters.
You can do this without having to assign a schedule to the job.
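If you would rather define such a job in code than in the UI, a payload along these lines might work. Treat it as a sketch under assumptions: the field names follow my understanding of the Jobs API 2.1 create-job request, the cluster settings are placeholder values, and `build_backfill_job`, the notebook paths, and the `partition` parameter are hypothetical. Tasks without `depends_on` can run in parallel, and all tasks here reference the same job cluster.

```python
import json


def build_backfill_job(job_name, notebook_path, validate_notebook_path,
                       partitions, cluster_key="shared_backfill_cluster"):
    """Build a multi-task job definition as a plain dict.

    One notebook task is created per partition. None of them depend on each
    other, so they are eligible to run in parallel on the shared job cluster.
    A final task depends on all of them.
    """
    backfill_tasks = [
        {
            "task_key": f"backfill_{p.replace('-', '_')}",
            "job_cluster_key": cluster_key,
            "notebook_task": {
                "notebook_path": notebook_path,
                "base_parameters": {"partition": p},
            },
        }
        for p in partitions
    ]
    validate_task = {
        "task_key": "validate",
        "job_cluster_key": cluster_key,
        "depends_on": [{"task_key": t["task_key"]} for t in backfill_tasks],
        "notebook_task": {"notebook_path": validate_notebook_path},
    }
    return {
        "name": job_name,
        # No "schedule" entry: the job only runs when it is triggered.
        "job_clusters": [
            {
                "job_cluster_key": cluster_key,
                "new_cluster": {
                    # Placeholder values; use what your workspace offers.
                    "spark_version": "10.4.x-scala2.12",
                    "node_type_id": "i3.xlarge",
                    "num_workers": 4,
                },
            }
        ],
        "tasks": backfill_tasks + [validate_task],
    }


if __name__ == "__main__":
    job = build_backfill_job(
        job_name="june-backfill",
        notebook_path="/Repos/example/backfill",            # hypothetical
        validate_notebook_path="/Repos/example/validate",   # hypothetical
        partitions=["2022-06-01", "2022-06-02", "2022-06-03"],
    )
    print(json.dumps(job, indent=2))
```

To give each task its own cluster instead, you would define one `job_clusters` entry per task and point each `job_cluster_key` at a different one.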
Hope this helps!
07-05-2022 10:30 AM
Hi @Oliver Lewis,
Just a friendly follow-up. Did any of the responses help you resolve your question? If so, please mark it as the best answer. Otherwise, please let us know if you still need help.