Scheduling jobs with Airflow results in each task running multiple jobs.
01-11-2023 02:04 PM
Hey everyone, I'm experimenting with running containerized PySpark jobs in Databricks and orchestrating them with Airflow, but I'm running into an issue. When I trigger an Airflow DAG and look at the logs, I see that Airflow is spinning up multiple job runs on its first try.
It's really strange. I've attached a screenshot - in this instance, I need the job to run only once, but three jobs are being run.
Has anyone encountered this before, and are there any fixes? I can send cluster and pool config details upon request.
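For context, here's a minimal sketch of the kind of DAG I'm triggering - the DAG id, cluster config, image URL, and script path are placeholders, not my actual setup:

```python
# Illustrative only: a minimal DAG of the kind described above.
# Names, cluster sizes, image URL, and the script path are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import (
    DatabricksSubmitRunOperator,
)

with DAG(
    dag_id="containerized_pyspark_job",  # hypothetical DAG id
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,  # triggered manually
    catchup=False,
) as dag:
    run_job = DatabricksSubmitRunOperator(
        task_id="run_pyspark_job",
        databricks_conn_id="databricks_default",
        json={
            # One-time run on a new cluster built from a custom image.
            "new_cluster": {
                "spark_version": "11.3.x-scala2.12",
                "node_type_id": "i3.xlarge",
                "num_workers": 2,
                # Custom container for the containerized PySpark job.
                "docker_image": {"url": "my-registry/my-image:latest"},
            },
            "spark_python_task": {
                "python_file": "dbfs:/scripts/job.py",  # placeholder path
            },
        },
    )
```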
Labels: Airflow, Docker, Multiple Jobs, PySpark Jobs
01-11-2023 10:57 PM
That's weird behaviour. Could you please share a sample of your Airflow code?
01-13-2023 01:44 PM
Thanks, Daniel. Sure thing.
01-15-2023 08:39 PM
Hi @Tacuma Solomon, 3 jobs with the same config? Or 3 job runs?
01-16-2023 04:51 AM
Both, I guess? Yes, all the jobs share the same config. The question I have is why, in the same Airflow task log, there are three job runs. I'm hoping there's something in the configs that might give me some kind of clue.
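One thing I'm double-checking on my side (a sketch, not my actual code): Airflow will re-run a task if `retries` is set in `default_args` or on the operator, so I'm pinning it to zero to rule that out - though since all three runs show up under the task's first try, I suspect the cause is elsewhere:

```python
# Illustrative: rule out Airflow-side retries as a source of duplicate runs.
from datetime import datetime

from airflow import DAG

with DAG(
    dag_id="containerized_pyspark_job",  # same hypothetical DAG as above
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
    catchup=False,
    default_args={"retries": 0},  # no automatic task retries
) as dag:
    ...  # DatabricksSubmitRunOperator task as sketched earlier
```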

