01-11-2023 02:04 PM
Hey everyone, I'm experimenting with running containerized PySpark jobs in Databricks and orchestrating them with Airflow. However, I'm encountering an issue. When I trigger an Airflow DAG and look at the logs, I see that Airflow is spinning up multiple jobs on its first try.
It's really strange. I've attached a screenshot - in this instance I need the job to run only once, but three jobs are being run.
Has anyone encountered this before, and are there any fixes? I can send cluster and pool config details on request.
01-11-2023 10:57 PM
That's weird behaviour. Can you please share a sample of your Airflow code?
01-13-2023 01:44 PM
Thanks Daniel. Sure thing.
01-15-2023 08:39 PM
Hi @Tacuma Solomon , 3 jobs with the same config? or 3 job runs?
01-16-2023 04:51 AM
Both, I guess? Yes, all jobs share the same config - the question I have is why, within the same Airflow task log, there are three job runs. I'm hoping there's something in the configs that might give me some kind of clue.
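For reference, here is a minimal sketch of the kind of DAG being discussed, using the `DatabricksSubmitRunOperator` from the Databricks provider package. All names, the cluster settings, and the Docker image URL below are illustrative assumptions, not the poster's actual config. Two Airflow-side settings worth checking for duplicate runs are `retries` (a failed-then-retried task submits a fresh Databricks run each attempt) and `catchup` (a scheduled DAG with catchup enabled can backfill extra runs on first trigger):

```python
# Hypothetical minimal DAG - job names, paths, and cluster config are illustrative.
from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksSubmitRunOperator

with DAG(
    dag_id="containerized_pyspark_job",
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,           # manual trigger only
    catchup=False,                    # avoid backfilled extra runs
    default_args={"retries": 0},      # retries > 0 submits a new Databricks run per attempt
) as dag:
    run_job = DatabricksSubmitRunOperator(
        task_id="run_pyspark_job",
        databricks_conn_id="databricks_default",
        json={
            "run_name": "containerized-pyspark",
            "new_cluster": {
                "spark_version": "11.3.x-scala2.12",
                "node_type_id": "i3.xlarge",
                "num_workers": 1,
                # Databricks Container Services image (assumed)
                "docker_image": {"url": "myrepo/pyspark-job:latest"},
            },
            "spark_python_task": {"python_file": "dbfs:/jobs/main.py"},
        },
    )
```

This is a config sketch under the assumptions above, not a reproduction of the thread's actual DAG; comparing it against the real DAG's `retries`, `catchup`, and schedule settings is one way to narrow down where the extra runs come from.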