Data Engineering

Forum Posts

Tacuma
by New Contributor II
  • 785 Views
  • 4 replies
  • 1 kudos

Scheduling jobs with Airflow results in each task running multiple jobs.

Hey everyone, I'm experimenting with running containerized PySpark jobs in Databricks and orchestrating them with Airflow. However, I am encountering an issue. When I trigger an Airflow DAG and look at the logs, I see that Airflow is spinni...

Latest Reply
Tacuma
New Contributor II
  • 1 kudos

Both, I guess? Yes, all jobs share the same config. The question I have is why there are 3 job runs in the same Airflow task log. I'm hoping there's something in the configs that may give me some kind of clue.

3 More Replies
Choolanadu
by New Contributor
  • 1932 Views
  • 1 reply
  • 0 kudos

Airflow - How to pull XComs value in the notebook task?

Using Airflow, I have created a DAG with a sequence of notebook tasks. The first notebook returns a batch ID; the subsequent notebook tasks need this batch_id. I am using the DatabricksSubmitRunOperator to run the notebook task. This operator pushes ...

Latest Reply
daniel_sahal
Honored Contributor III
  • 0 kudos

From what I understand, you want to pass a run_id parameter to the second notebook task? You can: create a widget param inside your Databricks notebook (https://docs.databricks.com/notebooks/widgets.html) that will consume your run_id; pass the paramet...
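The suggestion in this reply could be sketched roughly as follows. This is a minimal illustration, not the poster's actual DAG: the function name `build_notebook_run`, the notebook path, the task id `first_notebook`, and the cluster id are all hypothetical. It assumes the upstream DatabricksSubmitRunOperator pushed its `run_id` to XCom and that the resulting dict is passed as the operator's templated `json` argument, so Airflow renders the Jinja expression at run time.

```python
def build_notebook_run(notebook_path, upstream_task_id, cluster_id):
    """Build a run payload whose notebook parameter is filled from XCom.

    The Jinja expression is left as a plain string here; Airflow would
    render it when the dict is passed to a templated operator field.
    """
    # Template that pulls the run_id pushed by the upstream task.
    xcom_template = (
        "{{ ti.xcom_pull(task_ids='%s', key='run_id') }}" % upstream_task_id
    )
    return {
        "existing_cluster_id": cluster_id,  # hypothetical cluster id
        "notebook_task": {
            "notebook_path": notebook_path,
            # Keys here must match the widget names read in the notebook,
            # e.g. dbutils.widgets.get("run_id") on the Databricks side.
            "base_parameters": {"run_id": xcom_template},
        },
    }


payload = build_notebook_run(
    "/Shared/second_notebook",   # hypothetical notebook path
    "first_notebook",            # hypothetical upstream task id
    "hypothetical-cluster-id",
)
print(payload["notebook_task"]["base_parameters"]["run_id"])
```

Note that this passes the upstream run's `run_id`, as the reply describes, rather than the batch id itself; the notebook would still need its own way to look up the batch id from that run.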

arthur_wang
by New Contributor
  • 2449 Views
  • 3 replies
  • 1 kudos

How does Task Orchestration compare to Airflow (for Databricks-only jobs)?

One of my clients has been orchestrating Databricks notebooks using Airflow + REST API. They're curious about the pros/cons of switching these jobs to Databricks jobs with Task Orchestration. I know there are all sorts of considerations - for example,...

Latest Reply
Shourya
New Contributor III
  • 1 kudos

@Kaniz Fatma Hello Kaniz, I'm currently working with a major enterprise client that is choosing between Airflow and Databricks for job scheduling. Our entire code base is in Databricks and we are trying to figure out the complexities t...

2 More Replies
apw
by New Contributor II
  • 1256 Views
  • 2 replies
  • 2 kudos

Arrow R package fails to install

# Databricks notebook source
.libPaths()

# COMMAND ----------

dir("/databricks/spark/R/lib")

# COMMAND ----------

## Add current working directory to library paths
.libPaths(c(getwd(), .libPaths()))

# COMMAND ----------

## The latest vers...

Latest Reply
Kaniz
Community Manager
  • 2 kudos

Hi @Anthony McGrath, we haven't heard from you since the last response from @Atanu Sarkar, and I was checking back to see if you have a resolution yet. If you have a solution, please share it with the community, as it can be helpful to others. Oth...

1 More Replies
MadelynM
by New Contributor III
  • 1800 Views
  • 2 replies
  • 1 kudos

2021-08-Best-Practices-for-Your-Data-Architecture-v3-OG-1200x628

Thanks to everyone who joined the Best Practices for Your Data Architecture session on Getting Workloads to Production using CI/CD. You can access the on-demand session recording here, and the code in the Databricks Labs CI/CD Templates Repo. Posted ...

Latest Reply
MadelynM
New Contributor III
  • 1 kudos

Here's the embedded links list!

Jobs scheduling and orchestration
  • Built-in job scheduling: https://docs.databricks.com/jobs.html#schedule-a-job
  • Periodic scheduling of the jobs
  • Execute notebook / jar / Python script / Spark-submit
  • Multitask Jobs
  • Execute no...

1 More Replies
User16783854657
by New Contributor III
  • 970 Views
  • 1 reply
  • 0 kudos

Resolved! How to ensure that a Databricks Run Submit run invoked from Airflow only runs one time?

I am running jobs on Databricks using the Run Submit API with Airflow. I have noticed that, on rare occasions, a particular run is executed more than once at the same time. Why?

Latest Reply
User16869510359
Esteemed Contributor
  • 0 kudos

Idempotency can be ensured by providing an idempotency token. It's easy to pass the token through the REST API, as described in this doc: https://kb.databricks.com/jobs/jobs-idempotency.html The primary reason for multiple runs is the client submits t...
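As a rough sketch of the idea in this reply, a client could derive a stable token from the identity of the Airflow task instance and include it in the Run Submit request body, so that a retried submission carries the same token. The helper names, the way the token is derived, and all ids below are assumptions for illustration; only the `idempotency_token` request field comes from the Databricks Jobs API, and no request is actually sent here.

```python
import hashlib


def make_idempotency_token(dag_id, task_id, logical_date):
    """Derive a deterministic token for one task instance (assumed scheme).

    The same inputs always give the same token, so a resubmitted request
    after a timeout or retry reuses it. A SHA-256 hex digest is 64
    characters long.
    """
    raw = f"{dag_id}:{task_id}:{logical_date}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def build_submit_payload(run_name, notebook_path, cluster_id, token):
    """Build a Run Submit request body that carries the token."""
    return {
        "run_name": run_name,
        "existing_cluster_id": cluster_id,  # hypothetical cluster id
        "notebook_task": {"notebook_path": notebook_path},
        "idempotency_token": token,
    }


# Hypothetical DAG, task, and date values for illustration only.
token = make_idempotency_token("nightly_etl", "load_events", "2021-08-01T00:00:00")
retry_token = make_idempotency_token("nightly_etl", "load_events", "2021-08-01T00:00:00")
other_token = make_idempotency_token("nightly_etl", "load_events", "2021-08-02T00:00:00")

payload = build_submit_payload(
    "load_events", "/Shared/load_events", "hypothetical-cluster-id", token
)
print(payload["idempotency_token"] == retry_token)
```

Deriving the token from the task instance rather than generating a random value on each attempt is the design choice that makes retries safe: a random token per attempt would not deduplicate anything.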
