cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Suggestion Needed for a Orchestrator/Scheduler to schedule and execute Jobs in an automated way

BkP
Contributor

Hello Friends,

We have an application which extracts dat from various tables in Azure Databricks and we extract it to postgres tables (postgres installed on top of Azure VMs). After extraction we apply transformation on those datasets in postgres tables with the help of spark programs written on Jupiter notebook and load the data to Neo4j graph database (Neo4j installed on Another Azure VM). For now we are doing the extraction through SQL queries and for transformation on Postgres we are leveraging Python(Spark) Programs. As the there are lot of tables (More than 100) and there is dependency , It is not possible to run everything manually. Hence we are looking for a Orchestrator and Scheduler where we can create our job execution workflow and schedule them to run at a particular time frame. Can you please suggest one ? Appreciate in advance. I am attaching the Architecture of the application here in this post.

image

15 REPLIES 15

VaibB
Contributor

You can leverage Airflow, which provides a connector for databricks jobs API, or can use databricks workflow to orchestrate your jobs where you can define several tasks and set dependencies accordingly.

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group