01-07-2025 04:06 AM
Hello community,
I'm planning to migrate some of the logic of my Airflow DAGs to Databricks Workflows. However, I have some doubts about how to map (i.e. find the equivalent of) the logic in my current DAG code when moving from DAGs to Workflows.
There are two points:
1. How to replicate Airflow's dynamic task mapping (the .expand() function) in Databricks Workflows.
2. How to access run context information, similar to Airflow's get_current_context(), from within a task.
01-07-2025 12:52 PM
Hi @jeremy98, you can leverage the Databricks dbutils.notebook.run() context to dynamically create jobs/tasks.
Refer to the following link for more details: https://docs.databricks.com/en/notebooks/notebook-workflows.html
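For example, a driver notebook could fan out child notebook runs in a loop. Here is a minimal sketch; the child notebook path, timeout, and parameter names are assumptions, not values from this thread:

```python
# Driver notebook: dynamically launch one child notebook run per input value.
# dbutils is available implicitly inside a Databricks notebook.
dates = ["2025-01-01", "2025-01-02", "2025-01-03"]  # hypothetical inputs

results = []
for d in dates:
    # dbutils.notebook.run(path, timeout_seconds, arguments) runs the child
    # notebook as an ephemeral job run and returns whatever the child passes
    # to dbutils.notebook.exit().
    result = dbutils.notebook.run(
        "./process_date",       # hypothetical child notebook path
        600,                    # timeout in seconds
        {"process_date": d},
    )
    results.append(result)

print(results)
```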
01-11-2025 03:48 AM
Hi Hari, thanks for your response! So, are you essentially suggesting moving forward by using notebooks to build these new workflows? In Airflow DAGs, we typically have a main DAG where we define tasks, and within each task, there are functionalities that call methods from Python files in our "custom libraries." Are you suggesting moving all these custom methods from the libraries I've defined into simple notebooks?
I'd also like to ask for suggestions from Databricks employees like @Alberto_Umana and @Walter_C. Thanks for your help, guys.
01-11-2025 06:19 AM
Databricks Workflows offers a similar concept to Airflow's dynamic task mapping through the "For each" task type (the counterpart of the .expand() function in Airflow). For example, if you have a list of dates to process, you could set up a "For each" task that iterates over these dates and runs a notebook or Python wheel for each one.
Reference: https://docs.databricks.com/en/jobs/for-each.html
In Databricks Workflows, there isn't a direct equivalent to Airflow's get_current_context() function. However, you can access similar information through different means:
- {{job.run_id}} gives you the current job run ID
- {{job.start_time}} provides the job start time
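For example, if you map those references into a notebook task's base_parameters, the task notebook can read them with widgets. The parameter names below, and mapping process_date to the "For each" {{input}} value, are assumptions for the sketch:

```python
# Task notebook: read values injected through base_parameters, e.g.
#   run_id       -> {{job.run_id}}
#   start_time   -> {{job.start_time}}
#   process_date -> {{input}}   (when the task runs inside a "For each" task)
run_id = dbutils.widgets.get("run_id")
start_time = dbutils.widgets.get("start_time")
process_date = dbutils.widgets.get("process_date")

print(f"Job run {run_id} started at {start_time}; processing {process_date}")
```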
01-11-2025 10:29 AM
Hello,
Thank you for your amazing response! You provided all the information I needed at this moment, and I truly appreciate it.
However, I have a doubt regarding the following example code:
(Old) file.py - Based on Airflow DAGs:
with (...):

    @task
    def task_1():
        ...
        process_email()
        ...

    @task
    def task_2():
        ...
(New) file.py - Databricks Workflow:
Here, all task definitions are consolidated into a single notebook file, with callable functions that execute the required tasks based on the base_parameters passed in: for example, one task calls the notebook with function: "task_1" and the other with function: "task_2".
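For reference, a single dispatcher notebook along those lines might look like the sketch below; the parameter name "function" and the task bodies are assumptions:

```python
# Single "dispatcher" notebook: each Workflow task passes
# base_parameters = {"function": "task_1"} (or "task_2") and the notebook
# routes the call to the matching function.

def task_1():
    # ... original task_1 logic, including process_email() ...
    pass

def task_2():
    # ... original task_2 logic ...
    pass

TASKS = {"task_1": task_1, "task_2": task_2}

function_name = dbutils.widgets.get("function")  # e.g. "task_1"
TASKS[function_name]()
```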
But I think I made a mistake in the settings. Instead of properly splitting the old file.py into individual tasks (where each task corresponds to a separate notebook file), we replicated the entire structure for each task in one notebook.
Now, regarding the process_email method inside task_1: would it be better to split it out into its own dedicated task?
Additionally, could you clarify which parts of the code are most suitable for inclusion in a wheel package file?
Thank you again for your help!
01-11-2025 11:44 AM
Yes, it is a good practice to create a separate task for the process_email functionality. This will help in modularizing your code and make it easier to manage and debug. Each task should ideally perform a single responsibility, and separating process_email into its own task aligns with this principle.
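As a rough sketch of that split, a job with process_email as a downstream task could be created with the databricks-sdk Python package. The job name, notebook paths, and cluster ID below are placeholders, not your actual values:

```python
# Sketch: task_1 and process_email as separate tasks, with a dependency between them.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

created = w.jobs.create(
    name="migrated-airflow-pipeline",  # placeholder name
    tasks=[
        jobs.Task(
            task_key="task_1",
            existing_cluster_id="0123-456789-abcdefgh",  # placeholder cluster ID
            notebook_task=jobs.NotebookTask(notebook_path="/Workspace/pipeline/task_1"),
        ),
        jobs.Task(
            task_key="process_email",
            depends_on=[jobs.TaskDependency(task_key="task_1")],
            existing_cluster_id="0123-456789-abcdefgh",  # placeholder cluster ID
            notebook_task=jobs.NotebookTask(notebook_path="/Workspace/pipeline/process_email"),
        ),
    ],
)
print(created.job_id)
```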
The parts of the code that are better suited to a wheel package are:
- Functions such as process_email that are used across multiple tasks or notebooks.
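For instance, a shared helper like process_email could live in a small internal package that is built into a wheel and installed as a job library. The package/module names and the function body below are illustrative stand-ins, not your actual implementation:

```python
# my_company_utils/emailing.py -- shipped inside a wheel (e.g. built with setuptools).
import smtplib
from email.message import EmailMessage


def process_email(recipient: str, subject: str, body: str,
                  smtp_host: str = "localhost") -> None:
    """Send a simple plain-text notification email (simplified stand-in)."""
    msg = EmailMessage()
    msg["From"] = "noreply@example.com"  # placeholder sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)
```

Any task or notebook can then do from my_company_utils.emailing import process_email once the wheel is attached to the job cluster.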
01-12-2025 04:36 AM
Hello, thanks for your explanation, really helpful! One more question: for testing individual tasks, i.e. building unit tests or integration tests, how can I use the databricks.yml to set this up?
01-13-2025 06:07 AM