Data Engineering
Run multiple jobs with different source code at the same time with Databricks asset bundles

curiousoctopus
New Contributor II

Hi,

I am migrating from dbx to Databricks Asset Bundles. Previously, with dbx, I could work on different features in separate branches and launch jobs without one job overwriting another. Now, with asset bundles, it seems I can't, since deploying updates ONE job and runs an instance of the latest deployment.

This is what I have in my `databricks.yml` to deploy my job:

resources:
  jobs:
    <my-job>:
      name: my-job-${var.suffix}
      tasks:
        - ...

I thought I could use a custom variable (here, suffix) to create multiple jobs, with the feature name as a suffix for example, so that everyone working on different features could run their experiments. However, it just renamed the previously deployed job. I also tried using the custom variable within the key <my-job>, but that wasn't allowed.
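For context, this is roughly how that variable would be wired up end to end — a minimal, hypothetical sketch, assuming a `suffix` variable declared in `databricks.yml` and the CLI's `--var` flag (verify the flag against your CLI version):

```yaml
# Hypothetical sketch of the setup described above. Because the resource key
# (<my-job>) is fixed, redeploying the bundle with a different suffix renames
# the one deployed job rather than creating a second one.
variables:
  suffix:
    description: Feature-branch name appended to the job name
    default: dev

resources:
  jobs:
    <my-job>:
      name: my-job-${var.suffix}
      tasks:
        - ...

# Deployed with, e.g.:
#   databricks bundle deploy --var="suffix=feature-x"
```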

 

So my question is how can I achieve this? Ultimately I want to be able to work on a different feature than my colleagues and not have to coordinate when I can launch my job to not overwrite theirs.

 

Thank you.

2 REPLIES

Kaniz
Community Manager

Hi @curiousoctopus, migrating from dbx to Databricks Asset Bundles (DAB) is a significant transition, and I understand your concern about managing multiple jobs for different features.

Let’s explore how you can achieve your goal of working on separate features without overwriting your colleagues’ work.

  1. Understanding Databricks Asset Bundles (DAB):

    • Asset bundles describe Databricks resources (jobs, pipelines, and so on) declaratively in YAML and deploy them through the Databricks CLI.

  2. Job Configuration in DAB:

    • In DAB, you define job configurations in a databricks.yml file.
    • The name field in your job configuration specifies the name of the job.
    • However, using a custom variable (like ${var.suffix}) in the job name won’t create separate jobs; it will only change the name of the existing job.
  3. Creating Multiple Jobs with Unique Names:

    • To achieve your goal, you need to create separate jobs for different features.

    • Instead of using a custom variable within the job name, consider creating a separate job configuration for each feature.

    • For example, you could define multiple job configurations like this:

      resources:
        jobs:
          feature1-job:
            name: my-feature1-job
            tasks:
              - ...
          feature2-job:
            name: my-feature2-job
            tasks:
              - ...
          # Add more job configurations for other features
      
  4. Using Job Parameters:

    • If you want to parameterize your job configurations further, consider using job parameters.

    • Define a job parameter (e.g., feature_name) and use it within your tasks.

    • For example:

      resources:
        jobs:
          feature1-job:
            name: my-feature1-job-${var.suffix}
            tasks:
              - task_key: main
                notebook_task:
                  notebook_path: /path/to/feature1-notebook
                  base_parameters:
                    feature_name: feature1
          feature2-job:
            name: my-feature2-job-${var.suffix}
            tasks:
              - task_key: main
                notebook_task:
                  notebook_path: /path/to/feature2-notebook
                  base_parameters:
                    feature_name: feature2
          # Add more job configurations for other features
      
  5. Deploying and Running Jobs:

    • Once you’ve defined separate job configurations, deploy them using DAB.
    • Each job will have a unique name based on the feature it represents.
    • Running these jobs won’t overwrite each other, allowing you to work independently.
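The deploy/run cycle above would look roughly like this — a sketch assuming the standard `databricks bundle` CLI commands (flags and output may differ by CLI version):

```shell
# Deploy every job defined in databricks.yml to the selected target
databricks bundle deploy

# Run one feature's job by its resource key
# (feature1-job is the hypothetical key from the example above)
databricks bundle run feature1-job
```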

Remember that DAB provides a sustainable way to manage jobs on Databricks, and it’s the recommended approach going forward. Happy coding, and may your features flourish like well-nurtured data gardens! 🌱🚀

Hi Kaniz,

Thank you for your answer and the time taken. Unfortunately, this is not an acceptable solution for me, as for every feature we develop we would have to create a new job within the `databricks.yml` file. This is too much of a hassle and ultimately defeats the purpose of CI/CD pipelines.

dbx uses an asset-based approach to allow testing new features without overwriting the current job definition. The use cases mentioned are exactly what we are looking for in dab (also see their documentation):

  • You want to update or change job definitions only when you release the job
  • Multiple users working in parallel on the same job (e.g. in CI pipelines)

Does dab offer a similar feature? And if not, is it planned? As this is a considerable issue for my team, we are considering not switching to dab and keeping dbx instead.
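For readers with the same question: one direction worth checking in the asset bundle documentation is development-mode targets, which are aimed at exactly this parallel-development case — in development mode, each user's deployment is isolated and deployed resource names get a per-user prefix. A minimal, hypothetical sketch (the `dev` target name is a placeholder; verify the behavior against the current docs):

```yaml
# Hypothetical sketch: a bundle target with mode: development. Deploying to
# this target gives each user an isolated copy of the resources, with names
# prefixed per user, e.g. "[dev someone] my-job", so deployments don't collide.
targets:
  dev:
    mode: development
    default: true
```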

Thank you.
