2 weeks ago
Hey,
I am using DABs to deploy the job below.
This code works, but I would like to reuse it for other suppliers as well.
Is there a way to loop over a list of suppliers (['nike', 'adidas', ...]) and fill in those variables, so that a job like the one below is generated for each supplier?
```yaml
bundle:
  name: gsheet-config-jobs

resources:
  jobs:
    config_nike_gsheet_to_databricks:
      name: config_nike_gsheet_to_databricks
      tasks:
        - task_key: test_config_nike_gsheet_to_databricks
          spark_python_task:
            python_file: src/mapping-update/main.py
            parameters:
              - ACC
              - NIKE
            source: GIT
          environment_key: Default
      git_source:
        git_url: https://github.com/Transfo-Energy/transfo-engine.git
        git_provider: gitHub
        git_branch: Acceptance
      queue:
        enabled: true
      environments:
        - environment_key: Default
          spec:
            client: "1"
            dependencies:
              - gspread
              - oauth2client
      performance_target: PERFORMANCE_OPTIMIZED
```
Thanks a lot for the help!
2 weeks ago
Hi @Daan ,
Yes, I think you can do it using the for-each task. Check the materials below; they should help you get started with implementing such a scenario (a rough sketch follows the links).
Use a For each task to run another task in a loop | Databricks Documentation
Databricks SQL Orchestration Patterns with For Each and Dynamic Value References | by Databricks SQL...
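For illustration, a minimal sketch of that pattern through the Databricks SDK for Python (databricks-sdk). The job and task names are made up, and the exact ForEachTask fields should be verified against the SDK version you use:

```python
# Rough sketch: one job whose for-each task loops over suppliers.
# Names are hypothetical; check ForEachTask against your SDK version.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()  # auth resolved from env vars or a config profile

w.jobs.create(
    name="gsheet_config_all_suppliers",  # hypothetical job name
    tasks=[
        jobs.Task(
            task_key="loop_over_suppliers",
            for_each_task=jobs.ForEachTask(
                inputs='["nike", "adidas"]',  # JSON array to iterate over
                task=jobs.Task(
                    task_key="loop_over_suppliers_iteration",
                    spark_python_task=jobs.SparkPythonTask(
                        python_file="src/mapping-update/main.py",
                        # {{input}} resolves to the current loop item at run time
                        parameters=["ACC", "{{input}}"],
                    ),
                ),
            ),
        )
    ],
)
```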
2 weeks ago - last edited 2 weeks ago
Hey Szymon,
Your answer uses a for-each loop inside a single job/workflow.
What I would like to achieve is to create multiple jobs from one Databricks Asset Bundle by iterating over a list of suppliers.
2 weeks ago
Oh, now I get it. I misunderstood your question initially. So in your case you need to build your DAB definition dynamically. You can use Python for Databricks Asset Bundles and dynamically create jobs or pipelines from metadata.
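Something along these lines, as a rough sketch assuming the experimental databricks-bundles package ("Python for Databricks Asset Bundles"); the supplier list is illustrative metadata, and you should check the current docs for the exact API:

```python
# Rough sketch with the databricks-bundles package: generate one job
# resource per supplier from a metadata list.
from databricks.bundles.core import Bundle, Resources
from databricks.bundles.jobs import Job

SUPPLIERS = ["nike", "adidas"]  # illustrative metadata

def load_resources(bundle: Bundle) -> Resources:
    resources = Resources()
    for supplier in SUPPLIERS:
        job_name = f"config_{supplier}_gsheet_to_databricks"
        job = Job.from_dict({
            "name": job_name,
            "tasks": [{
                "task_key": f"test_{job_name}",
                "spark_python_task": {
                    "python_file": "src/mapping-update/main.py",
                    "parameters": ["ACC", supplier.upper()],
                    "source": "GIT",
                },
                "environment_key": "Default",
            }],
        })
        resources.add_job(job_name, job)
    return resources

# The loader function is then referenced from databricks.yml, roughly:
#   experimental:
#     python:
#       resources:
#         - "resources:load_resources"
```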
2 weeks ago
Hi @Daan ,
Your requirement is to create jobs dynamically by iterating through a list of suppliers. This is definitely achievable using the Databricks SDK.
I'd recommend providing the job parameters and definitions in JSON format, as it's more reliable for parsing. You can structure the job name with an identifier, for example:
```config_{{source}}_gsheet_to_databricks```
Then, within your loop, you can safely replace {{source}} with each supplier value from the iterator and create the jobs dynamically.
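Roughly like this, as a sketch with the databricks-sdk package. The supplier list is illustrative, and only the essentials are shown; the job-level settings from the original YAML (git_source, queue, environments, performance_target) can be passed to create() the same way:

```python
# Sketch: create one job per supplier with the Databricks SDK for Python.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()  # auth resolved from env vars or a config profile

suppliers = ["nike", "adidas"]  # illustrative

for source in suppliers:
    job_name = f"config_{source}_gsheet_to_databricks"
    created = w.jobs.create(
        name=job_name,
        tasks=[
            jobs.Task(
                task_key=f"test_{job_name}",
                spark_python_task=jobs.SparkPythonTask(
                    python_file="src/mapping-update/main.py",
                    parameters=["ACC", source.upper()],
                ),
            )
        ],
    )
    print(f"created {job_name}: job_id={created.job_id}")
```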
2 weeks ago
@Daan ,
You can maintain a default template that holds the common configuration. When creating a job-specific configuration, you can safely merge your job-specific dictionary into the base template using the `|` operator.
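For example (plain Python 3.9+; note that `|` is a shallow merge, so nested keys on the right replace, rather than merge with, those in the template):

```python
# Sketch of the template-merge idea: shared settings live in one dict,
# and each job's dict is merged over it. Right-hand keys win on conflict.
BASE_TEMPLATE = {
    "queue": {"enabled": True},
    "performance_target": "PERFORMANCE_OPTIMIZED",
}

def job_config(source: str) -> dict:
    job_name = f"config_{source}_gsheet_to_databricks"
    return BASE_TEMPLATE | {
        "name": job_name,
        "tasks": [{
            "task_key": f"test_{job_name}",
            "spark_python_task": {
                "python_file": "src/mapping-update/main.py",
                "parameters": ["ACC", source.upper()],
            },
        }],
    }

print(job_config("nike")["name"])  # config_nike_gsheet_to_databricks
```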