09-06-2024 01:23 PM
When I deploy multiple jobs defined in the `databricks.yml` file via an asset bundle, the deployment either overwrites the same job or renames it instead of creating separate, distinct jobs.
09-07-2024 11:46 AM
Hi @sandy311 ,
Could you share your databricks.yml file?
Are you sure you used unique job IDs when defining your jobs?
09-08-2024 08:05 AM
My issue is that when I update a job with a different name, the asset bundle deployment only overwrites the existing job instead of creating a new one.
Example:
This is my YAML file, and when I deploy it using asset bundles, it creates a job:
bundle:
  name: test-bundle
artifacts:
  default:
    type: whl
    build: poetry build
    path: .
resources:
  jobs:
    wheel-job:
      job_clusters:
        - job_cluster_key: sample-cluster
          new_cluster:
            spark_version: 14.3.x-scala2.12
            node_type_id: Standard_DS4_v2
            driver_node_type_id: Standard_DS4_v2
When I update the job name or bundle name, it still updates the same job by changing its name. However, what I want is to create a new job if a new job or bundle name is provided, instead of overriding the existing one.
09-08-2024 09:31 AM
Hi @sandy311
The behavior you're observing with Databricks asset bundles is expected: bundles are designed to update existing jobs when the configuration or content changes. When you redeploy a job defined as wheel-job, the asset bundle identifies it as the same job and updates it.
If you want to create a new job without overwriting the existing one, define each job under its own distinct key in your databricks.yml file. For example, to retain the old job while creating a new one, you could name the jobs differently, like wheel-job and wheel-job-v2 (or, better, something more meaningful).
Here's an updated example of how you can define multiple jobs in your databricks.yml file:
bundle:
  name: test-bundle
artifacts:
  default:
    type: whl
    build: poetry build
    path: .
resources:
  jobs:
    # Define the original job
    wheel-job:
      job_clusters:
        - job_cluster_key: sample-cluster
          new_cluster:
            spark_version: 14.3.x-scala2.12
            node_type_id: Standard_DS4_v2
            driver_node_type_id: Standard_DS4_v2
    # Define a new job with a different name to avoid overwriting
    wheel-job-v2:
      job_clusters:
        - job_cluster_key: sample-cluster
          new_cluster:
            spark_version: 14.3.x-scala2.12
            node_type_id: Standard_DS4_v2
            driver_node_type_id: Standard_DS4_v2
This approach lets you keep your old jobs while creating new ones as required, without the bundle overwriting one with the other.
09-08-2024 11:25 AM
I was expecting the same behavior, and I have already tried the scenario you mentioned. I believe it's due to the asset bundle, and that's acceptable.
I'm trying to run integration tests when a pull request is created. The goal is to run the entire job before merging and deployment to ensure the job works correctly with the new code. I was also attempting to parameterize this process by passing variables when generating the PR so the integration job would run. After the merge, the new code would execute. However, I think we cannot parameterize the databricks.yaml file, and this presents a challenge.
Any suggestions or best practices from your side?
09-08-2024 12:04 PM
Hi @sandy311 ,
Testing on production is generally not recommended. The ideal approach is to have separate environments, such as Dev, PreProd, and Prod, which allow for thorough testing before any changes are deployed to production.
Assuming you are deploying your pull request to a "target" environment (which could be production or another environment), here are two strategies you can use:
Strategy 1: Use a "Pre-Target" Environment for Testing
1. Create a Pre-Target Environment: Set up a testing environment (PreProd) that closely mirrors your target environment (Prod or another critical environment).
2. Deploy the Pull Request to Pre-Target: Deploy changes from the pull request to the Pre-Target environment.
3. Run Tests: Execute your job tests in the Pre-Target environment to ensure that the changes work as expected.
4. Deploy to Target if Tests Pass: If the tests are successful, proceed to deploy the changes to the target environment.
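As a rough sketch of how Strategy 1 could look in a bundle, the databricks.yml below declares one target per environment. The target names and workspace URLs are placeholders I made up for illustration, not values from this thread; only the bundle name is taken from the example above.

```yaml
# Sketch only: target names and host URLs are hypothetical placeholders.
bundle:
  name: test-bundle

targets:
  dev:
    mode: development   # development mode prefixes deployed resource names per user
    default: true
    workspace:
      host: https://dev-workspace.example.com       # hypothetical
  preprod:
    workspace:
      host: https://preprod-workspace.example.com   # hypothetical
  prod:
    workspace:
      host: https://prod-workspace.example.com      # hypothetical
```

With targets like these, a pull request pipeline could run `databricks bundle deploy -t preprod` followed by `databricks bundle run -t preprod wheel-job`, and deploy with `-t prod` only after the run succeeds.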
Strategy 2: Use a Rollback Mechanism in the Target Environment
1. Deploy to Target: Deploy the changes directly to the target environment.
2. Run Tests on Target: Execute job tests in the target environment.
3. Handle Results:
- If Tests Pass: Keep the deployed changes.
- If Tests Fail: Roll back to the last known good configuration (e.g., main branch, previous release).
These strategies help maintain stability in your production-like environment while ensuring your new code is tested thoroughly before any critical deployment.
In summary, you do not want to keep multiple versions of the same job (stable-old and untested-new) in the same environment. The best practice here is to have a separate environment to test your pull request.
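On the parameterization question: bundles do support custom variables, so a pull request pipeline can pass values in at deploy time. The fragment below is a minimal sketch, assuming a hypothetical variable called pr_label that is only used to tag the job; it is not taken from the original configuration.

```yaml
# Sketch only: pr_label is a hypothetical variable name used for illustration.
variables:
  pr_label:
    description: Label identifying the pull request under test
    default: none

resources:
  jobs:
    wheel-job:
      tags:
        pr_label: ${var.pr_label}   # resolved when the bundle is deployed
```

The value can be overridden on the command line, for example `databricks bundle deploy --var="pr_label=pr-123"`, or through an environment variable named `BUNDLE_VAR_pr_label`. Note that this changes the configuration of the existing job; it does not create a second job in the same target.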
09-08-2024 12:36 PM
@filipniziol Thank you for your valuable feedback. I believe the second approach aligns well with my requirements. I will implement a rollback mechanism in the target environment. Currently, I am performing all these tasks in the development environment.