Addressing Pipeline Error Handling in Databricks bundle run with CI/CD when SUCCESS WITH FAILURES

ismaelhenzel
New Contributor III

I'm using Databricks asset bundles, and I have pipelines that contain "all done" rules. When running in CI/CD, if a task fails, the pipeline returns a message like "the job xxxx SUCCESS_WITH_FAILURES" and the run passes, potentially deploying a broken pipeline to production. I would prefer that the CI/CD throws an error in these cases rather than marking it as a success. Is there a way to do this, such as a parameter on bundle run? If not, should I create my production pipeline with "all done" rules but my development pipeline with "all succeeded" rules to capture the errors in CI/CD? I understand that I should have a QA environment to test these cases, but unfortunately, that's not the case right now.

1 ACCEPTED SOLUTION

Accepted Solutions

Kaniz
Community Manager

Hi @ismaelhenzel, handling error scenarios in your Databricks asset bundles during CI/CD workflows is crucial to ensure robustness and prevent potentially broken deployments.

Let's explore some options:

  1. Error Handling in Databricks Asset Bundles:

    • By default, when a task fails in a Databricks bundle run, the pipeline returns a message like "the job xxxx SUCCESS_WITH_FAILURES," which can be misleading in CI/CD scenarios.
    • Unfortunately, there isn't a direct parameter on bundle run to force an error status when tasks fail. However, you can adopt alternative approaches to achieve your desired behavior.
  2. Approaches to Consider:

    • Custom Exit Codes:
      • You can create a custom script or task within your bundle that explicitly checks for task failures. If any task fails, the script can exit with a non-zero exit code.
      • In your CI/CD workflow, you can then check the exit code of the bundle run. If it's non-zero, treat it as an error.
      • Example (shell sketch):
        # Custom script to check the run's result state; TASK_STATUS is
        # assumed to hold that state (e.g. fetched via the Jobs API)
        if [ "$TASK_STATUS" != "SUCCESS" ]; then
            echo "Error: Task failed!"
            exit 1  # Non-zero exit code fails the CI/CD step
        fi
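      • A more complete, runnable version of this check wraps it in a shell function that fails for any result state other than SUCCESS. The state names (such as SUCCESS_WITH_FAILURES) are taken from the message quoted above, and how the state is obtained is left as an assumption; adapt both to your workflow:

```shell
# Fail the CI/CD step unless the run finished with a plain SUCCESS.
# How RUN_STATE is obtained (parsing `databricks bundle run` output or
# querying the Jobs API) is an assumption; adapt it to your pipeline.
check_run_state() {
    state="$1"
    if [ "$state" = "SUCCESS" ]; then
        return 0
    fi
    echo "Error: run finished with state '$state'" >&2
    return 1  # non-zero return fails the CI/CD step via `|| exit 1`
}

# Hypothetical usage in a CI step:
# check_run_state "$RUN_STATE" || exit 1
```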
        
    • Separate Development and Production Pipelines:
      • As you mentioned, consider having separate pipelines for development and production.
      • In the development pipeline, use "all succeeded" rules to capture errors during CI/CD testing.
      • In the production pipeline, use "all done" rules for normal execution.
      • This approach ensures that errors are caught during development but not in production.
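      • A minimal bundle-configuration sketch of this idea, assuming a job my_job whose final task report depends on transform (all names are hypothetical, and whether target overrides merge or replace task lists should be verified against the bundle documentation):

```yaml
# databricks.yml (sketch): strict rule by default, lenient rule in prod
resources:
  jobs:
    my_job:
      name: my_job
      tasks:
        - task_key: transform
          notebook_task:
            notebook_path: ./src/transform.py
        - task_key: report
          depends_on:
            - task_key: transform
          run_if: ALL_SUCCESS      # dev/CI: any upstream failure fails the run
          notebook_task:
            notebook_path: ./src/report.py

targets:
  dev:
    default: true
  prod:
    resources:
      jobs:
        my_job:
          tasks:
            - task_key: report
              run_if: ALL_DONE     # prod: run even if upstream tasks failed
```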
  3. QA Environment (Future Consideration):

    • While it's not feasible right now, having a dedicated QA environment is essential for thorough testing.
    • In the long term, aim to set up a QA environment where you can validate pipelines with real-world data and scenarios before deploying to production.
 


2 REPLIES


ismaelhenzel
New Contributor III

Awesome answer, I will try the first approach. I think it is a less intrusive solution than changing the rules of my pipeline in development scenarios. This way, I can maintain a general pipeline for deployment across all environments. We plan to implement a QA environment after migrating all cloud resources to Terraform. Thanks!