Addressing Pipeline Error Handling in Databricks bundle run with CI/CD when SUCCESS WITH FAILURES

ismaelhenzel
New Contributor III

I'm using Databricks asset bundles, and I have pipelines that contain "all done" rules. When running in CI/CD, if a task fails, the pipeline returns a message like "the job xxxx SUCCESS_WITH_FAILURES" and the run passes, potentially deploying a broken pipeline to production. I would prefer that the CI/CD throws an error in these cases rather than marking it as a success. Is there a way to do this, such as a parameter on bundle run? If not, should I create my production pipeline with "all done" rules but my development pipeline with "all succeeded" rules to capture the errors in CI/CD? I understand that I should have a QA environment to test these cases, but unfortunately, that's not the case right now.

1 ACCEPTED SOLUTION

Accepted Solutions

Kaniz
Community Manager

Hi @ismaelhenzel, handling error scenarios in your Databricks asset bundles during CI/CD workflows is crucial to ensure robustness and prevent potentially broken deployments.

Let's explore some options:

  1. Error Handling in Databricks Asset Bundles:

    • By default, when a task fails in a Databricks bundle run, the pipeline returns a message like "the job xxxx SUCCESS_WITH_FAILURES," which can be misleading in CI/CD scenarios.
    • Unfortunately, there isn't a direct parameter on bundle run to force an error status when tasks fail. However, you can adopt alternative approaches to achieve your desired behavior.
  2. Approaches to Consider:

    • Custom Exit Codes:
      • You can create a custom script or task within your bundle that explicitly checks for task failures. If any task fails, the script can exit with a non-zero exit code.
      • In your CI/CD workflow, you can then check the exit code of the bundle run. If it's non-zero, treat it as an error.
      • Example (shell sketch):
        # Custom script to check the run's result state; TASK_STATUS is
        # assumed to hold that state (e.g. fetched via the Jobs API)
        if [ "$TASK_STATUS" != "SUCCESS" ]; then
            echo "Error: Task failed!"
            exit 1  # Non-zero exit code fails the CI/CD step
        fi
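      • A more complete, runnable version of this check wraps it in a shell function that fails for any result state other than SUCCESS. The state names (such as SUCCESS_WITH_FAILURES) are taken from the message quoted above, and how the state is obtained is left as an assumption; adapt both to your workflow:

```shell
# Fail the CI/CD step unless the run finished with a plain SUCCESS.
# How RUN_STATE is obtained (parsing `databricks bundle run` output or
# querying the Jobs API) is an assumption; adapt it to your pipeline.
check_run_state() {
    state="$1"
    if [ "$state" = "SUCCESS" ]; then
        return 0
    fi
    echo "Error: run finished with state '$state'" >&2
    return 1  # non-zero return fails the CI/CD step via `|| exit 1`
}

# Hypothetical usage in a CI step:
# check_run_state "$RUN_STATE" || exit 1
```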
        
    • Separate Development and Production Pipelines:
      • As you mentioned, consider having separate pipelines for development and production.
      • In the development pipeline, use "all succeeded" rules to capture errors during CI/CD testing.
      • In the production pipeline, use "all done" rules for normal execution.
      • This approach ensures that errors are caught during development but not in production.
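      • A minimal bundle-configuration sketch of this idea, assuming a job my_job whose final task report depends on transform (all names are hypothetical, and whether target overrides merge or replace task lists should be verified against the bundle documentation):

```yaml
# databricks.yml (sketch): strict rule by default, lenient rule in prod
resources:
  jobs:
    my_job:
      name: my_job
      tasks:
        - task_key: transform
          notebook_task:
            notebook_path: ./src/transform.py
        - task_key: report
          depends_on:
            - task_key: transform
          run_if: ALL_SUCCESS      # dev/CI: any upstream failure fails the run
          notebook_task:
            notebook_path: ./src/report.py

targets:
  dev:
    default: true
  prod:
    resources:
      jobs:
        my_job:
          tasks:
            - task_key: report
              run_if: ALL_DONE     # prod: run even if upstream tasks failed
```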
  3. QA Environment (Future Consideration):

    • While it's not feasible right now, having a dedicated QA environment is essential for thorough testing.
    • In the long term, aim to set up a QA environment where you can validate pipelines with real-world data and scenarios before deploying to production.
 


2 REPLIES


ismaelhenzel
New Contributor III

Awesome answer, I will try the first approach. I think it is a less intrusive solution than changing the rules of my pipeline in development scenarios. This way, I can maintain a general pipeline for deployment across all environments. We plan to implement a QA environment after migrating all cloud resources to Terraform. Thanks!