Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Addressing pipeline error handling in Databricks bundle run with CI/CD when a job ends in SUCCESS_WITH_FAILURES

ismaelhenzel
New Contributor III

I'm using Databricks asset bundles, and some of my pipelines use "All done" run-if rules. When a pipeline runs in CI/CD and a task fails, the run ends with a message like "the job xxxx SUCCESS_WITH_FAILURES" and the CI/CD step passes, so a broken pipeline could be deployed to production. I would prefer the CI/CD run to fail in these cases instead of being marked as successful. Is there a way to do this, such as a parameter for bundle run? If not, should I configure the pipeline with "All done" rules in production but "All succeeded" rules in development, so that errors are caught in CI/CD? I understand that I should have a QA environment to test these cases, but unfortunately we don't have one right now.

1 ACCEPTED SOLUTION


Kaniz
Community Manager

Hi @ismaelhenzel, handling failures correctly in your Databricks asset bundle CI/CD workflow is important: it keeps a partially failed run from letting a broken pipeline reach production.

Let’s explore some options:

  1. Error Handling in Databricks Asset Bundles:

    • When a task fails but the downstream tasks run under an “All done” rule and succeed, the job run can still finish with the result state SUCCESS_WITH_FAILURES, and a bundle run that reports this state does not fail your CI/CD step. This is easy to misread as a clean success.
    • Unfortunately, there isn't a bundle run parameter that turns this state into an error. However, you can use one of the approaches below to get the behavior you want.
  2. Approaches to Consider:

    • Custom Exit Codes:
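      • Add a check that explicitly looks for task failures and exits with a non-zero code if any task did not succeed. This can be a final task in the job that depends on all other tasks, or a script that your CI/CD workflow runs after the bundle run finishes.
      • In your CI/CD workflow, treat a non-zero exit code as an error: from the bundle run if the check is a job task (a failing last task should make the whole run fail), or from the script itself if it runs in CI/CD.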
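      • Example (shell), with the run's result state already stored in TASK_STATUS:
        # Fail the CI step unless the run's result state is exactly SUCCESS.
        # Stops with an error if TASK_STATUS is unset or empty.
        TASK_STATUS="${TASK_STATUS:?TASK_STATUS is not set}"
        if [ "$TASK_STATUS" != "SUCCESS" ]; then
            echo "Error: run finished with state $TASK_STATUS" >&2
            exit 1  # Non-zero exit code marks the CI step as failed
        fi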
        
    • Separate Development and Production Pipelines:
      • As you mentioned, consider having separate pipelines for development and production.
      • In the development pipeline, use “all succeeded” rules to capture errors during CI/CD testing.
      • In the production pipeline, use “all done” rules for normal execution.
      • This way, a failed task breaks the CI/CD run in development, while production still runs the downstream tasks when an upstream task fails.
  3. QA Environment (Future Consideration):

    • While it’s not feasible right now, having a dedicated QA environment is essential for thorough testing.
    • In the long term, aim to set up a QA environment where you can validate pipelines with real-world data and scenarios before deploying to production.
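A CI-side version of the "Custom Exit Codes" check could look like the sketch below. It assumes the CI step can already list the job-level result_state and each task's result_state (strings such as SUCCESS, FAILED or SUCCESS_WITH_FAILURES). The function name require_all_success is hypothetical and not part of any Databricks tool. It returns non-zero when any supplied state is anything other than SUCCESS, so a SUCCESS_WITH_FAILURES run fails the build.

```shell
# Hypothetical CI helper (POSIX sh): succeeds only if every result state passed
# as an argument is exactly SUCCESS. Pass the job-level result_state and each
# task's result_state, e.g. extracted from the run JSON (assumed shape).
require_all_success() {
  # No states at all means nothing was checked; treat that as an error too.
  if [ "$#" -eq 0 ]; then
    echo "Error: no result states supplied" >&2
    return 2
  fi
  _failed=0
  for _state in "$@"; do
    if [ "$_state" != "SUCCESS" ]; then
      # SUCCESS_WITH_FAILURES, FAILED, UPSTREAM_FAILED, null, ... all end up here.
      echo "Error: found result state $_state" >&2
      _failed=1
    fi
  done
  return "$_failed"
}
```

One way to feed it is shown here. This assumes `jq` is installed on the runner, the run ID is already in `RUN_ID` (however your pipeline obtains it), and `databricks jobs get-run` prints the Jobs API run JSON with `state.result_state` and `tasks[].state.result_state`. Under those assumptions you could run `run_json=$(databricks jobs get-run "$RUN_ID")`, then `require_all_success "$(echo "$run_json" | jq -r '.state.result_state')" $(echo "$run_json" | jq -r '.tasks[].state.result_state')`. A task without a result state comes out of jq as `null`, which the function also treats as a failure. If the job uses If/else condition tasks, skipped branches may report EXCLUDED, and you may want to allow that state as well.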
 

ismaelhenzel
New Contributor III

Awesome answer, I will try the first approach. I think it is a less intrusive solution than changing the rules of my pipeline in development scenarios. This way, I can maintain a general pipeline for deployment across all environments. We plan to implement a QA environment after migrating all cloud resources to Terraform. Thanks!