Addressing Pipeline Error Handling in Databricks bundle run with CI/CD when SUCCESS WITH FAILURES

ismaelhenzel
New Contributor II

I'm using Databricks asset bundles, and my pipelines contain tasks with "all done" run conditions. When running from CI/CD, if a task fails, the run returns a message like "the job xxxx SUCCESS_WITH_FAILURES" and the CI/CD step passes, potentially deploying a broken pipeline to production. I would prefer that CI/CD throws an error in these cases instead of marking it as a success. Is there a way to do this, like a parameter on bundle run? If not, should I keep "all done" rules in production but switch to "all succeeded" in development so CI/CD catches the errors? I understand that I should have a QA environment to test these cases, but unfortunately that's not an option right now.

Accepted Solution

Kaniz
Community Manager

Hi @ismaelhenzel, handling error scenarios in your Databricks asset bundle runs during CI/CD is crucial for keeping the workflow robust and preventing broken deployments.

Let’s explore some options:

  1. Error Handling in Databricks Asset Bundles:

    • When a task with an “all done” run condition fails, the bundle run still finishes with a status like “the job xxxx SUCCESS_WITH_FAILURES,” so the CI/CD step passes, which can be misleading.
    • Unfortunately, there isn’t a direct parameter on the bundle run to force an error status when individual tasks fail, but you can adopt alternative approaches to get the behavior you want.
  2. Approaches to Consider:

    • Custom Exit Codes:
      • Add a custom script or CI step that explicitly checks the run’s result after the bundle run finishes. If any task failed, the script exits with a non-zero exit code.
      • Your CI/CD workflow then checks the exit code of that step and treats anything non-zero as an error (a fuller sketch follows this list).
      • Example (pseudo-code, where $TASK_STATUS stands for the run’s result state):
        # Custom check on the run's result state
        if [ "$TASK_STATUS" != "SUCCESS" ]; then
            echo "Error: Task failed!"
            exit 1  # Non-zero exit code fails the CI/CD step
        fi
        
    • Separate Development and Production Pipelines:
      • As you mentioned, you could use different run conditions for development and production.
      • In the development pipeline, use “all succeeded” rules so that a failed task fails the whole run and is caught during CI/CD testing.
      • In the production pipeline, keep the “all done” rules for normal execution.
      • This way, failures surface as run errors in development while production keeps its more permissive behavior (a sketch of expressing both rules in a single bundle follows this list).
  3. QA Environment (Future Consideration):

    • While it’s not feasible right now, having a dedicated QA environment is essential for thorough testing.
    • In the long term, aim to set up a QA environment where you can validate pipelines with real-world data and scenarios before deploying to production.
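
For reference, here is one way the custom exit-code idea could be wired into a CI/CD step. This is only a sketch, not a built-in bundle feature: the resource key my_job_resource_key, the prod target, the JOB_ID value, and the DATABRICKS_HOST / DATABRICKS_TOKEN variables are placeholders you would replace, and it assumes jq is available and that the bundle run blocks until the job finishes. It reads the run’s result_state from the Jobs 2.1 runs/list endpoint and fails the step unless it is SUCCESS.

    #!/usr/bin/env bash
    # Sketch: fail the CI step unless the latest run ended with result_state SUCCESS.
    set -euo pipefail

    JOB_ID="123456789"   # hypothetical ID of the job deployed by the bundle

    # Trigger the job; per this thread, this step passes even on SUCCESS_WITH_FAILURES
    databricks bundle run my_job_resource_key -t prod

    # Read the result_state of the most recent run from the Jobs 2.1 API
    RESULT_STATE=$(curl -sf \
      -H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
      "${DATABRICKS_HOST}/api/2.1/jobs/runs/list?job_id=${JOB_ID}&limit=1" \
      | jq -r '.runs[0].state.result_state')

    if [ "${RESULT_STATE}" != "SUCCESS" ]; then
        echo "Error: job finished with result_state=${RESULT_STATE}"
        exit 1   # non-zero exit code makes the CI/CD step fail
    fi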
 
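And here is a sketch of how the second approach might be expressed in a single databricks.yml instead of two separate pipeline definitions, using the task-level run_if setting (ALL_SUCCESS and ALL_DONE correspond to the “all succeeded” and “all done” run conditions). All job, task, and notebook names are illustrative, and the assumption that a target can partially override a task (matched by task_key) should be verified against the bundle documentation; if it cannot, repeat the full task definition under the target.

    # Sketch: strict dependency rule everywhere except the prod target
    bundle:
      name: my_bundle

    resources:
      jobs:
        my_job:
          name: my_job
          tasks:
            - task_key: ingest
              notebook_task:
                notebook_path: ./notebooks/ingest.py
            - task_key: transform
              depends_on:
                - task_key: ingest
              run_if: ALL_SUCCESS        # base rule: a failed ingest fails the run
              notebook_task:
                notebook_path: ./notebooks/transform.py
              # compute settings omitted for brevity

    targets:
      dev:
        mode: development                # keeps the strict ALL_SUCCESS rule
      prod:
        resources:
          jobs:
            my_job:
              tasks:
                - task_key: transform
                  run_if: ALL_DONE       # production keeps the "all done" behavior

With something like this, a CI/CD run against the dev target would fail as soon as a task fails, while the prod target preserves the current “all done” behavior.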


2 REPLIES


ismaelhenzel
New Contributor II

Awesome answer. I will try the first approach; I think it is less intrusive than changing my pipeline's rules for development scenarios, and this way I can maintain a single, general pipeline definition for deployment across all environments. We plan to implement a QA environment after migrating all our cloud resources to Terraform. Thanks!
