Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Many dbutils.notebook.run iterations in a workflow -> Failed to checkout GitHub repository Error

Michael_Galli
Contributor III

Hi all,

I have a workflow that runs one single notebook with dbutils.notebook.run() and different parameters in one long loop.
At some point, I get random Git errors in the notebook run:

com.databricks.WorkflowException: com.databricks.NotebookExecutionException: FAILED: Failed to checkout Git repository: UNAVAILABLE

If I run the workflow again, it might work, or fail at another stage.
It seems that I am hitting some kind of GitHub API limit in the workspace.
Is there any way or workaround to solve this?

 

1 ACCEPTED SOLUTION

Kaniz_Fatma
Community Manager

Hi @Michael_Galli, It appears that you're encountering GitHub-related issues during your notebook runs in Databricks.

 

Let's address this step by step:

 

GitHub API Limit:

  • Databricks enforces rate limits for all REST API calls, including those related to Git integration.
  • These limits are set per endpoint and per workspace to ensure fair usage and high availability.
  • You'll receive a 429 response status code if your requests exceed the rate limit.
  • To mitigate this, consider optimizing your API calls or spreading them out over time.
  • You can find more details about rate limits in the Databricks REST API reference.
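A common way to spread requests out is exponential backoff on 429 responses. Below is a minimal, generic sketch: the `call_with_backoff` helper and its parameters are illustrative assumptions, not a Databricks API, and the actual request is made by a caller-supplied `send()` function returning `(status_code, body)`.

```python
import time

def call_with_backoff(send, max_retries=5, sleep=time.sleep):
    """Invoke send() -> (status_code, body); retry with exponential
    backoff whenever the endpoint answers 429 (rate limited)."""
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_retries:
            sleep(2 ** attempt)  # wait 1s, 2s, 4s, ... between retries
    return status, body  # still rate-limited after all retries
```

Injecting `sleep` and `send` keeps the helper testable and lets you plug in whatever HTTP client you already use for the Databricks REST API.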

Workspace Repositories:

Workaround Suggestions:

  • Here are some potential workarounds:
    • Retry Mechanism: Implement a retry mechanism in your workflow. If a Git error occurs, wait briefly and then retry the operation.
    • Throttle Requests: Introduce a delay between consecutive Git-related API calls to avoid hitting rate limits.
    • Error Handling: Catch Git-related exceptions and handle them gracefully. You can log the errors, retry, or take alternative actions.
    • Optimize Git Operations: Review your notebook code and identify any unnecessary or redundant Git operations. Minimize the number of Git-related actions if possible.
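The retry, throttling, and error-handling points above can be sketched in one wrapper around the notebook call. This is a hypothetical helper, not Databricks-provided; the notebook path, timeout, and delay values in the usage comment are placeholders.

```python
import time

def run_with_retry(run_fn, max_retries=3, base_delay=30, sleep=time.sleep):
    """Call run_fn(); on the transient Git checkout error, wait with
    exponential backoff and retry; re-raise any other exception."""
    for attempt in range(max_retries + 1):
        try:
            return run_fn()
        except Exception as e:
            if ("Failed to checkout Git repository" in str(e)
                    and attempt < max_retries):
                sleep(base_delay * (2 ** attempt))  # 30s, 60s, 120s, ...
            else:
                raise

# Inside the workflow loop (path/timeout/params are placeholders):
# for params in all_params:
#     run_with_retry(lambda: dbutils.notebook.run("/path/to/notebook",
#                                                 3600, params))
#     time.sleep(5)  # throttle consecutive checkouts of the repo
```

Matching on the error message string is a pragmatic assumption here; if your runs surface a more specific exception type, catch that instead.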

Remember that Git-related issues can be tricky, but with careful handling and optimization, you can improve the reliability of your workflow. 

 

Good luck! 🚀

