Hi,
We have a working GitHub integration in place for our production workspace, which runs 14 different jobs scheduled at various intervals throughout the day.
Over the past 3-4 weeks we have consistently run into an issue: once a week, around the weekend, our jobs fail for roughly 1-3 hours with one of these two errors:
Failed to checkout Git repository: PROJECTS_OPERATION_TIMEOUT: Timed out while performing operation. This may be due to a remote repo that is too large or a slow network. We do not recommend having more than 10000 notebooks in a repo.
Failed to checkout Git repository: PERMISSION_DENIED: Could not connect to git server. Make sure the git server is accessible from Databricks. Connecting to a private git server requires additional setup. Please contact your Databricks representative for details.
We can't really find anything about these errors online or on this forum. We've tried increasing the cluster's Spark configuration value
spark.storage.blockManagerTimeoutIntervalMs
but it didn't solve the issue.
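To try to narrow down whether the PERMISSION_DENIED error is really a connectivity problem, we're considering running a simple TCP reachability probe from a notebook during the error window. This is just a diagnostic sketch using the Python standard library; "github.com" stands in for whichever Git host applies:

```python
import socket

def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example call, to be run during the error window:
# can_reach("github.com", 443)
```

If this returns False only during the weekend window, that would point at a network/firewall issue rather than Git permissions.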
Note!
- Before and after the error window described above, everything works perfectly. The failures affect all of our scheduled jobs under Workflows.
- Our repository is fairly small, so size isn't the issue, and our permissions are correct, as shown by everything working throughout the rest of the week.
Thanks for any help in solving our issue!