Our production workspace has several Repos integrated with GitHub. These Repos always point to master and should never be modified manually in the workspace, because the pulls are triggered by a GitHub Actions workflow. The workflow calls the Databricks Repos API to trigger a pull whenever the master branch is updated on GitHub:
https://{databricksDomain}/api/2.0/repos/{repo["id"]}
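For readers who want to reproduce the call, here is a minimal sketch of what such a workflow step might look like. It assumes the update is a PATCH to the endpoint above with a JSON body naming the branch, authenticated with a bearer token. The function names, the `token` parameter, and the use of `urllib` are illustrative choices, not our actual workflow code.

```python
import json
import urllib.request


def build_repo_update_request(databricks_domain, repo_id, token, branch="master"):
    """Build (but do not send) the PATCH request that moves a Repo to the head of `branch`."""
    url = f"https://{databricks_domain}/api/2.0/repos/{repo_id}"
    body = json.dumps({"branch": branch}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            # Assumption: a personal access token or service principal token.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def update_repo(databricks_domain, repo_id, token, branch="master"):
    """Send the update request and return the parsed JSON response.

    urlopen raises urllib.error.HTTPError on a 4xx/5xx status, which is
    how a failed pull (for example, one blocked by local changes) surfaces here.
    """
    request = build_repo_update_request(databricks_domain, repo_id, token, branch)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting the request construction from the network call keeps the first function easy to inspect or log without touching the workspace.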
The problem:
If a notebook has been modified directly in a Databricks Repo, the pull fails because those changes cause a conflict.
We have noticed multiple times that notebooks, sometimes dozens of them, show minor changes such as the removal of a single empty line. These changes were not made by a human, as access to our production workspace is restricted. They cause the automatic update to fail. This has been happening constantly for the past month or so.
The issue also happens in our development workspace. Users have notified me of unexplained changes in their personal repo branches.
Our current workaround is to discard all changes in the Repo and run the update workflow again, but this causes other problems, for example when the squad doesn’t notice in time that their changes were not deployed to production.
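To make a missed deployment easier to notice, one option might be to compare the Repo's state with the commit that was supposed to be deployed. The sketch below assumes that a GET on the Repos endpoint returns a JSON object with `branch` and `head_commit_id` fields, and that the expected SHA comes from the workflow (for example the `GITHUB_SHA` environment variable in GitHub Actions). The function name and message format are hypothetical.

```python
def deployment_problem(repo_info, expected_commit_sha, expected_branch="master"):
    """Return a description of the mismatch, or None if the Repo is up to date.

    `repo_info` is the parsed JSON of a GET on the Repos endpoint
    (assumed to contain `branch` and `head_commit_id`).
    """
    actual_branch = repo_info.get("branch")
    actual_sha = repo_info.get("head_commit_id")
    if actual_branch != expected_branch:
        return f"Repo is on branch {actual_branch!r}, expected {expected_branch!r}"
    if actual_sha != expected_commit_sha:
        return f"Repo is at commit {actual_sha!r}, expected {expected_commit_sha!r}"
    return None
```

A workflow step could fail with this message when the result is not None, so the squad sees a red run instead of a silent miss.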
Examples are attached.
What could this be? How can we fix it and stop the notebooks from being modified?