Our production workspace has several Repos integrated with GitHub. These repos always point to master and should never be modified manually in the workspace, because the pulls are triggered by a GitHub Actions workflow. This workflow calls the Databricks Repos API to trigger the pull whenever the master branch is updated on GitHub.
If a notebook has been modified directly in a Databricks Repo, the pull fails because the local changes conflict with the incoming ones.
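For context, here is a minimal sketch of the kind of request such a workflow sends, assuming it uses the Repos API update endpoint (`PATCH /api/2.0/repos/{repo_id}` with a `branch` field). The helper name `build_repo_update_request`, the host, the token, and the repo ID are placeholders for illustration, not our actual workflow code.

```python
import json
import urllib.request


def build_repo_update_request(
    host: str, token: str, repo_id: int, branch: str = "master"
) -> urllib.request.Request:
    """Build (but do not send) a request asking Databricks to update
    the repo to the latest commit of `branch`.

    Assumption: the workspace exposes PATCH /api/2.0/repos/{repo_id}.
    """
    url = f"{host.rstrip('/')}/api/2.0/repos/{repo_id}"
    body = json.dumps({"branch": branch}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Sending it would look like this (left commented out because it needs
# a real workspace URL and token):
# with urllib.request.urlopen(build_repo_update_request(HOST, TOKEN, REPO_ID)) as resp:
#     print(resp.status)
```

When the Repo has uncommitted local changes that conflict with the incoming commit, this is the call that comes back with an error.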
For some reason, we have noticed multiple times that notebooks, sometimes dozens of them, show minor changes, such as the removal of a single empty line. These changes were not made by a human, as our production workspace is restricted. They cause the automatic update to fail. This has been happening constantly for the past month or so.
The issue also happens in our development workspace. Users have notified me that similar unexpected changes appeared in their personal repo branches.
Our current workaround is to discard all changes in the Repo and run the update workflow again, but this causes other problems, such as a squad not noticing in time that its changes were not deployed to production.
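One way to make a missed deployment visible sooner is to compare the commit the Repo is actually on with the commit that was just pushed. The sketch below assumes the Repo metadata returned by `GET /api/2.0/repos/{repo_id}` includes a `head_commit_id` field and that the expected SHA is available in the workflow (for example from `GITHUB_SHA`). The function name `deployment_is_current` is hypothetical.

```python
def deployment_is_current(repo_info: dict, expected_sha: str) -> bool:
    """Return True only if the workspace Repo sits on the expected commit.

    `repo_info` is assumed to be the parsed JSON from
    GET /api/2.0/repos/{repo_id}; a missing `head_commit_id`
    is treated as "not deployed".
    """
    head = repo_info.get("head_commit_id", "")
    return bool(head) and head == expected_sha


# Example: fail the workflow step loudly when the Repo is behind.
repo_info = {"head_commit_id": "abc123", "branch": "master"}
if not deployment_is_current(repo_info, "def456"):
    print("Repo is not on the expected commit; deployment did not land.")
```

A check like this could run as the last step of the workflow so that a failed pull turns into a failed job instead of going unnoticed.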
Examples are attached.
What could this be? How can we fix it and stop the notebooks from being modified?
Hi @jrosend, the issue you are facing is that minor changes are appearing in the notebooks in your Databricks Repos, and these changes conflict with the automatic update triggered by your GitHub Actions workflow. Because they were not made by a human, the first task is to find out what is making them.
To resolve this issue, you can consider the following steps:
1. Identify the source of the minor changes: Since these changes are not made by humans, it is important to identify the source of these modifications. It could be a script or process running in your environment that is inadvertently modifying the notebooks. Investigate any automated processes or scripts that may be interacting with the notebooks.
2. Implement stricter access controls: Ensure that only authorized users have access to modify the notebooks in the Databricks Repos. Review the access permissions and make sure they are properly configured to prevent unauthorized modifications.
3. Monitor and log notebook modifications: Enable logging or monitoring of notebook modifications in your Databricks environment. This will help you track any changes made to the notebooks and identify the source of the modifications.
4. Regularly review and discard unnecessary changes: Periodically review the changes made to the notebooks in the Databricks Repos. If you identify any unnecessary or unwanted changes, discard them to prevent conflicts during the update process.
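To support the review in step 4, a small check can separate changes that are only whitespace (such as a removed empty line) from real edits. This is a minimal sketch, assuming you already have the committed and the workspace versions of a notebook's source as strings; how you fetch them is not covered here, and the function names are hypothetical. Note that it ignores blank lines everywhere, including inside multi-line strings.

```python
def normalize(source: str) -> list[str]:
    """Strip trailing whitespace from each line and drop blank lines."""
    lines = [line.rstrip() for line in source.splitlines()]
    return [line for line in lines if line]


def is_whitespace_only_change(committed: str, workspace: str) -> bool:
    """True if the two versions differ, but only in blank lines
    or trailing whitespace."""
    return committed != workspace and normalize(committed) == normalize(workspace)


committed = "import os\n\nprint(os.getcwd())\n"
workspace = "import os\nprint(os.getcwd())\n"  # one empty line removed
print(is_whitespace_only_change(committed, workspace))
```

Changes flagged this way are candidates for being discarded safely, while anything else deserves a closer look before it is thrown away.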