Hi All,
Yesterday (05.07.2023) Databricks experienced a near-total outage in the West Europe and North Europe regions.
It took the full working day, but they posted an update in the late afternoon stating that a workaround had been put in place and service was back to 100%.
Unfortunately there are now some issues. My clusters and notebooks are accessible again, and the clusters start. But a lot of Python libs are now failing to install with "user is not the owner of the resource", although I am the owner.
I'm also seeing other issues: all my drivers are messed up and no longer installed properly. I've tried a bunch of things, for example:
- Reconfigure dpkg Database
- Force-Install the Software
- Remove Bad Software Package
- Clean Out Unused Software Packages
- Remove Post Files
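For anyone not familiar with these steps, they usually map to the standard dpkg/apt commands sketched below. This is only an illustration of the typical commands, not an exact record of what was run: the package name is a placeholder, the helper names are made up for this post, and the commands need root privileges (prefix with sudo if needed). By default it only prints the commands.

```python
import shlex
import subprocess


def dpkg_repair_commands(package):
    """Return the usual shell command for each repair step, in order.

    `package` is a placeholder for whichever package is broken.
    """
    pkg = shlex.quote(package)
    return [
        "dpkg --configure -a",                    # reconfigure dpkg database
        "apt-get install -f -y",                  # force-install / fix broken dependencies
        f"apt-get remove --purge -y {pkg}",       # remove the bad software package
        "apt-get clean",                          # clean out cached package files
        "apt-get autoremove -y",                  # remove unused packages
        f"mv /var/lib/dpkg/info/{pkg}.* /tmp/",   # move the package's post files aside
    ]


def run_repair(package, dry_run=True):
    """Print the commands; execute them only when dry_run is False."""
    commands = dpkg_repair_commands(package)
    for cmd in commands:
        print(cmd)
        if not dry_run:
            # check=False so one failing step does not stop the remaining ones
            subprocess.run(cmd, shell=True, check=False)
    return commands


if __name__ == "__main__":
    run_repair("some-package")  # hypothetical package name, dry run only
```

Keeping the dry run as the default makes it easy to review the commands before running anything on a cluster node.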
Basically, service is back but everything is messed up with my libs and drivers on every cluster. (It was all fine before the outage).
I am going to try to clone the cluster, run a script to install the libs on the new cluster and see if it works, and if it does, do the same for the other clusters.
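A rough sketch of what such an install script could look like, assuming the Databricks Libraries REST API (`POST /api/2.0/libraries/install`) and PyPI packages. The workspace URL, token, cluster ID and package names are all placeholders, and the function names are just made up for this post. I have not verified that this avoids the ownership error.

```python
import json
import urllib.request


def build_install_payload(cluster_id, pypi_packages):
    """Build the request body for the Libraries API install call."""
    return {
        "cluster_id": cluster_id,
        "libraries": [{"pypi": {"package": name}} for name in pypi_packages],
    }


def install_libraries(host, token, cluster_id, pypi_packages):
    """Send the install request to the workspace and return the HTTP status."""
    payload = build_install_payload(cluster_id, pypi_packages)
    request = urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/libraries/install",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Placeholder values; this only prints the payload and sends nothing.
    body = build_install_payload("<new-cluster-id>", ["pandas==1.5.3", "requests"])
    print(json.dumps(body, indent=2))
```

The same package list could then be reused for each cloned cluster by changing only the cluster ID.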
I want to ask if anyone else is having this issue, and if there's a better way of resolving it. I also don't want to have to do this whenever an outage happens.
The outage only affected Databricks in Azure.