Forcing developers to use something like a single user cluster
Thursday
Dear all,
We have a challenge. Developers create or recreate tables and views in the PRD environment by running notebooks on all-purpose clusters, even though the same notebooks already exist as jobs. I'm not sure why the developers prefer all-purpose clusters. The problem is that the objects then get created or recreated with the developer's individual ID as owner, which at times breaks our data flows as well as our consumption flows and results in chaos. Had the objects been created or recreated by a job, the owner would be the service principal the job runs as, and that service principal is what we want as the owner of the objects in the PRD environment.
Any ideas on how I can overcome this obstacle? Can a single user all-purpose cluster (assigned to a service principal) be used by different individuals when they invoke notebooks?
Appreciate any thoughts.
Thursday
I think the problem lies deeper, in the way your CI/CD process is set up. No developer should be able to create views or tables directly in PRD. Development work should only take place in DEV. The objects are then tested in Staging (QA/UAT), and deployment to both Staging and PRD takes place automatically, for example with asset bundles.
You can find more information about asset bundles here: https://docs.databricks.com/aws/en/dev-tools/bundles/
Once this process is set up correctly, you can revoke the rights of all developers in PRD and thus protect your environment.
You can also create cluster policies that restrict the creation of clusters in Dev according to your requirements: https://docs.databricks.com/aws/en/admin/clusters/policies
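To make the "revoke the rights" step concrete, here is a minimal sketch that only builds the SQL statements as strings. It assumes the PRD objects live in Unity Catalog and that removing `CREATE TABLE` on the affected schemas is what you want; the schema name `prd.sales` and the group name `developers` are hypothetical placeholders, and the helper function itself is just an illustration, not part of any Databricks library.

```python
def build_revoke_statements(schemas, principals, privileges=("CREATE TABLE",)):
    """Build one REVOKE statement per (schema, principal) pair.

    Assumption: Unity Catalog style privileges on schemas. The function
    only returns strings; it does not execute anything.
    """
    privilege_list = ", ".join(privileges)
    statements = []
    for schema in schemas:
        for principal in principals:
            # Backticks quote principals that contain special characters.
            statements.append(
                f"REVOKE {privilege_list} ON SCHEMA {schema} FROM `{principal}`"
            )
    return statements


# Hypothetical schema and group names, for illustration only.
for statement in build_revoke_statements(["prd.sales"], ["developers"]):
    print(statement)
```

An admin could review the printed statements and then run them in a SQL editor or through `spark.sql(...)` in a notebook.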
Friday
Hi Stefan, exactly, we have the same setup: the CI/CD process invokes jobs that run as a service principal. So far, so good. But please note that not every situation fits this ideal case. There are cases where I have to recreate, say, 50 views out of the 10,000 I have. The developer then acquires special access and is expected to run the job with parameters so that it recreates just those 50 views. However, developers resort to all-purpose clusters and run the view recreation notebook there, and then their ID becomes the owner of the objects.
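As a stopgap for objects that already ended up with an individual owner, something like the following sketch could generate the statements that hand ownership back. It assumes Unity Catalog, where `ALTER TABLE ... OWNER TO` and `ALTER VIEW ... OWNER TO` are available; the object names, the user ID, and the service principal name `sp-prd-etl` are hypothetical, and the input list would have to come from your own metadata.

```python
def build_owner_fixes(objects, expected_owner):
    """Return ALTER statements for objects not owned by expected_owner.

    objects: iterable of (full_name, object_type, current_owner) tuples,
    where object_type is "VIEW" or "TABLE" (assumed input shape).
    """
    fixes = []
    for full_name, object_type, current_owner in objects:
        if current_owner == expected_owner:
            continue  # already owned by the service principal
        keyword = "VIEW" if object_type.upper() == "VIEW" else "TABLE"
        fixes.append(f"ALTER {keyword} {full_name} OWNER TO `{expected_owner}`")
    return fixes


# Hypothetical inventory, for illustration only.
inventory = [
    ("prd.sales.v_orders", "VIEW", "jane.doe@example.com"),
    ("prd.sales.orders", "TABLE", "sp-prd-etl"),
]
for statement in build_owner_fixes(inventory, "sp-prd-etl"):
    print(statement)
```

This only repairs ownership after the fact; it does not stop developers from recreating the views on an all-purpose cluster in the first place.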

