Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Best Practices for Managing ACLs on Jobs and Job Clusters in Databricks

shweta_m
New Contributor III

 

Hi all,

I'm setting up access control for Databricks jobs and have two questions:

  1. Ephemeral Job Clusters: Since job clusters are created at runtime, is it best practice to set ACLs on the job itself? The /api/2.0/permissions/clusters/{cluster_id} endpoint requires a cluster ID, but ephemeral clusters don't exist beforehand.

  2. All Jobs & New Jobs: What's the recommended way to manage ACLs for all existing jobs and automatically apply permissions to newly created jobs?

Looking for scalable, best-practice guidance.

 

2 ACCEPTED SOLUTIONS


juan_maedo
New Contributor II

Hi @shweta_m,

I don't think this is exactly what you're asking, which seems to be some kind of configuration at the account management console level, but I don't know of a way to do what you're proposing.

In my case, we had a similar problem in my organisation. We opted to migrate the implementation of jobs/pipelines to Databricks Asset Bundles, and the Microsoft Entra ID integration provides us directly with the user groups we have in Azure.

Therefore, when developing and deploying to different environments (including dev), we set the permissions each group has in the targets section of databricks.yml, so that only selected people can see the jobs/pipelines, with different permission levels to view, run, edit, or manage.
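A minimal sketch of what such target-level permissions can look like in databricks.yml (the group names and workspace host below are placeholders, not the poster's actual values):

```yaml
# databricks.yml (sketch; group names and host are placeholders)
bundle:
  name: my_jobs_bundle

targets:
  dev:
    workspace:
      host: https://adb-1234567890.0.azuredatabricks.net  # placeholder host
    # Everything deployed to this target inherits these grants
    permissions:
      - group_name: data-engineers     # group synced from Entra ID
        level: CAN_MANAGE
      - group_name: data-analysts
        level: CAN_VIEW
  prod:
    permissions:
      - group_name: platform-team
        level: CAN_MANAGE
      - group_name: data-engineers
        level: CAN_RUN
```

The supported levels for bundle permissions are CAN_VIEW, CAN_RUN, and CAN_MANAGE, and they apply to all resources the bundle deploys in that target.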

In our case we have several different job cluster configurations, so we decided to declare the compute in each job's declaration YAML.
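For example, a per-job compute declaration in a bundle resource file can look like this (a sketch; the job name, node type, and paths are placeholders):

```yaml
# resources/my_etl_job.yml (sketch; all names and values are placeholders)
resources:
  jobs:
    my_etl_job:
      name: my-etl-job
      # Ephemeral job cluster defined alongside the job itself
      job_clusters:
        - job_cluster_key: etl_cluster
          new_cluster:
            spark_version: 15.4.x-scala2.12
            node_type_id: Standard_DS3_v2
            num_workers: 2
      tasks:
        - task_key: run_etl
          job_cluster_key: etl_cluster   # task runs on the cluster above
          notebook_task:
            notebook_path: ../src/etl_notebook.py
```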

Now, nobody can run or even see anything they shouldn't.

Here are some resources:
https://learn.microsoft.com/en-us/azure/databricks/dev-tools/bundles/permissions
https://learn.microsoft.com/en-us/azure/databricks/dev-tools/bundles/settings


saurabh18cs
Honored Contributor II

Hi @shweta_m 

I agree with @juan_maedo.

To add on top of that: the answer to your question is automated deployment pipelines embedded in your project repo using Databricks Asset Bundles, which is a scalable and reliable way to apply permissions dynamically.

We dynamically create instance pools with the right configuration defined in the asset bundle files and grant CAN MANAGE permission to a master service principal (this SP is owned by our team, since the entire infrastructure depends on it; it is essentially the owner of the resource group and part of the service connection).

We also deploy the Databricks jobs using asset bundles (which use those instance pools to configure a job cluster) and set both the owner and the run-as identity to the same master SP, but we also grant CAN MANAGE permission to the team AD group, which is synced from Microsoft Entra ID to Databricks.
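A hedged sketch of how this setup could look in a bundle file, assuming the instance pool was created beforehand (the SP application ID, pool ID, group name, and paths are all placeholders):

```yaml
# databricks.yml fragment (sketch; IDs, names, and paths are placeholders)

# All deployed resources run as (and are owned by) the master SP
run_as:
  service_principal_name: "00000000-0000-0000-0000-000000000000"  # master SP application ID

resources:
  jobs:
    etl_job:
      name: etl-job
      # The team group gets management rights on top of the SP ownership
      permissions:
        - group_name: team-ad-group    # group synced from Entra ID
          level: CAN_MANAGE
      job_clusters:
        - job_cluster_key: pooled
          new_cluster:
            spark_version: 15.4.x-scala2.12
            instance_pool_id: "pool-id-created-elsewhere"  # pre-created instance pool
            num_workers: 2
      tasks:
        - task_key: main
          job_cluster_key: pooled
          spark_python_task:
            python_file: ../src/main.py
```

With `run_as` set to the service principal, deployed jobs execute under that identity regardless of who runs the deployment, while the group-level CAN_MANAGE grant keeps the jobs visible and manageable for the team.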

As a side note, apply tags to these instance pools; this will help you segregate cost per workload or per domain, however you prefer.

Br

