Compute pool and AWS instance profiles

alxsbn
New Contributor III

Hi everyone,

We're looking at using the compute pool feature. Right now we mostly rely on all-purpose and job compute. On both, we use instance profiles to let the clusters access our S3 buckets and more.

We don't see anything related to instance profiles on compute pools. Is that normal?

1 ACCEPTED SOLUTION

Kaniz
Community Manager

Hi @alxsbn , Let’s delve into the details of compute pools and instance profiles.

  1. Compute Pools:

    • Compute pools in Databricks keep a set of idle, ready-to-use instances so that clusters attached to the pool start and scale faster. Several clusters and workloads can share one pool.
    • When you create a compute pool, you define its configuration, such as the instance type, the minimum number of idle instances, the maximum capacity, and the idle instance auto-termination time.
    • Compute pools are particularly useful when clusters are created or resized frequently and start-up time matters.
  2. Instance Profiles:

    • Instance profiles play a crucial role in granting permissions to Amazon EC2 instances (which Databricks clusters run on) to access other AWS services.
    • By associating an instance profile with an EC2 instance, you grant it specific permissions to interact with services like Amazon S3.
    • For example, if your Databricks clusters need to read or write data from S3 buckets, you’d configure an instance profile with the necessary S3 permissions and attach it to the EC2 instances running your clusters.
  3. Compute Pools and Instance Profiles:

    • Now, let’s address your concern about instance profiles and compute pools.
    • Compute pools do not have an instance profile setting of their own. A pool only holds idle EC2 instances until a cluster claims them.
    • You specify the instance profile on the cluster that draws its instances from the pool.
    • Therefore, you don’t configure instance profiles on the pool; you keep selecting them in the cluster configuration.
    • If your existing all-purpose and job compute clusters already use instance profiles to access S3 buckets, keep the same instance profile in their configuration when you point them at a pool, and the same permissions apply.
  4. Checking Instance Profiles in Compute Pools:

    • It’s normal not to see explicit references to instance profiles within the compute pool configuration.
    • To verify that your compute pools are indeed using the correct instance profiles, follow these steps:
      • Inspect the EC2 instances associated with your compute pools.
      • Confirm that the instance profiles attached to those EC2 instances grant the necessary S3 permissions.
      • If everything aligns, your compute pools will seamlessly inherit those permissions.
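The inspection steps above can be sketched in Python. This is only an illustration, not an official Databricks or AWS tool: it assumes you have already fetched a response shaped like the output of boto3's `ec2.describe_instances()` for the instances in question. The function names, instance IDs, and ARN are hypothetical placeholders.

```python
from typing import Any, Dict, List, Optional


def instance_profile_arns(response: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Map each EC2 instance ID to its attached instance profile ARN (None if absent)."""
    result: Dict[str, Optional[str]] = {}
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            # "IamInstanceProfile" is missing when no profile is attached.
            profile = instance.get("IamInstanceProfile")
            result[instance["InstanceId"]] = profile["Arn"] if profile else None
    return result


def instances_missing_profile(response: Dict[str, Any], expected_arn: str) -> List[str]:
    """Return the sorted IDs of instances that do not carry the expected profile."""
    arns = instance_profile_arns(response)
    return sorted(iid for iid, arn in arns.items() if arn != expected_arn)


# Hypothetical response fragment; in practice it would come from something like
# boto3.client("ec2").describe_instances(...), filtered down to your instances.
EXPECTED_ARN = "arn:aws:iam::123456789012:instance-profile/example-s3-access"
sample_response = {
    "Reservations": [
        {
            "Instances": [
                {
                    "InstanceId": "i-0aaa",
                    "IamInstanceProfile": {"Arn": EXPECTED_ARN, "Id": "AIPAEXAMPLE"},
                },
                {"InstanceId": "i-0bbb"},  # no instance profile attached
            ]
        }
    ]
}

print(instance_profile_arns(sample_response))
print(instances_missing_profile(sample_response, EXPECTED_ARN))
```

An instance that shows up in the second list is not necessarily misconfigured; it may simply not be in use by a cluster that specifies this profile.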

In summary, while compute pools themselves don’t have direct instance profile settings, they rely on the permissions granted to their underlying EC2 instances. As long as your existing clusters are correctly configured with instance profiles, your compute pools should function as expected. 🚀
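For illustration, here is a minimal sketch of a cluster specification that draws instances from a pool while still naming an instance profile. It assumes the Databricks Clusters API field names `instance_pool_id` and `aws_attributes.instance_profile_arn`; the helper name and every value are hypothetical placeholders, so please check the current API reference before relying on it.

```python
import json
from typing import Any, Dict


def pooled_cluster_spec(
    cluster_name: str,
    spark_version: str,
    instance_pool_id: str,
    instance_profile_arn: str,
    num_workers: int,
) -> Dict[str, Any]:
    """Build a cluster spec that takes its instances from a pool.

    No node type is set here because, as assumed in this sketch, the
    instance type comes from the pool. The instance profile is set on
    the cluster, not on the pool.
    """
    return {
        "cluster_name": cluster_name,
        "spark_version": spark_version,
        "instance_pool_id": instance_pool_id,
        "num_workers": num_workers,
        "aws_attributes": {"instance_profile_arn": instance_profile_arn},
    }


# All values below are placeholders, not real identifiers.
spec = pooled_cluster_spec(
    cluster_name="example-pooled-cluster",
    spark_version="<your-spark-version>",
    instance_pool_id="<your-pool-id>",
    instance_profile_arn="arn:aws:iam::123456789012:instance-profile/example-s3-access",
    num_workers=2,
)
print(json.dumps(spec, indent=2))
```

The resulting JSON is the kind of body you would pass when creating a cluster through the API or a job definition; the pool itself needs no change.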

For more detailed information, you can refer to the Databricks documentation on compute pools.

 
