
Instance Profiles Do Not Work with Delta Live Tables Default Cluster Policy Access Mode "Shared"

jaredrohe
New Contributor II

Hello,

I am trying to configure Auto Loader in file notification mode with Delta Live Tables. I configured an instance profile, but it does not work: I immediately get AWS access denied errors. This is the same issue that is referenced here

After a bunch of debugging, I've determined that the issue is caused by the "Access Mode" being set to "Shared".

To test this, I created a new compute cluster with the correct instance profile and Access Mode set to "No Isolation Shared". I then connected to that compute cluster from a Python notebook and had no problem making AWS API calls through boto3.

I then ran the same experiment with another compute cluster, this time with the Access Mode set to "Shared". I double-checked that the instance profile was configured properly. This time, when I connected to the compute cluster from a Python notebook, all of my AWS commands failed with "invalid credentials" errors.

The only difference between the two consecutive tests was the Access Mode.
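A minimal sketch of the kind of check described above, assuming boto3 is available in the notebook. The helper name `describe_identity` is hypothetical; `get_caller_identity` is the standard STS call that reports which IAM identity the current credentials resolve to. Running it on both clusters would show whether the instance profile's role is actually being picked up.

```python
def describe_identity(sts_client):
    """Report which IAM identity the given STS client resolves to.

    `sts_client` is expected to be a boto3 STS client; this helper is
    hypothetical and only used for illustration.
    """
    try:
        identity = sts_client.get_caller_identity()
        return {"ok": True, "arn": identity["Arn"], "account": identity["Account"]}
    except Exception as exc:  # e.g. botocore ClientError or NoCredentialsError
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}


# In a notebook attached to the cluster under test:
#   import boto3
#   print(describe_identity(boto3.client("sts")))
```

If the instance profile is in effect, the returned ARN should name the role behind the instance profile; otherwise the call fails and the error text is returned instead.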

Therefore, it appears that Delta Live Tables needs to be configured with a cluster policy that sets the Access Mode to "No Isolation Shared". However, that does not seem to be possible.

I've tried setting the `data_security_mode` JSON property when creating a new cluster policy. However, when I refresh my Delta Live Table, I get a validation error saying that the `data_security_mode` key is not supported.
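For illustration, a policy definition along these lines might look like the sketch below, written as a Python dict so it can be dumped to JSON. This is an assumption about the shape of such a policy, not the exact one used: `NONE` is assumed to be the `data_security_mode` value corresponding to "No Isolation Shared", and the `cluster_type` entry is assumed to be how the policy is scoped to Delta Live Tables.

```python
import json

# Hypothetical cluster policy definition (illustration only).
# Assumption: "NONE" corresponds to the "No Isolation Shared" access mode,
# while "USER_ISOLATION" corresponds to "Shared".
policy_definition = {
    "cluster_type": {"type": "fixed", "value": "dlt"},
    "data_security_mode": {"type": "fixed", "value": "NONE"},
}

policy_json = json.dumps(policy_definition, indent=2)
print(policy_json)
```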


Is there a way to give the Delta Live Tables access to instance profiles?

Thanks so much for your time!
-Jared

2 REPLIES

Kaniz
Community Manager

Hi @jaredrohe, In Delta Live Tables, ensuring secure data access is key. Here's how to set up IAM role-based authentication, enhancing your Delta Live Tables experience without compromising security.

1. Create an IAM Role: Craft an IAM role with the needed permissions for Delta Lake tables and AWS S3 resources.

2. Specify the Instance Profile: When creating a new compute cluster in Delta Live Tables, specify the instance profile linked to your IAM role using the Databricks CLI. Here's an example in JSON format:

{
  "cluster_name": "<CLUSTER NAME>",
  "spark_version": "<SPARK VERSION>",
  "node_type_id": "<NODE TYPE ID>",
  "aws_attributes": {
    "instance_profile_arn": "<INSTANCE PROFILE ARN>"
  }
}

3. Refresh Delta tables: With the IAM role connected to your cluster, securely access your Delta tables. Set your source directory and execute commands to refresh your Delta tables.

These steps ensure secure and efficient data operations. 
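For a Delta Live Tables pipeline, the cluster is defined in the pipeline settings rather than created directly, so the equivalent fragment would sit under the pipeline's `clusters` list. A minimal sketch, assuming the `default` cluster label and a placeholder ARN; the helper name is hypothetical. It only builds the settings JSON and does not change the access mode behavior discussed in this thread.

```python
import json


def pipeline_cluster_settings(instance_profile_arn):
    """Build the `clusters` fragment of DLT pipeline settings (illustrative helper)."""
    return {
        "clusters": [
            {
                "label": "default",
                "aws_attributes": {"instance_profile_arn": instance_profile_arn},
            }
        ]
    }


# Placeholder ARN, to be replaced with the real instance profile ARN.
print(json.dumps(pipeline_cluster_settings("<INSTANCE PROFILE ARN>"), indent=2))
```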

jaredrohe
New Contributor II

Hey @Kaniz, thanks for the response.

My original question was not about how to configure instance profiles. I was able to do that successfully.

The problem is that whether the instance profile takes effect depends on the access mode the cluster runs with.

If the cluster runs with Access Mode "No Isolation Shared", then everything works.

However, if the cluster runs with "Shared" (which is the default for Delta Live Tables), then it does not work. The instance profile does not take effect.

Does the problem make sense, @Kaniz?
