10-26-2023 07:35 PM - edited 10-26-2023 07:38 PM
Hello,
I am attempting to configure Auto Loader in file notification mode with Delta Live Tables. I configured an instance profile, but it does not work: I immediately get AWS access denied errors. This is the same issue that is referenced here
After a bunch of debugging, I've determined that the issue is caused by the "Access Mode" being set to "Shared".
To test this, I created a new compute cluster with the correct instance profile and Access Mode set to "No Isolation Shared". I then connected to that compute cluster in a Python Notebook and had no problem making AWS API calls through boto3.
I then ran the same experiment with another compute cluster, but this time with the Access Mode set to "Shared". I double-checked that the instance profile was configured properly. This time, when I connected to the compute cluster via a Python Notebook, all of my AWS commands failed with "invalid credentials" errors.
The only difference between the two consecutive tests was the Access Mode.
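The comparison above can be reproduced with a small credential check run on each cluster. This is a minimal sketch, not the exact commands used in the test: `describe_caller` is a hypothetical helper, and it assumes boto3 is importable on the cluster, as it was in the notebook test. It asks AWS STS which identity the current credentials resolve to and reports the error instead of raising.

```python
def describe_caller(sts_client=None):
    """Report which AWS identity the current credentials resolve to.

    `sts_client` is optional so the helper can be exercised with a stub;
    by default a real STS client is created from the ambient credentials.
    """
    if sts_client is None:
        import boto3  # assumed to be available on the cluster

        sts_client = boto3.client("sts")
    try:
        identity = sts_client.get_caller_identity()
    except Exception as exc:  # broad on purpose: this is only a diagnostic
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
    return {"ok": True, "arn": identity["Arn"], "account": identity["Account"]}


# In a notebook cell attached to the cluster under test:
# print(describe_caller())
```

If the instance profile is in effect, the returned ARN should name the role behind it; on the failing cluster the `error` field should carry the credentials error instead.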
Therefore, it appears that Delta Live Tables needs to be configured with a Cluster Policy containing an Access Mode of "No Isolation Shared". However, that does not seem possible.
I tried setting the `data_security_mode` JSON property when creating a new cluster policy. However, when I then refresh my Delta Live Table, I get a validation error saying that the `data_security_mode` key is not supported.
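For illustration, a policy definition of the kind described above might look like the sketch below. The exact policy used is not shown in this thread, so everything here is an assumption: the `fixed` attribute form, `NONE` as the API value behind "No Isolation Shared", and the `cluster_type` entry that scopes a policy to Delta Live Tables compute. As reported above, the pipeline rejected the `data_security_mode` key, so this shows what was attempted, not a working fix.

```python
import json

# Hypothetical policy definition; "NONE" is assumed to be the API value
# behind the "No Isolation Shared" access mode.
policy_definition = {
    # Assumed to scope the policy to Delta Live Tables pipeline compute.
    "cluster_type": {"type": "fixed", "value": "dlt"},
    # The attribute that the pipeline validation reported as unsupported.
    "data_security_mode": {"type": "fixed", "value": "NONE"},
}

# Policy definitions are submitted as a JSON document.
policy_json = json.dumps(policy_definition, indent=2)
print(policy_json)
```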
Is there a way to give the Delta Live Tables access to instance profiles?
Thanks so much for your time!
-Jared
10-30-2023 12:16 AM - edited 10-30-2023 12:17 AM
Hi @jaredrohe, In Delta Live Tables, ensuring secure data access is key. Here's how to set up IAM role-based authentication, enhancing your Delta Live Tables experience without compromising security.
1. Create an IAM Role: Craft an IAM role with the needed permissions for Delta Lake tables and AWS S3 resources.
2. Specify the Instance Profile: When creating a new compute cluster in Delta Live Tables, specify the instance profile linked to your IAM role using the Databricks CLI. Here's an example in JSON format:
{
  "cluster_name": "<CLUSTER NAME>",
  "spark_version": "<SPARK VERSION>",
  "node_type_id": "<NODE TYPE ID>",
  "aws_attributes": {
    "instance_profile_arn": "<INSTANCE PROFILE ARN>"
  }
}
3. Refresh Delta tables: With the IAM role connected to your cluster, securely access your Delta tables. Set your source directory and execute commands to refresh your Delta tables.
These steps ensure secure and efficient data operations.
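For a Delta Live Tables pipeline, the compute is usually described in the pipeline's own JSON settings and not as a standalone cluster. The sketch below builds that fragment under stated assumptions: the `clusters` array with a `label` of `default` is assumed to be the pipeline-settings form, and `pipeline_cluster_settings` is a hypothetical helper name. It only attaches the instance profile and does not set an access mode.

```python
import json


def pipeline_cluster_settings(instance_profile_arn, label="default"):
    """Build the `clusters` fragment of a pipeline's JSON settings."""
    return {
        "clusters": [
            {
                # Assumed label for the pipeline's default compute.
                "label": label,
                "aws_attributes": {"instance_profile_arn": instance_profile_arn},
            }
        ]
    }


# Placeholder ARN kept from the example above; substitute a real one.
settings = pipeline_cluster_settings("<INSTANCE PROFILE ARN>")
print(json.dumps(settings, indent=2))
```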
10-31-2023 10:14 AM
Hey @Kaniz_Fatma , thanks for the response.
My original question was not concerning how to configure instance profiles. I was able to do that successfully.
The problem is that when a cluster runs with an instance profile, the access mode determines whether the instance profile takes effect.
If the cluster runs with Access Mode "No Isolation Shared", then everything works.
However, if the cluster runs with "Shared" (which is the default for Delta Live Tables) then it does not work. The instance profile does not take effect.
Does the problem make sense @Kaniz_Fatma ?
04-30-2024 07:41 AM
Hi, I'm running into the same issue. Was this solved?
05-01-2024 02:16 PM
We are running into the same issue. Was this resolved?
05-03-2024 07:32 AM
Unfortunately, I never got this to work.