11-28-2023 06:05 PM - edited 11-28-2023 06:10 PM
We create storage credentials using Terraform. I don't see any way to specify a given external ID (the Databricks account ID) when creating the credentials via Terraform or in the web UI console. However, today when I tried creating a new set of credentials using the same custom Terraform module we've used for every other storage credential, the external ID assigned to the credentials was different from all of our other storage credentials.
As a result, I'm unable to use the storage credentials. I get forbidden errors from S3 whenever I try to create a new external location using our Terraform module, so I am completely blocked from adding new sources to our production Unity Catalog using our automation. NOTE: this does not happen if I manually create the credentials using the web UI.
While I could incorporate the "new" external ID into my IAM policies, there is no way to get it from the Terraform resource attributes after the credential has been created. Also, the IAM role must be created before the storage credential. This seems like a bug, but I can't find any other instance of this happening elsewhere.
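For anyone following along, the piece of the IAM setup that cares about the external ID is the role's trust policy. A minimal sketch only, with placeholder names and variables (the Unity Catalog principal ARN comes from the Databricks setup docs, and the external ID is whatever value Databricks assigns to the credential):
# Sketch of the trust relationship the storage credential's IAM role needs.
# Placeholder values throughout: the role name, the Databricks Unity Catalog
# principal ARN (taken from the Databricks docs), and the external ID.
data "aws_iam_policy_document" "uc_assume_role" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]
    principals {
      type        = "AWS"
      identifiers = [var.databricks_unity_catalog_role_arn]
    }
    condition {
      test     = "StringEquals"
      variable = "sts:ExternalId"
      values   = [var.storage_credential_external_id]
    }
  }
}
resource "aws_iam_role" "storage_credential" {
  name               = "uc-storage-credential-role" # placeholder
  assume_role_policy = data.aws_iam_policy_document.uc_assume_role.json
}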
UPDATE: It appears the external ID of the storage credential is now different each time I re-create it (destroy, then create).
01-04-2024 10:15 AM - edited 01-04-2024 10:18 AM
I was able to get some help from Databricks support and finally confirmed the cause of the behavior: IAM policies for storage credentials now require an external ID (as of Nov 30, 2023). We have been using an external ID since April '23, but this change broke us for a reason not mentioned in the documentation: the external ID is different depending upon the role of the user who creates the storage credential.
The Terraform storage credential resource does not return the external ID, so it's currently impossible to use Terraform to create storage credentials with anything other than an account admin role. The solution is that we will have to use an account admin instead of just a workspace admin to create the storage credential resource. We might be able to create an additional Databricks provider instance solely for the purpose of creating storage credentials, so it doesn't change how we create our other resources. This won't be a big effort, since we are already using two providers, one for the Databricks account API and one for the Databricks workspace API; we can pass the account credentials we are already using to another provider (or replace the ones we're currently using for the workspace API).
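Roughly what I have in mind for the extra provider instance, as a sketch only (host, variable names, and the service principal details are placeholders; the identity behind it needs the account admin role):
# Extra provider instance used only for storage credentials. The host is
# still the workspace URL; only the identity changes to a service
# principal that has the account admin role.
provider "databricks" {
  alias         = "storage_credentials"
  host          = var.workspace_url               # e.g. https://dbc-XXXX.cloud.databricks.com
  client_id     = var.account_admin_client_id     # service principal with the account admin role
  client_secret = var.account_admin_client_secret
}
resource "databricks_storage_credential" "this" {
  provider = databricks.storage_credentials
  name     = "example-credential" # placeholder
  aws_iam_role {
    role_arn = var.storage_credential_role_arn
  }
}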
ASIDE: I was worried that the random external ID values represented Databricks account IDs for other accounts. It was confirmed by support that this is not the case - they really are just random values and not anything else so there's no security issue created here.
11-29-2023 08:35 AM - edited 11-29-2023 08:36 AM
UPDATE: I was able to manually work around the issue.
While this is workable, I don't consider it a permanent solution, and I would still appreciate a fix that removes the need for a manual workaround.
12-08-2023 02:11 AM
I am having a similar problem. We have one storage credential and added the permissions for a new bucket to the IAM role. When deploying with Terraform, we get this error:
Error: cannot create external location: AWS IAM role does not have READ permissions on url s3://...
When deploying it via the UI, validation of the read permissions fails, but the creation works. I also created an external table and was able to query it. Is there some caching issue, or why is it unable to pick up the changed permissions?
01-04-2024 09:49 AM - edited 01-04-2024 09:51 AM
@Gozrehm It's not clear if this is the same thing, but I think you're experiencing a different issue. In my experience, it is common to see race conditions where Databricks doesn't see an expected change right away; I believe this is because the AWS API is often eventually consistent. In this kind of scenario, I'll often add a time_sleep resource to my Terraform and use its `create_duration` attribute to give the change time to propagate.
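For example, something like this (a sketch only; the duration and the IAM resource it waits on are placeholders):
# Give AWS IAM changes time to propagate before Databricks validates the
# external location against the role.
resource "time_sleep" "wait_for_iam" {
  create_duration = "30s"                                # placeholder duration
  depends_on      = [aws_iam_role_policy.bucket_access]  # whatever IAM change was just made
}
resource "databricks_external_location" "example" {
  name            = "example-location"          # placeholder
  url             = "s3://example-bucket/path"  # placeholder
  credential_name = var.storage_credential_name
  depends_on      = [time_sleep.wait_for_iam]
}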
Another option is to use the `skip_validation` attribute of the `databricks_external_location` resource. This should work without having to introduce time_sleep, but just know that if there are any issues with your IAM policy, they will crop up at run time instead of during deployment, so make sure you account for that.
01-12-2024 06:25 AM
Were you able to solve your issue? If so, how?
01-17-2024 02:53 PM
Configure your provider with credentials that have been granted the account admin role. This means every storage credential created by that provider will have the Databricks account ID as the external ID value.
If that doesn't work for you, then the only other option is to make the process a two-step workflow so you can look up the external ID after the credential is created (confirmed: it's just a random UUID if the credential is created by a user without the account admin role).
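If you do end up with the two-step workflow, the shape of it is roughly this (illustrative only; names are placeholders, and the variable gets filled in by hand, or by a script, between the two applies):
# Apply 1: create the credential, then read off the external ID it was
# assigned (for example, from Catalog Explorer in the workspace UI).
resource "databricks_storage_credential" "external" {
  name = "example-credential" # placeholder
  aws_iam_role {
    role_arn = var.storage_credential_role_arn
  }
}
# Apply 2: feed the looked-up external ID back in so the IAM role's trust
# policy condition on sts:ExternalId can reference it, then apply again.
variable "storage_credential_external_id" {
  type        = string
  description = "External ID shown on the storage credential after the first apply"
  default     = ""
}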
01-12-2024 05:57 AM
I tried the proposed solution, using an account-level provider like this for creating the storage credential:
provider "databricks" {
account_id = "ACCOUNT_ID"
host = "https://accounts.cloud.databricks.com"
}
However, that did not work. I got this error:
Error: cannot create storage credential: No API found for 'POST /accounts/ACCOUNT_ID/metastores/storage-credentials'
│
│ with databricks_storage_credential.external,
│ on external-storage.tf line 3, in resource "databricks_storage_credential" "external":
│ 3: resource "databricks_storage_credential" "external" {
Question:
Is it currently possible to create storage credentials via Terraform, and if so, how should I configure the provider?
01-12-2024 06:48 AM
We are using a service principal for Terraform deployments; it has the account admin permission. For creating the storage credentials, we are using a workspace-level provider:
provider "databricks" {
alias = "workspace"
host = "https://dbc-XXXX.cloud.databricks.com"
auth_type = "oauth-m2m"
}
# Storage credential for external sources
resource "databricks_storage_credential" "external_source" {
name = "${local.prefix}_${local.external_source_credential_name}"
aws_iam_role {
role_arn = var.external_source_role_arn
}
provider = databricks.workspace
}
Then, for adding an additional external location to an existing credential, I faced an issue during apply. What helped for me was setting skip_validation = true:
resource "databricks_external_location" "this" {
name = "${var.prefix}_${var.external_location_name}"
url = var.external_location_url
credential_name = var.storage_credentials_id
read_only = var.read_only
skip_validation = true
}
So now we can automate everything with TF. Hope that helps 🙂
01-17-2024 02:57 PM
The problem is that your host should be the workspace URL, not the accounts URL. Also, as mentioned before, the credentials should have the account admin privilege if you want the storage credential to come back with a known external ID.
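In other words, something along these lines instead of the accounts-host provider above (a sketch; the client ID and secret belong to a service principal with the account admin role):
provider "databricks" {
  alias         = "workspace_account_admin"
  host          = "https://dbc-XXXX.cloud.databricks.com" # workspace URL, not accounts.cloud.databricks.com
  client_id     = var.account_admin_client_id
  client_secret = var.account_admin_client_secret
}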