
Terraform databricks_storage_credential has wrong External ID

dvmentalmadess
Valued Contributor

We create storage credentials using Terraform. I don't see any way to specify a given External ID (the Databricks Account ID) when creating the credentials via Terraform or in the web UI console. However, today when I tried creating a new set of credentials using the same custom Terraform module we've used for every other storage credential, the External ID assigned to the new credentials was different from all of our other storage credentials.

As a result, I'm unable to use the storage credentials. I get forbidden errors from S3 whenever I try to create a new external location using our Terraform module. I am completely blocked from adding new sources to our production Unity Catalog using our automation. NOTE: this does not happen if I manually create the credentials using the web UI.

Screenshot 2023-11-28 at 6.37.50 PM.png

While I could incorporate the "new" External ID into my IAM policies, there is no way to get it from the Terraform resource attributes after the credential has been created. Also, the IAM role must be created before the storage credential. This seems like a bug, but I can't find any other reports of this happening elsewhere.

UPDATE: It appears the External ID of the storage credential is now different each time I re-create it (destroy then create).
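
For context, here's roughly how our Terraform module wires up the IAM role trust policy (the role name and variable names below are placeholders for what we actually use). The point is that sts:ExternalId is pinned to our Databricks Account ID, so a storage credential that comes back with a different External ID presumably can't assume the role, which would explain the forbidden errors:

resource "aws_iam_role" "unity_catalog" {
  name = "unity-catalog-storage-role" # placeholder
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { AWS = var.databricks_unity_catalog_principal } # the Databricks-provided principal from the docs
      Condition = {
        StringEquals = { "sts:ExternalId" = var.databricks_account_id } # the value we rely on
      }
    }]
  })
}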


9 REPLIES

dvmentalmadess
Valued Contributor

UPDATE: I was able to manually work around the issue.

  1. Run terraform apply
  2. After apply fails, copy the IAM role ARN and name before deleting the storage credential with the incorrect External ID
  3. Manually create the storage credential using the same IAM role and name copied from the storage credential deleted in the previous step
  4. Re-run terraform plan
  5. Re-run terraform apply

While this is workable, I don't consider it a permanent solution, and I would still appreciate a fix that removes the need for a manual workaround.

Gozrehm
New Contributor II

I am having a similar problem. We have one storage credential and added the permissions for a new bucket to the IAM role. When deploying with Terraform, we get this error:

Error: cannot create external location: AWS IAM role does not have READ permissions on url s3://...

When deploying it via the UI, it can't validate the read permissions, but the creation works. Also, I created an external table and was able to query it. Is there some caching issue, or why is Terraform unable to pick up the changed permissions?

@Gozrehm Not clear if this is the same thing, but I'm thinking you're experiencing a different issue. In my experience, it is common to see race conditions where Databricks doesn't see an expected change right away. I believe this is because the AWS API is often eventually consistent. In this kind of scenario, I'll often add a `time_sleep` resource to my Terraform and use its `create_duration` attribute to handle it.

Another option is to use the `skip_validation` attribute of the `databricks_external_location` resource. This should work without having to introduce `time_sleep`, but just know that if there are any issues with your IAM policy, you will see them crop up at run time instead of during deployment, so make sure you account for that.
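
A rough sketch of the time_sleep approach, with hypothetical resource names for the IAM policy and the external location (swap in your own):

resource "time_sleep" "wait_for_iam_propagation" {
  # Give the updated IAM policy time to propagate before Databricks validates it.
  depends_on      = [aws_iam_role_policy.bucket_access] # placeholder for your policy resource
  create_duration = "30s"
}

resource "databricks_external_location" "new_bucket" {
  name            = "new_bucket_location"        # placeholder
  url             = "s3://your-new-bucket/path"  # placeholder
  credential_name = databricks_storage_credential.this.name
  depends_on      = [time_sleep.wait_for_iam_propagation]
  # skip_validation = true  # alternative: defer the permission check to run time
}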

Were you able to solve your issue? If so, how?

Configure your provider with credentials that have been granted the account admin role. This means every storage credential created by this provider will have the Databricks Account ID as the External ID value.

If that doesn't work for you, then the only other solution is to make the process a two-step workflow so you can look up the External ID (confirmed: it's just a random UUID if created by a user without the account admin role).

dvmentalmadess
Valued Contributor

I was able to get some help from Databricks support and finally confirmed the cause of the behavior: IAM policies for storage credentials now require an External ID (as of Nov 30, 2023). We have been using an External ID since April '23, but this change broke us for a reason not mentioned in the documentation: the External ID will be different depending upon the role of the user who creates the storage credential:

  • Account Admin: the External ID will be the Databricks Account ID. This matches the behavior we've been relying on and that was recommended in the documentation when we built our implementation back in April. This is also why my manual workaround was successful: I am an account admin.
  • All other roles: the External ID will be a random UUID. This is new behavior as of Nov 30, 2023 and is what we started seeing in our CD pipeline when running terraform apply.

The Terraform storage credential resource does not return the External ID, so it's currently impossible to use Terraform to create storage credentials with anything other than an account admin role. The solution is that we will have to use an account admin, instead of just a workspace admin, to create the storage credential resource. We might be able to create an additional databricks provider instance solely for creating storage credentials, so it doesn't change how we create our other resources. This won't be a big effort since we are already using two providers: one for the Databricks Account API and one for the Databricks workspace API. We can just pass the account credentials we are already using to another provider (or replace the ones we're currently using for the workspace API). Roughly something like this:
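
(The aliases, host, and credential references below are placeholders for what we already have; the important part is that the storage-credential provider points at the workspace but authenticates as an account admin service principal.)

# Existing provider for the Databricks Account API
provider "databricks" {
  alias         = "account"
  host          = "https://accounts.cloud.databricks.com"
  account_id    = var.databricks_account_id   # placeholder
  client_id     = var.account_admin_client_id # account admin service principal
  client_secret = var.account_admin_client_secret
}

# Additional workspace-level provider used only for creating storage credentials,
# authenticated as the same account admin service principal.
provider "databricks" {
  alias         = "storage_credentials"
  host          = "https://dbc-XXXX.cloud.databricks.com" # workspace URL, not the account URL
  client_id     = var.account_admin_client_id
  client_secret = var.account_admin_client_secret
}

resource "databricks_storage_credential" "example" {
  provider = databricks.storage_credentials
  name     = "example_credential" # placeholder
  aws_iam_role {
    role_arn = var.storage_credential_role_arn # placeholder
  }
}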

ASIDE: I was worried that the random External ID values represented Databricks Account IDs for other accounts. Support confirmed that this is not the case: they really are just random values and not anything else, so there's no security issue created here.

Mathias_Peters
Contributor

I tried the proposed solution using an account provider like this

 

provider "databricks" {
  account_id = "ACCOUNT_ID"
  host       = "https://accounts.cloud.databricks.com"
}

 

for creating the storage credential. However, that did not work. I got an exception: 

 

Error: cannot create storage credential: No API found for 'POST /accounts/ACCOUNT_ID/metastores/storage-credentials'
│
│   with databricks_storage_credential.external,
│   on external-storage.tf line 3, in resource "databricks_storage_credential" "external":
│    3: resource "databricks_storage_credential" "external" {

 

Question:
Is it possible to create storage credentials via Terraform at the moment, and if so, how should I configure the provider?

We are using a service principal for the Terraform deployment. It has account admin permissions. For creating the storage credentials, we are using a workspace-level provider:

provider "databricks" {
alias = "workspace"
host = "https://dbc-XXXX.cloud.databricks.com"
auth_type = "oauth-m2m"
}

# Storage credential for external sources
resource "databricks_storage_credential" "external_source" {
  name = "${local.prefix}_${local.external_source_credential_name}"
  aws_iam_role {
    role_arn = var.external_source_role_arn
  }
  provider = databricks.workspace
}

Then, for adding an additional external location to an existing credential, I faced an issue during apply. What helped for me was setting skip_validation = true:

resource "databricks_external_location" "this" {
name = "${var.prefix}_${var.external_location_name}"
url = var.external_location_url
credential_name = var.storage_credentials_id
read_only = var.read_only
skip_validation = true
}

So now we can automate everything with Terraform. Hope that helps 🙂

The problem is that your host should be the workspace URL, not the account URL. Also, as mentioned before, the credentials should have the account admin privilege if you want the storage credential to come back with a known External ID.