To provision Unity Catalog using Terraform and authenticate with AWS through IAM Roles and Policies, you'll need to follow these steps:
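All of the snippets below assume the AWS and Databricks Terraform providers are already configured. A minimal sketch (the variable names are placeholders for your own values):
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
    databricks = {
      source = "databricks/databricks"
    }
  }
}

provider "aws" {
  region = var.aws_region
}

# Workspace-level Databricks provider; authenticate with a PAT, OAuth,
# or a CLI profile, as in your usual setup.
provider "databricks" {
  host = var.databricks_workspace_url
}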
- Create an IAM Role for Unity Catalog:
First, create an IAM role that Unity Catalog can assume to access your S3 bucket. The role needs a trust relationship with the Databricks-managed Unity Catalog role and must also be self-assuming (able to assume itself), with your Databricks account ID as the external ID.
resource "aws_iam_role" "unity_catalog_role" {
name = "unity-catalog-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Principal = {
AWS = [
"arn:aws:iam::414351767826:role/unity-catalog-prod-UCMasterRole-14S5ZJVKOTYTL",
"arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/${aws_iam_role.unity_catalog_role.name}"
]
}
Action = "sts:AssumeRole"
Condition = {
StringEquals = {
"sts:ExternalId" = var.databricks_account_id
}
}
}
]
})
}
- Attach necessary policies to the IAM Role:
Next, attach a policy granting the role the S3 permissions Unity Catalog needs on your bucket:
resource "aws_iam_role_policy_attachment" "unity_catalog_s3_access" {
role = aws_iam_role.unity_catalog_role.name
policy_arn = aws_iam_policy.s3_access_policy.arn
}
resource "aws_iam_policy" "s3_access_policy" {
name = "unity-catalog-s3-access"
description = "Policy for Unity Catalog to access S3"
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = [
"s3:GetBucketLocation",
"s3:ListBucket",
"s3:GetObject",
"s3:PutObject",
"s3:DeleteObject",
"s3:ListMultipartUploadParts",
"s3:AbortMultipartUpload"
]
Resource = [
"arn:aws:s3:::your-unity-catalog-bucket",
"arn:aws:s3:::your-unity-catalog-bucket/*"
]
}
]
})
}
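Because the role must be self-assuming, Databricks' documentation also calls for an identity policy that lets the role assume itself. A minimal sketch, reusing the local defined in the first step:
resource "aws_iam_role_policy" "unity_catalog_self_assume" {
  name = "unity-catalog-self-assume"
  role = aws_iam_role.unity_catalog_role.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        # Allow the role to assume itself, as Unity Catalog requires.
        Effect   = "Allow"
        Action   = "sts:AssumeRole"
        Resource = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/${local.unity_catalog_role_name}"
      }
    ]
  })
}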
- Create a Storage Credential in Unity Catalog:
Use the Databricks Terraform provider to create a storage credential from the IAM role. Storage credentials live inside a metastore, so the metastore created in the later steps must already be assigned to your workspace; the depends_on below enforces that ordering within a single apply:
resource "databricks_storage_credential" "unity_catalog_credential" {
  name    = "unity-catalog-credential"
  comment = "Credential for Unity Catalog"
  aws_iam_role {
    role_arn = aws_iam_role.unity_catalog_role.arn
  }
  # The workspace must be attached to a metastore before storage
  # credentials can be created.
  depends_on = [databricks_metastore_assignment.this]
}
- Create an External Location:
Create an external location in Unity Catalog that uses the storage credential. Its URL must not overlap the metastore's storage root, so point it at a separate prefix:
resource "databricks_external_location" "unity_catalog_location" {
  name            = "unity-catalog-location"
  # A path that does not overlap the metastore storage root; Unity
  # Catalog rejects overlapping locations.
  url             = "s3://your-unity-catalog-bucket/external"
  credential_name = databricks_storage_credential.unity_catalog_credential.name
  comment         = "External location for Unity Catalog"
}
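Once the external location exists, you would typically grant principals the right to use it. A sketch, assuming a hypothetical data_engineers account group:
resource "databricks_grants" "location_grants" {
  external_location = databricks_external_location.unity_catalog_location.id
  grant {
    # "data_engineers" is a placeholder for one of your account groups.
    principal  = "data_engineers"
    privileges = ["CREATE_EXTERNAL_TABLE", "READ_FILES"]
  }
}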
- Create a Metastore:
Next, create the Unity Catalog metastore itself:
resource "databricks_metastore" "this" {
  name         = "unity-catalog-metastore"
  storage_root = "s3://your-unity-catalog-bucket/metastore"
  # force_destroy lets "terraform destroy" remove the metastore even if
  # it still contains catalogs; use with care outside sandboxes.
  force_destroy = true
}
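The metastore also needs a root storage credential so Unity Catalog can write to storage_root. A minimal sketch using databricks_metastore_data_access, reusing the same IAM role (a dedicated role works too):
resource "databricks_metastore_data_access" "this" {
  metastore_id = databricks_metastore.this.id
  name         = "metastore-root-credential"
  is_default   = true
  # Reuse the Unity Catalog role from the earlier steps for root storage.
  aws_iam_role {
    role_arn = aws_iam_role.unity_catalog_role.arn
  }
}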
- Assign the Metastore to your Workspace:
resource "databricks_metastore_assignment" "this" {
workspace_id = var.databricks_workspace_id
metastore_id = databricks_metastore.this.id
default_catalog_name = "hive_metastore"
}
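For completeness, the variables referenced throughout might be declared as:
variable "databricks_account_id" {
  description = "Databricks account ID, used as the external ID in the role trust policy"
  type        = string
}

variable "databricks_workspace_id" {
  description = "Numeric ID of the workspace the metastore is assigned to"
  type        = string
}

variable "databricks_workspace_url" {
  description = "Workspace URL, e.g. https://<deployment>.cloud.databricks.com"
  type        = string
}

variable "aws_region" {
  description = "AWS region for the provider and S3 bucket"
  type        = string
}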