
Terraform Databricks Integration - especially for Unity Catalog in AWS S3

debal
New Contributor

We are attempting to provision Unity Catalog using Terraform, but we're running into issues establishing authentication with AWS through IAM roles and policies.

For EC2 cluster instances, the instance profile works fine with a trust relationship on "ec2.amazonaws.com". However, for Unity Catalog we need to use an AWS role to access S3 resources.
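
For context, the working instance-profile trust relationship described above is the standard EC2 one, roughly like this in Terraform (a sketch; the role name is a placeholder):

resource "aws_iam_role" "cluster_instance_role" {
  name = "databricks-cluster-role" # placeholder name

  # EC2 instances assume this role via the instance profile.
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect    = "Allow"
        Action    = "sts:AssumeRole"
        Principal = { Service = "ec2.amazonaws.com" }
      }
    ]
  })
}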

Please note, this is all being done with AWS Roles/Policies, not AWS credentials/keys.

Any assistance, guidance, or links to relevant materials would be greatly appreciated!

1 REPLY

Walter_C
Databricks Employee

To provision Unity Catalog using Terraform and authenticate with AWS through IAM Roles and Policies, you'll need to follow these steps:

  1. Create an IAM Role for Unity Catalog:

First, create an IAM role that Unity Catalog can assume to access your S3 resources. This role needs a trust relationship with the Databricks-managed Unity Catalog role and must also trust its own ARN (be "self-assuming").

resource "aws_iam_role" "unity_catalog_role" {
  name = "unity-catalog-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Principal = {
          AWS = [
            "arn:aws:iam::414351767826:role/unity-catalog-prod-UCMasterRole-14S5ZJVKOTYTL",
            "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/${aws_iam_role.unity_catalog_role.name}"
          ]
        }
        Action = "sts:AssumeRole"
        Condition = {
          StringEquals = {
            "sts:ExternalId" = var.databricks_account_id
          }
        }
      }
    ]
  })
}
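
(A caveat: AWS validates principal ARNs when a trust policy is set, so the self-referencing principal can be rejected while the role is still being created. Databricks' documentation describes creating the role with only the Databricks principal first and then adding the self-reference, which in Terraform usually means a two-phase apply or a follow-up trust-policy update.)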
  2. Attach necessary policies to the IAM Role:

Attach policies that grant the necessary permissions to access your S3 resources:

resource "aws_iam_role_policy_attachment" "unity_catalog_s3_access" {
  role       = aws_iam_role.unity_catalog_role.name
  policy_arn = aws_iam_policy.s3_access_policy.arn
}

resource "aws_iam_policy" "s3_access_policy" {
  name        = "unity-catalog-s3-access"
  description = "Policy for Unity Catalog to access S3"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "s3:GetBucketLocation",
          "s3:ListBucket",
          "s3:GetObject",
          "s3:PutObject",
          "s3:DeleteObject",
          "s3:ListMultipartUploadParts",
          "s3:AbortMultipartUpload"
        ]
        Resource = [
          "arn:aws:s3:::your-unity-catalog-bucket",
          "arn:aws:s3:::your-unity-catalog-bucket/*"
        ]
      }
    ]
  })
}
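
Because the role is self-assuming (see step 1), the Databricks setup guidance also has the role itself allowed to call sts:AssumeRole on its own ARN. A minimal sketch of that extra permission as an inline policy:

resource "aws_iam_role_policy" "unity_catalog_self_assume" {
  name = "unity-catalog-self-assume"
  role = aws_iam_role.unity_catalog_role.id

  # Let the role assume itself, as required for Unity Catalog.
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect   = "Allow"
        Action   = "sts:AssumeRole"
        Resource = aws_iam_role.unity_catalog_role.arn
      }
    ]
  })
}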
  3. Create a Storage Credential in Unity Catalog:

Use the Databricks Terraform provider to create a storage credential using the IAM role:

resource "databricks_storage_credential" "unity_catalog_credential" {
  name = "unity-catalog-credential"
  aws_iam_role {
    role_arn = aws_iam_role.unity_catalog_role.arn
  }
  comment = "Credential for Unity Catalog"
}
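
One practical gotcha: IAM changes take time to propagate, so creating the storage credential immediately after the role can fail validation. A common workaround (a sketch using the hashicorp/time provider; the duration is arbitrary) is a short delay that the credential depends on:

resource "time_sleep" "wait_for_iam_propagation" {
  depends_on      = [aws_iam_role.unity_catalog_role]
  create_duration = "30s" # arbitrary buffer for IAM propagation
}

Add depends_on = [time_sleep.wait_for_iam_propagation] to the storage credential if validation fails on the first apply.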
  4. Create an External Location:

Create an external location in Unity Catalog that uses the storage credential:

resource "databricks_external_location" "unity_catalog_location" {
  name            = "unity-catalog-location"
  url             = "s3://your-unity-catalog-bucket"
  credential_name = databricks_storage_credential.unity_catalog_credential.name
  comment         = "External location for Unity Catalog"
}
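
Once the external location exists, you would normally grant principals access to it. Purely as an illustration (the group name is hypothetical):

resource "databricks_grants" "unity_catalog_location" {
  external_location = databricks_external_location.unity_catalog_location.id
  grant {
    principal  = "data-engineers" # hypothetical workspace group
    privileges = ["READ_FILES", "WRITE_FILES", "CREATE_EXTERNAL_TABLE"]
  }
}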
  5. Create a Metastore:

Next, create the Unity Catalog metastore (the region is assumed here as a variable; it is required when the metastore is created at the account level):

resource "databricks_metastore" "this" {
  name = "unity-catalog-metastore"
  storage_root = "s3://your-unity-catalog-bucket/metastore"
  force_destroy = true
}
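
One piece the steps above don't cover: the metastore root itself needs a data-access configuration pointing at the IAM role, otherwise Unity Catalog cannot read or write storage_root. Based on the resources above, roughly:

resource "databricks_metastore_data_access" "this" {
  metastore_id = databricks_metastore.this.id
  name         = aws_iam_role.unity_catalog_role.name
  aws_iam_role {
    role_arn = aws_iam_role.unity_catalog_role.arn
  }
  # Use this role as the metastore's root credential.
  is_default = true
}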
  6. Assign the Metastore to your Workspace:

resource "databricks_metastore_assignment" "this" {
  workspace_id         = var.databricks_workspace_id
  metastore_id         = databricks_metastore.this.id
  default_catalog_name = "hive_metastore"
}
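
Also note that the metastore and metastore assignment are account-level objects, while the storage credential and external location are created through a workspace. If you manage everything from one configuration, you typically end up with two provider blocks along these lines (host values and auth are assumptions; adapt to your environment):

provider "databricks" {
  alias      = "account"
  host       = "https://accounts.cloud.databricks.com"
  account_id = var.databricks_account_id
}

provider "databricks" {
  alias = "workspace"
  host  = var.databricks_workspace_url # hypothetical variable
}

Then set provider = databricks.account or provider = databricks.workspace on each resource accordingly.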
