09-23-2022 02:38 AM
🐔 and 🐣 situation?
I am currently trying to come up with a way to deploy Databricks with Terraform in a multi-region, multi-tenant environment. I am not talking about simple cases like this one (https://docs.databricks.com/data-governance/unity-catalog/automate.html).
Ideally I would like to have at least separate DEV and PROD Unity Catalogs (metastores), each with multiple workspaces.
I've created some modules for re-usable resources:
metastore (UC)
s3
vpc
workspace
and I've planned my deployment this way:
datalake/dev_1
datalake/dev_2
datalake/prod_1
datalake/prod_2
datalake/global
dev_* and prod_* - different workspaces
global - metastore
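To make that concrete, here is a rough sketch of how one of the datalake/dev_* roots could compose the reusable modules; the module sources, inputs, and outputs are assumptions for illustration, not my actual code:

# datalake/dev_1/main.tf (illustrative; paths, names, and inputs are made up)
module "vpc" {
  source = "../../modules/vpc"
  region = "eu-west-1"
  env    = "dev"
}

module "workspace" {
  source = "../../modules/workspace"
  vpc_id = module.vpc.vpc_id # assumes the vpc module exports vpc_id
  region = "eu-west-1"
  env    = "dev"
}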
The idea was to create the Unity Catalog (metastore) first and then, from each datalake/env_* root, attach the corresponding workspace to it. It turns out the Unity Catalog can be created without a workspace, but there is no way to attach metastore_data_access to it without a workspace.
--
Link to the example usage (https://registry.terraform.io/providers/databricks/databricks/latest/docs/resources/metastore_data_access#example-usage)
It seems like it should work.
# Create UC metastore
resource "databricks_metastore" "this" {
  provider      = databricks.workspace
  name          = "${local.prefix}-${var.workspace}-metastore-${var.region}-${var.env}"
  storage_root  = "s3://${var.aws_s3_bucket}/metastore"
  owner         = var.owner
  force_destroy = true
}

resource "databricks_metastore_data_access" "this" {
  provider     = databricks.workspace
  metastore_id = databricks_metastore.this.id
  name         = aws_iam_role.metastore_data_access.name
  aws_iam_role {
    role_arn = aws_iam_role.metastore_data_access.arn
  }
  is_default = true
}
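As an aside, both resources above use provider = databricks.workspace, which presumes a workspace-level provider alias roughly like the one below; the host wiring is a placeholder and authentication is omitted:

# Workspace-level provider alias assumed by the snippets above.
# var.workspace_url is a placeholder for your workspace URL;
# authentication (token, profile, etc.) is omitted.
provider "databricks" {
  alias = "workspace"
  host  = var.workspace_url
}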
In the Unity Catalog deployment blog (https://docs.databricks.com/data-governance/unity-catalog/automate.html#configure-a-metastore) you can see the assignment of the workspace to the metastore (or vice versa).
resource "databricks_metastore_assignment" "default_metastore" {
depends_on = [ databricks_metastore_data_access.metastore_data_access ]
workspace_id = var.default_metastore_workspace_id
metastore_id = databricks_metastore.metastore.id
default_catalog_name = var.default_metastore_default_catalog_name
}
This assignment is actually done first, so I guess the steps should be reordered for better readability:
variable "metastore_name" {}
variable "metastore_label" {}
variable "default_metastore_workspace_id" {}
variable "default_metastore_default_catalog_name" {}
resource "databricks_metastore" "metastore" {
name = var.metastore_name
storage_root = "s3://${aws_s3_bucket.metastore.id}/${var.metastore_label}"
force_destroy = true
}
resource "databricks_metastore_assignment" "default_metastore" {
depends_on = [ databricks_metastore_data_access.metastore_data_access ]
workspace_id = var.default_metastore_workspace_id
metastore_id = databricks_metastore.metastore.id
default_catalog_name = var.default_metastore_default_catalog_name
}
resource "databricks_metastore_data_access" "metastore_data_access" {
depends_on = [ databricks_metastore.metastore ]
metastore_id = databricks_metastore.metastore.id
name = aws_iam_role.metastore_data_access.name
aws_iam_role { role_arn = aws_iam_role.metastore_data_access.arn }
is_default = true
}
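For completeness: every snippet here references aws_iam_role.metastore_data_access without defining it. Below is a minimal sketch of that role; the trust-policy principal and external ID are placeholders you would replace with the values from the Unity Catalog documentation, and the S3 access policy for the metastore bucket is omitted.

# Illustrative only: the principal and external ID are placeholders,
# not the real Databricks values. The S3 policy for the metastore
# bucket would also need to be attached to this role.
variable "unity_catalog_role_arn" {
  description = "ARN of the Databricks-managed Unity Catalog role (see the docs)"
  type        = string
}

variable "databricks_account_id" {
  description = "Databricks account ID, used as the sts:ExternalId"
  type        = string
}

resource "aws_iam_role" "metastore_data_access" {
  name = "${var.prefix}-metastore-data-access"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { AWS = var.unity_catalog_role_arn }
      Condition = { StringEquals = { "sts:ExternalId" = var.databricks_account_id } }
    }]
  })
}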
Let's get back to the question: how are you planning your Terraform deployment with UC? It would be great to learn how others are dealing with this.
I have a couple of things in mind; either way, my `global` idea won't work as planned.
Looking forward to seeing some ideas.
09-27-2022 08:37 AM
I got the same error:
Error: cannot create metastore data access: No metastore assigned for the current workspace.
I fixed it by reversing the depends_on order: databricks_metastore_data_access now depends on databricks_metastore_assignment. Here is my code:
resource "databricks_metastore" "this" {
provider = databricks.workspace
name = "${var.prefix}-metastore"
storage_root = "s3://${aws_s3_bucket.metastore.id}/metastore"
delta_sharing_scope = "INTERNAL"
delta_sharing_recipient_token_lifetime_in_seconds = 120
force_destroy = true
}
// Assign the metastore to workspaces
resource "databricks_metastore_assignment" "this" {
provider = databricks.workspace
count = length(var.workspaces)
metastore_id = databricks_metastore.this.id
workspace_id = tonumber(replace(var.workspaces[count.index], "/.*//", ""))
depends_on = [ databricks_metastore.this ]
}
resource "databricks_metastore_data_access" "metastore_data_access" {
provider = databricks.workspace
metastore_id = databricks_metastore.this.id
name = aws_iam_role.metastore_data_access.name
aws_iam_role { role_arn = aws_iam_role.metastore_data_access.arn }
is_default = true
depends_on = [ databricks_metastore_assignment.this ]
}
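Note the replace()/tonumber() on workspace_id: it assumes every entry in var.workspaces carries the numeric workspace ID after its final slash. A hypothetical variable definition (the values are made up):

variable "workspaces" {
  description = "Workspace identifiers; only the digits after the last slash are used"
  type        = list(string)
  # Placeholder values for illustration
  default = [
    "accounts/my-account/workspaces/1234567890123456",
    "accounts/my-account/workspaces/2345678901234567",
  ]
}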
09-27-2022 09:56 AM
Yes, I get that this would help.
My problem is that, ideally, I would like to avoid assigning the metastore to a workspace before databricks_metastore_data_access is created.
My initial plan was to put the Unity Catalog deployment in a separate folder in the Terraform structure, and each workspace as well:
dev/
dev/uc
dev/ws_1
dev/ws_2
but I guess that I will need to re-think this. The order of the steps confused me a bit 🙂
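One way to keep that layering might be: let dev/uc own only the metastore and export its ID, let each dev/ws_* root create its own databricks_metastore_assignment from that output, and put the databricks_metastore_data_access in whichever workspace root runs first. A rough sketch, assuming an S3 backend (bucket, keys, and variable names are made up):

# dev/uc/outputs.tf (illustrative)
output "metastore_id" {
  value = databricks_metastore.this.id
}

# dev/ws_1/main.tf (illustrative; backend details are placeholders)
data "terraform_remote_state" "uc" {
  backend = "s3"
  config = {
    bucket = "my-tf-state"
    key    = "dev/uc/terraform.tfstate"
    region = "eu-west-1"
  }
}

resource "databricks_metastore_assignment" "this" {
  metastore_id = data.terraform_remote_state.uc.outputs.metastore_id
  workspace_id = var.workspace_id
}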
10-16-2022 10:29 PM
Hi @Pat Sienkiewicz
Hope all is well! Just wanted to check in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.
We'd love to hear from you.
Thanks!
07-25-2023 12:52 PM
Hi, I'm experiencing a similar issue. I filed an issue on GitHub here.