Databricks Terraform Provider Issues Passing Providers to Child Modules

absolutelyRice
New Contributor III

I have been following the Databricks Terraform provider documentation to provision account-level resources on AWS. I can create the workspace, add users, and so on. However, when I go to use the provider in non-mws (workspace-level) mode, I receive the following error:

│ Error: workspace is most likely not created yet, because the `host` is empty. Please add `depends_on = [databricks_mws_workspaces.this]` or `depends_on = [azurerm_databricks_workspace.this]` to every data resource. See https://www.terraform.io/docs/language/resources/behavior.html more info. Please check https://registry.terraform.io/providers/databricks/databricks/latest/docs#authentication for details
│
│   with module.workspaces.data.databricks_spark_version.latest,
│   on ../modules/aws_workspaces/init.tf line 11, in data "databricks_spark_version" "latest":
│   11: data "databricks_spark_version" "latest" {}

To show how this is set up: I have a file called root.tf, which creates the root mws-level resources without any problems.

provider "databricks" {
  alias    = "mws"
  host     = "https://accounts.cloud.databricks.com"
  username = var.databricks_account_username
  password = var.databricks_account_password
}
 
 
module "root" {
  source = "../modules/aws_root"
 
  databricks_account_id = var.databricks_account_id
 
  tags = var.tags
 
 
  region = var.region
 
  cidr_block = var.cidr_block
 
  databricks_users            = var.databricks_users
  databricks_metastore_admins = var.databricks_metastore_admins
 
  unity_admin_group = var.unity_admin_group
 
  providers = {
    databricks.mws = databricks.mws
  }
 
}

With outputs coming from that module:

output "databricks_host" {
  value = databricks_mws_workspaces.this.workspace_url
}
 
output "databricks_token" {
  value     = databricks_mws_workspaces.this.token[0].token_value
  sensitive = true
}
 
output "databricks_workspace_id" {
  value     = databricks_mws_workspaces.this.workspace_id
  sensitive = false
}
 
output "databricks_account_id" {
  value     = databricks_mws_workspaces.this.account_id
  sensitive = true
}
 
output "aws_iam_role_metastore_data_access_arn" {
  value = aws_iam_role.metastore_data_access.arn
}
 
output "aws_iam_role_metastore_data_access_name" {
  value = aws_iam_role.metastore_data_access.name
}
 
output "aws_s3_bucket_metastore_id" {
  value = aws_s3_bucket.metastore.id
}

These can be seen in the created resources when I do a

`terraform state show <outputs>`

However, when I go to create a workspace-level provider to create some notebooks, clusters, etc., I cannot get the child module's resources to use the newly created provider with a host, even though the host is being set and I can see its value. Even hard-coding the host does not work. They all produce the error above.

The creation of this provider and module can be seen here:

provider "databricks" {
  alias      = "workspace"
  host       = module.root.databricks_host
  token      = module.root.databricks_token
  account_id = module.root.databricks_account_id
}
 
 
 
 
module "workspaces" {
  source = "../modules/aws_workspaces"
 
  aws_s3_bucket_metastore_id              = module.root.aws_s3_bucket_metastore_id
  aws_iam_role_metastore_data_access_arn  = module.root.aws_iam_role_metastore_data_access_arn
  aws_iam_role_metastore_data_access_name = module.root.aws_iam_role_metastore_data_access_name
 
  cidr_block = var.cidr_block
 
  databricks_account_id        = var.databricks_account_id
  databricks_bronze_users      = var.databricks_bronze_users
  databricks_gold_users        = var.databricks_gold_users
  databricks_host              = module.root.databricks_host
  databricks_metastore_admins  = var.databricks_metastore_admins
  databricks_silver_users      = var.databricks_silver_users
  databricks_token             = module.root.databricks_token
  databricks_users             = var.databricks_users
  databricks_workspace_id      = module.root.databricks_workspace_id
  python_module_version_number = local.python_module_version_number
  shed_databricks_egg_name     = var.shed_databricks_egg_name
 
  tags              = var.tags
  unity_admin_group = var.unity_admin_group
  config            = local.config
 
  depends_on = [
    module.root
  ]
  providers = {
    databricks = databricks.workspace
  }
}

And the init file (init.tf in the child module) that uses this provider and throws the error can be seen here:

terraform {
  required_providers {
    databricks = {
      source                = "databricks/databricks"
      version               = "~> 1.6.2"
      configuration_aliases = [databricks.workspace]
    }
  }
}
 
data "databricks_spark_version" "latest" {}
data "databricks_node_type" "smallest" {
  local_disk = true
}

The suggestion to add a depends_on for `databricks_mws_workspaces.this` isn't possible to follow, as that resource is created in the root module, where the `databricks.mws` provider is used. (The documentation says each module should isolate providers.)

1 ACCEPTED SOLUTION

absolutelyRice
New Contributor III

So the answer to this was that you need to explicitly pass the provider argument to each of the data source blocks. The docs should be updated to reflect that.

i.e.

data "databricks_spark_version" "latest" {
  provider = databricks.workspace
}
data "databricks_node_type" "smallest" {
  provider   = databricks.workspace
  local_disk = true
}
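
For context on why this works, as far as I understand Terraform's behavior: a resource or data source block without a `provider` argument uses the default (unaliased) `databricks` configuration of its module, and an alias declared in `configuration_aliases` is only used by blocks that ask for it explicitly. A minimal sketch, assuming the child module keeps `configuration_aliases = [databricks.workspace]` as shown in the question; in that case the `providers` map in the calling module would be keyed by the alias name the child declares. Module inputs are omitted for brevity.

```hcl
# Root module: pass the workspace-level configuration under the alias
# name that the child module declares in configuration_aliases.
module "workspaces" {
  source = "../modules/aws_workspaces"

  # ... input variables as in the question ...

  providers = {
    databricks.workspace = databricks.workspace
  }
}

# Child module (init.tf): select the aliased configuration explicitly.
data "databricks_spark_version" "latest" {
  provider = databricks.workspace
}
```

In the `providers` map, the key on the left is the provider name as seen inside the child module, and the value on the right is the configuration defined in the caller.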

5 REPLIES

Jfoxyyc
Valued Contributor

I'm also curious about this, as I haven't been able to successfully create an output for the Databricks account_id.

absolutelyRice
New Contributor III

That's odd. It makes me wonder whether any attributes of that resource are getting exported. I suppose one could try writing the values out during an apply, using a null resource that just echoes the values into a local file.


Were you able to output the Databricks account_id?

I am going to be honest: I don't recall off the top of my head, but it is getting passed in as an argument to the other modules above, so I assume so. I was able to verify that the other two were getting exported by adding a null resource that was something like:

    resource "null_resource" "echo" {
      provisioner "local-exec" {
        command = "echo '${module.root_mws.account_id}' > output.txt"
      }
    }
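
An alternative to the null resource, sketched under the assumption that the `root` module exposes `databricks_account_id` as in the outputs listed in the question: re-export the value from the root configuration and read it with `terraform output`. Because the module marks that output as `sensitive = true`, Terraform requires the re-exporting output to be marked sensitive as well, and reports an error otherwise. The output name used here is only illustrative.

```hcl
# Root configuration: re-export the child module's sensitive output.
output "databricks_account_id" {
  value     = module.root.databricks_account_id
  sensitive = true # required, because the module output is sensitive
}
```

After an apply, running `terraform output -raw databricks_account_id` should print the value, since asking for a single output by name shows it even when it is marked sensitive.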
