Warehousing & Analytics
Engage in discussions on data warehousing, analytics, and BI solutions within the Databricks Community. Share insights, tips, and best practices for leveraging data for informed decision-making.

Create a table from external location volume using terraform

jpdoliveira
New Contributor

Hi everyone,
I'm trying to create a table in Unity Catalog from a folder that lives inside a Volume, but I can't get Terraform to create it successfully.
Below is a simplified/anonymized version of my Terraform setup.
I have:
* A Unity Catalog
* A Schema
* A Storage Credential (AWS IAM role)
* An External Location
* A Volume mapped to that external location
* A folder inside the Volume (e.g., /Volumes/my_catalog/my_schema/my_volume/gold_layer/)

resource "databricks_catalog" "example_catalog" {
  name    = "my-catalog"
  comment = "example"
}

resource "databricks_schema" "example_schema" {
  catalog_name = databricks_catalog.example_catalog.id
  name         = "my-schema"
}

resource "databricks_storage_credential" "example_cred" {
  name = "example-cred"
  aws_iam_role {
    role_arn = var.example_role_arn
  }
}

resource "databricks_external_location" "example_location" {
  name            = "example-location"
  url             = var.example_s3_path   # e.g. s3://my-bucket/path/
  credential_name = databricks_storage_credential.example_cred.id
  read_only       = true
  skip_validation = true
}

resource "databricks_volume" "example_volume" {
  name             = "my-volume"
  catalog_name     = databricks_catalog.example_catalog.name
  schema_name      = databricks_schema.example_schema.name
  volume_type      = "EXTERNAL"
  storage_location = databricks_external_location.example_location.url
}

I want to use that folder to create a table. Using the Databricks UI I can do this and everything works fine, but with Terraform I can't figure out how, except by creating a view like this:

resource "databricks_sql_table" "example_view" {
  name         = "gold_layer"
  catalog_name = databricks_catalog.example_catalog.name
  schema_name  = databricks_schema.example_schema.name
  table_type   = "VIEW"

  view_definition = "SELECT * FROM read_files('/Volumes/${databricks_catalog.example_catalog.name}/${databricks_schema.example_schema.name}/${databricks_volume.example_volume.name}/gold_layer/')"
}
4 REPLIES

nayan_wylde
Esteemed Contributor

Storing Delta tables in volumes is generally not recommended, because volumes are designed for non-tabular data, while Delta tables have their own storage structure (Parquet data files plus transaction-log metadata) that is better managed through a registered table or a direct cloud path. Using volumes for Delta tables can hinder proper governance and table registration in catalogs like Unity Catalog. While you can write Delta files into a volume, you cannot register them as a table in Unity Catalog when they are stored this way.
Why volumes are not ideal for Delta tables:
* Governance and registration: Volumes are for unstructured and semi-structured data, while Delta tables have a specific, structured format and are managed as tables, not files, for proper governance in systems like Unity Catalog.
* Table vs. non-tabular data: Volumes govern non-tabular data, but Delta tables are tables, not just files. This distinction is important for how they are accessed and managed.
* Catalog registration: You can't register a Delta table that is stored in a volume as a table in the Unity Catalog metastore. This means you lose the ability to use standard SQL queries or access the table's metadata through the catalog.

iyashk-DB
Databricks Employee

@jpdoliveira 

The volume path is designed for non-tabular, file-level access; its FUSE path (/Volumes/...) is not a valid table LOCATION scheme in Unity Catalog, so attempting to create an external table with a /Volumes/... path will fail with "invalid scheme" style errors. Use the cloud path behind your volume's external location instead of the /Volumes path. Unity Catalog external tables must point to a cloud URI covered by an external location.

One more suggestion would be to use the newer databricks_sql_table resource instead of databricks_table.

resource "databricks_sql_table" "gold_external" {
  name         = "gold_layer"
  catalog_name = databricks_catalog.example_catalog.name
  schema_name  = databricks_schema.example_schema.name
  table_type   = "EXTERNAL"

  # IMPORTANT: use the cloud URI, not the /Volumes path
  # For example: s3://my-bucket/path/gold_layer/
  storage_location   = "${var.example_s3_path}gold_layer/"
  data_source_format = "DELTA"   # or "PARQUET", "CSV", etc., to match the files
}

Just please ensure your external location grants CREATE EXTERNAL TABLE (and typically READ FILES) to the principal used by Terraform, and that workspace bindings allow access from your workspace.
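
As a hedged sketch of those grants (the principal name below is a hypothetical placeholder), the databricks_grants resource can attach both privileges to the external location used above:

# Sketch only: grant the principal used by Terraform the privileges it needs
# on the external location. Replace "terraform-sp" with your own service
# principal or group name.
resource "databricks_grants" "example_location_grants" {
  external_location = databricks_external_location.example_location.id

  grant {
    principal  = "terraform-sp"
    privileges = ["CREATE_EXTERNAL_TABLE", "READ_FILES"]
  }
}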

Hello,

I tried that, but I keep getting an error, even after granting ALL PRIVILEGES on the external location:

│ Error: cannot create sql table: cannot execute CREATE TABLE `my-catalog`.`my-schema`.`gold_layer`
│ USING PARQUET
│ COMMENT 'Managed by terraform'
│ LOCATION 's3://bucket_name/gold_layer';: Running

When I go to the UI, the table shows up there, but with nothing in it.

Hubert-Dudek
Esteemed Contributor III

It will not work because Unity Catalog will detect that the locations overlap (the table's path sits inside the volume's path). I tried it in some experiments, like displaying the delta_log from a volume, but it didn't work. In fact, it is secure this way, as Delta files exposed through a volume would bypass the Unity Catalog ACLs for that table.
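
For illustration, a minimal sketch of a non-overlapping layout, assuming the data can be arranged so the volume and the table do not share a prefix (the volume_data/ sub-path is a hypothetical placeholder):

# Hypothetical layout: scope the volume to its own sub-prefix so the table's
# path is no longer inside the volume's path and the overlap check does not trigger.
resource "databricks_volume" "example_volume" {
  name             = "my-volume"
  catalog_name     = databricks_catalog.example_catalog.name
  schema_name      = databricks_schema.example_schema.name
  volume_type      = "EXTERNAL"
  storage_location = "${var.example_s3_path}volume_data/"   # e.g. s3://my-bucket/path/volume_data/
}

resource "databricks_sql_table" "gold_external" {
  name               = "gold_layer"
  catalog_name       = databricks_catalog.example_catalog.name
  schema_name        = databricks_schema.example_schema.name
  table_type         = "EXTERNAL"
  data_source_format = "DELTA"
  storage_location   = "${var.example_s3_path}gold_layer/"  # outside the volume's prefix
}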