Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Declarative Automation Bundles Volume creation fails with CATALOG_DOES_NOT_EXIST on first deploy

Luisbct
New Contributor II

Hi everyone,

I'm working with DAB, and I'm running into a deployment ordering issue.

On my first deploy, I get this error:

Error: cannot create resources.volumes.raw_data: Catalog 'mycatalog_prod' does not exist. (404 CATALOG_DOES_NOT_EXIST)

Endpoint: POST /api/2.1/unity-catalog/volumes
HTTP Status: 404 Not Found

But all the other resources (catalog, external locations, schemas) are created without problems.

However, when I run the exact same command a second time:

DATABRICKS_BUNDLE_ENGINE=direct databricks bundle deploy

it succeeds without any errors.

I have my resources split in multiple yml files:

  • catalog.yml
  • external_locations.yml
  • schemas.yml
  • volumes.yml

My volume resource looks like this:

resources:
  volumes:
    raw_data:
      catalog_name: ${var.catalog}
      name: raw_data
      schema_name: staging
      volume_type: EXTERNAL
      storage_location: abfss://staging@stdatatest.dfs.core.windows.net/data/
      grants:
        - principal: mygroup
          privileges:
            - MANAGE

I tried changing ${var.catalog} to "mycatalog_prod", but that didn't work either.

Is this expected behavior due to resource creation order in DAB?

Accepted Solutions

Ashwin_DSA
Databricks Employee

Hi @Luisbct,

Looks like an ordering/dependency issue rather than a problem with your variable value.

The way your volume is defined, catalog_name is just a string (${var.catalog} resolves to a literal name), so nothing tells the bundle that the volume depends on the catalog resource. With the direct engine it can then try to create the volume before the catalog exists on the very first deploy, which likely leads to CATALOG_DOES_NOT_EXIST. On the second deploy the catalog is already there, so it passes.

Can you try changing it so the volume explicitly references the catalog resource instead of the variable? For example:

# catalog.yml
resources:
  catalogs:
    mycatalog_prod:
      name: mycatalog_prod

# schemas.yml
resources:
  schemas:
    staging:
      name: staging
      catalog_name: ${resources.catalogs.mycatalog_prod.name}

# volumes.yml
resources:
  volumes:
    raw_data:
      catalog_name: ${resources.catalogs.mycatalog_prod.name}
      schema_name: staging
      name: raw_data
      volume_type: EXTERNAL
      storage_location: abfss://staging@stdatatest.dfs.core.windows.net/data/
      grants:
        - principal: mygroup
          privileges:
            - MANAGE

You may also want to make sure you're using the direct bundle engine (as you are with DATABRICKS_BUNDLE_ENGINE=direct), which is required for catalogs/volumes defined in bundles.
 
Run databricks bundle validate --output json and confirm catalog_name for the volume is resolving to mycatalog_prod via resources.catalogs.mycatalog_prod.name.

After you wire the dependency this way, the first bundle deploy should create catalog --> schema --> volume in the right order and stop needing the second run.
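
One detail worth double-checking: in the snippet above, schema_name on the volume is still a plain string, so by the same reasoning the volume is tied to the catalog but not explicitly to the schema. Here is a minimal sketch of wiring both, assuming ${resources.schemas.staging.name} can be referenced the same way as the catalog name. The resource key main_catalog is a hypothetical name chosen for illustration, and ${var.catalog} is the variable from the original question, kept so the catalog name can still differ per target. I have not tested this exact layout, so treat it as a starting point.

```yaml
# catalog.yml
resources:
  catalogs:
    main_catalog:                # hypothetical resource key, independent of the real catalog name
      name: ${var.catalog}       # variable from the original question, e.g. mycatalog_prod
---
# schemas.yml
resources:
  schemas:
    staging:
      name: staging
      # Reference to the catalog resource, so the schema is ordered after it
      catalog_name: ${resources.catalogs.main_catalog.name}
---
# volumes.yml
resources:
  volumes:
    raw_data:
      catalog_name: ${resources.catalogs.main_catalog.name}
      # Assumption: referencing the schema resource orders the volume after the schema too
      schema_name: ${resources.schemas.staging.name}
      name: raw_data
      volume_type: EXTERNAL
      storage_location: abfss://staging@stdatatest.dfs.core.windows.net/data/
      grants:
        - principal: mygroup
          privileges:
            - MANAGE
```

The --- separators only mark the three separate files here. The idea is the same as in the answer: every name that comes from another bundle resource is written as a reference to that resource rather than as a literal string.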

If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***


Luisbct
New Contributor II

It works now, thanks a lot for your help.