
Databricks bundle

david_nagy
New Contributor II

Hey, 

I am new to Databricks, and I am trying to test the mlops-stack bundle. 

Within that bundle there is a feature-engineering workflow, and I am having trouble getting it to run.
The main problem is the following.
The bundle sets the target to $bundle.target, which in my case is dev. I have created the dev catalog and, within it, the project schema, as the template requires.

The issue is that when I run the workflow, the notebook fails at:

from databricks.feature_engineering import FeatureEngineeringClient

fe = FeatureEngineeringClient()

# Create the feature table if it does not exist first.
# Note that this is a no-op if a table with the same name and schema already exists.
fe.create_table(
    name=output_table_name,    
    primary_keys=[x.strip() for x in pk_columns.split(",")] + [ts_column],  # Include timeseries column in primary_keys
    timestamp_keys=[ts_column],
    df=features_df,
)

# Write the computed features dataframe.
fe.write_table(
    name=output_table_name,
    df=features_df,
    mode="merge",
)

I get the following error:
ValueError: Catalog 'dev' does not exist in the metastore.
I don't understand why. If I ran the notebook through my own cluster.

I tried to give all privileges to all users in the workspace, but it did not help.

7 REPLIES

gchandra
Databricks Employee

The dev you mention as the bundle target is different from the dev catalog.

What is the value of output_table_name? If it is a three-level name (catalog_name.db_name.table_name), please make sure you have write access to that catalog and database.

Read more here

https://docs.databricks.com/en/dev-tools/bundles/deployment-modes.html
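
To make the three-level name check above concrete, here is a minimal sketch that splits a table name into its catalog, database and table parts, so that each part can be checked on its own. The helper split_table_name and the example value are hypothetical and not part of the mlops-stack template. It does not handle backtick-quoted names that contain dots.

```python
from typing import NamedTuple


class TableName(NamedTuple):
    catalog: str
    schema: str
    table: str


def split_table_name(full_name: str) -> TableName:
    """Split 'catalog.schema.table' into its three parts (hypothetical helper)."""
    # Strip whitespace and surrounding backticks from each part.
    parts = [part.strip().strip("`") for part in full_name.split(".")]
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"Expected catalog.schema.table, got {full_name!r}")
    return TableName(*parts)


# The value below is an assumption; substitute the notebook's output_table_name.
name = split_table_name("dev.my_project.my_features")
print(name.catalog, name.schema, name.table)
```

Printing the parts inside the failing notebook shows which catalog the job is actually trying to write to.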




david_nagy
New Contributor II

Hi @gchandra
Thanks for the answer. The output table name is the one defined in the Databricks mlops-stack template.
Regarding $bundle.target, this is what your databricks.yml (with Databricks' own comments) says:

# Deployment Target specific values for workspace
targets:
  dev: # UC Catalog Name <---it is commented here
    default: true
    workspace:
      # TODO: add dev workspace URL

So if it is not the catalog I created, then what is the target? I am following your mlops-stack to the letter.

gchandra
Databricks Employee

Apologies, I misread your question.

Can you please share your databricks.yml file or the URL you followed?




david_nagy
New Contributor II

This is the mlops-stack which I am trying to follow.
https://github.com/databricks/mlops-stacks/tree/main/template/%7B%7B.input_root_dir%7D%7D/%7B%7Btemp...

I instantiated it with:

  • databricks bundle init mlops-stack

I first want to test all the project-related workflows in dev and see how they interact. Later I want to test with CI/CD, so that I can deploy across three different workspaces (dev/staging/prod).


david_nagy
New Contributor II

When I execute the notebook with databricks bundle run -t dev write_feature_table_job and print the available catalogs, I only get spark_catalog.
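
For anyone who wants to reproduce this check, a sketch along the following lines lists the catalogs visible to the cluster that actually runs the notebook. It assumes the spark session that Databricks notebooks provide and the SQL statement SHOW CATALOGS. The helper names visible_catalogs and assert_catalog_visible are hypothetical.

```python
from typing import List


def visible_catalogs(spark) -> List[str]:
    """Return the catalog names the current cluster can see.

    `spark` is assumed to be the SparkSession of the running notebook.
    SHOW CATALOGS returns one row per catalog; the name is the first column.
    """
    return [row[0] for row in spark.sql("SHOW CATALOGS").collect()]


def assert_catalog_visible(spark, catalog: str) -> None:
    """Fail early and list what is visible (hypothetical helper)."""
    found = visible_catalogs(spark)
    if catalog not in found:
        raise ValueError(f"Catalog {catalog!r} is not visible; found: {found}")
```

Calling assert_catalog_visible(spark, "dev") at the top of the notebook would show the visible catalogs before fe.create_table is reached. If only spark_catalog comes back, it may be worth checking whether the job's cluster configuration differs from the cluster used interactively.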

gchandra
Databricks Employee

Is your workspace UC enabled?




david_nagy
New Contributor II

Yes it is.
