Databricks Community

david_nagy · ‎10-17-2024

Hey,

I am new to Databricks, and I am trying to test the mlops-stack bundle.

Within that bundle there is a feature-engineering workflow, and I am having trouble getting it to run.
The main problem is the following.
The bundle specifies the target as $bundle.target, which in my case is dev. I have created the dev catalog and, within it, the project schema according to the template.

The issue is that when I run the workflow, the notebook fails at

from databricks.feature_engineering import FeatureEngineeringClient

fe = FeatureEngineeringClient()

# Create the feature table if it does not exist first.
# Note that this is a no-op if a table with the same name and schema already exists.
fe.create_table(
    name=output_table_name,    
    primary_keys=[x.strip() for x in pk_columns.split(",")] + [ts_column],  # Include timeseries column in primary_keys
    timestamp_keys=[ts_column],
    df=features_df,
)

# Write the computed features dataframe.
fe.write_table(
    name=output_table_name,
    df=features_df,
    mode="merge",
)

I am getting this error:
ValueError: Catalog 'dev' does not exist in the metastore.
And I don't understand why. If I ran the notebook through my own cluster.

I tried to give all privileges to all users in the workspace, but it did not help.

gchandra · ‎10-17-2024

The dev you mention in the bundle target is different from the dev catalog.

What is the value of "output_table_name"? If it's a three-level namespace value (catalog_name.db_name.table_name), please make sure you have write access to that catalog and db_name.
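As a minimal sketch of that check, the snippet below splits a three-level name into its parts so the catalog and schema can be inspected separately. The helper `split_table_name` and the example value are hypothetical and not part of mlops-stack; it also assumes plain identifiers without backtick quoting.

```python
def split_table_name(full_name: str) -> tuple[str, str, str]:
    """Split 'catalog_name.db_name.table_name' into its three parts.

    Assumption: plain identifiers, no backtick quoting or embedded dots.
    """
    parts = [part.strip() for part in full_name.split(".")]
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"Expected catalog_name.db_name.table_name, got {full_name!r}"
        )
    catalog, schema, table = parts
    return catalog, schema, table


# Hypothetical value, for illustration only.
catalog, schema, table = split_table_name("dev.my_project.trip_features")
print(f"catalog={catalog} schema={schema} table={table}")
```

In the notebook, the same helper could be applied to output_table_name to see which catalog the write is actually aimed at.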

Read more here

https://docs.databricks.com/en/dev-tools/bundles/deployment-modes.html


david_nagy · ‎10-17-2024

Hi @gchandra
Thanks for the answer. The output table name is whatever is defined in the Databricks mlops-stack template.
Regarding $bundle.target, according to your (Databricks-commented) databricks.yml:

# Deployment Target specific values for workspace
targets:
  dev: # UC Catalog Name <---it is commented here
    default: true
    workspace:
      # TODO: add dev workspace URL

So if it is not the catalog I created, then what is the target? I am following your mlops-stack to the letter.

gchandra · ‎10-17-2024

Apologies, I misread your question.

Can you please share your databricks.yml file or the URL you followed?


david_nagy · ‎10-17-2024

This is the mlops-stack which I am trying to follow.
https://github.com/databricks/mlops-stacks/tree/main/template/%7B%7B.input_root_dir%7D%7D/%7B%7Btemp...

I instantiated it by:

databricks bundle init mlops-stack

I first want to test all project-related workflows in dev and how they interact, and later I want to test with CI/CD so I can deploy across 3 different workspaces (dev/staging/prod).

david_nagy · ‎10-17-2024

When I executed the notebook through databricks bundle run -t dev write_feature_table_job,
I printed the available catalogs and got only spark_catalog.
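
A rough sketch of how that catalog check could be made explicit at the top of the notebook, so the job fails with a clearer message before reaching fe.create_table. The helper names `list_visible_catalogs` and `require_catalog` are hypothetical; the only Spark call assumed is the SHOW CATALOGS SQL statement, and `spark` is assumed to be the session predefined in a Databricks notebook.

```python
def list_visible_catalogs(spark) -> list[str]:
    """Return the catalog names visible to the given Spark session."""
    # SHOW CATALOGS returns one row per catalog; the name is the first column.
    return [row[0] for row in spark.sql("SHOW CATALOGS").collect()]


def require_catalog(catalog: str, visible: list[str]) -> None:
    """Raise a descriptive error if `catalog` is not among `visible`."""
    if catalog not in visible:
        raise LookupError(
            f"Catalog {catalog!r} is not visible from this cluster; "
            f"visible catalogs: {sorted(visible)}"
        )


# In the notebook, where `spark` is predefined, one might call:
#   require_catalog("dev", list_visible_catalogs(spark))

# Stand-alone illustration using the list reported in this thread.
try:
    require_catalog("dev", ["spark_catalog"])
except LookupError as err:
    print(err)
```

Running the same check on the interactive cluster and on the job cluster would show whether the two see different sets of catalogs.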