Databricks bundle
10-17-2024 04:13 AM
Hey,
I am new to Databricks, and I am trying to test the mlops-stacks bundle. That bundle contains a feature-engineering workflow, and I am having trouble getting it to run.
The main problem is the following: the template uses ${bundle.target} as the catalog name, which in my case resolves to dev. I have created the dev catalog and, inside it, the project schema, as the template expects.
The issue is that when I run the workflow, the notebook fails at:
from databricks.feature_engineering import FeatureEngineeringClient
fe = FeatureEngineeringClient()
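# output_table_name, pk_columns, ts_column, and features_df are assumed to be
# defined earlier in the notebook (job parameters and the feature-computation step).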
# First, create the feature table if it does not exist.
# Note that this is a no-op if a table with the same name and schema already exists.
fe.create_table(
name=output_table_name,
primary_keys=[x.strip() for x in pk_columns.split(",")] + [ts_column], # Include timeseries column in primary_keys
timestamp_keys=[ts_column],
df=features_df,
)
# Write the computed features dataframe.
fe.write_table(
name=output_table_name,
df=features_df,
mode="merge",
)
I am getting:
ValueError: Catalog 'dev' does not exist in the metastore.
I don't understand why, since the catalog is there when I run the notebook on my own cluster.
I tried granting all privileges to all users in the workspace, but it did not help.
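For reference, this is how I created the catalog and schema beforehand (run from my own cluster; project stands in for the template's project schema name):

# One-time setup, run on a UC-enabled cluster before deploying the bundle.
spark.sql("CREATE CATALOG IF NOT EXISTS dev")
spark.sql("CREATE SCHEMA IF NOT EXISTS dev.project")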
10-17-2024 04:33 AM
The dev you mention in the bundle target is different from the dev catalog.
What is the value of "output_table_name"? If it is a three-level namespace value, catalog_name.db_name.table_name, please make sure you have write access to that catalog and schema.
Read more here:
https://docs.databricks.com/en/dev-tools/bundles/deployment-modes.html
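For example, grants along these lines (a sketch; the schema name project and the group account users are assumptions, so substitute the principal your job actually runs as):

-- Give the running principal access to the catalog and schema.
GRANT USE CATALOG ON CATALOG dev TO `account users`;
GRANT USE SCHEMA, CREATE TABLE, MODIFY, SELECT ON SCHEMA dev.project TO `account users`;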
~
10-17-2024 04:38 AM
Hi @gchandra,
Thanks for the answer. The output table name is the one that ships with the Databricks mlops-stacks template.
Regarding ${bundle.target}: I am going by the comments in your (i.e., Databricks') own databricks.yml.
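The relevant targets block looks roughly like this (the host URL is a placeholder for my workspace):

# Sketch of the dev target from the template's databricks.yml.
targets:
  dev:
    mode: development
    default: true
    workspace:
      host: https://my-workspace.cloud.databricks.com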
So if dev is not the created catalog, then what is the target? I am following your mlops-stacks template to the letter.
10-17-2024 04:49 AM
Apologies, I misread your question.
Can you please share your databricks.yml file or the URL you followed?
~
10-17-2024 04:54 AM
This is the mlops-stacks template I am trying to follow:
https://github.com/databricks/mlops-stacks/tree/main/template/%7B%7B.input_root_dir%7D%7D/%7B%7Btemp...
I instantiated it with:
databricks bundle init mlops-stacks
I first want to test all the project workflows in dev and see how they interact; later I want to add CI/CD so I can deploy across three different workspaces (dev/staging/prod).
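The dev loop I am using is just the standard bundle CLI sequence (nothing custom):

# Validate, deploy, and run the feature-engineering job against the dev target.
databricks bundle validate -t dev
databricks bundle deploy -t dev
databricks bundle run -t dev write_feature_table_job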
10-17-2024 06:18 AM
When I execute the notebook through databricks bundle run -t dev write_feature_table_job and print the available catalogs, I only get spark_catalog.
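I printed them with plain Spark SQL, along these lines:

# List the catalogs visible to the cluster; on the job cluster this shows only spark_catalog.
spark.sql("SHOW CATALOGS").show()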
10-17-2024 06:25 AM
Is your workspace UC enabled?
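If it is, but the job cluster only sees spark_catalog, it may be worth checking that the job cluster in the bundle runs in a Unity Catalog access mode. A sketch of the relevant cluster spec (spark_version and node_type_id are placeholder values):

# Job cluster definition in the bundle's job YAML (illustrative values).
new_cluster:
  spark_version: 15.4.x-scala2.12
  node_type_id: i3.xlarge
  num_workers: 1
  data_security_mode: SINGLE_USER  # needed for the cluster to see UC catalogs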
~
10-17-2024 06:43 AM
Yes it is.