FeatureEngineeringClient workspace id error

Kabi
New Contributor III

Hi, I am working from a local notebook using the Databricks extension for VS Code.

I am trying to use FeatureEngineeringClient. When I create a training set:

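    from databricks.feature_engineering import FeatureEngineeringClient

    fe = FeatureEngineeringClient()
    training_set = fe.create_training_set(
        df=filtered_data_train,
        feature_lookups=payments_feature_lookups,
        label="churn",
        exclude_columns=exclude_columns,
    )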

I receive this error:

	"name": "TypeError",
	"message": "'NoneType' object cannot be interpreted as an integer",

but the logs show that it is raised here:

File ~/projects/databricks_hwm_churn/.venv/lib/python3.12/site-packages/databricks/ml_features_common/entities/feature_spec.py:243, in FeatureSpec.to_proto(self)
    241     proto_feature_spec.input_functions.append(function_info.to_proto())
    242 proto_feature_spec.serialization_version = self.serialization_version
--> 243 proto_feature_spec.workspace_id = self.workspace_id
    244 proto_feature_spec.feature_store_client_version = (
    245     self._feature_store_client_version
    246 )
    247 return proto_feature_spec

Could someone explain how I can fix the problem with self.workspace_id when working from a local notebook?

The same code works when I run it from a notebook in the Databricks web UI.

Thank you!

2 ACCEPTED SOLUTIONS


BigRoux
Databricks Employee
The error you're encountering, TypeError: 'NoneType' object cannot be interpreted as an integer, occurs because workspace_id is None when the FeatureEngineeringClient runs in a local notebook in Visual Studio Code. FeatureSpec.to_proto then tries to assign that None to the integer workspace_id field of the feature spec, which fails. The likely cause is that the client cannot initialize the workspace context it normally picks up automatically inside Databricks.
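The wording of this message matches what Python raises when a value that is not an integer is used where an integer is required (the `operator.index` protocol). Assuming the protobuf integer field behind `workspace_id` applies the same check (the traceback points at that assignment, but the protobuf internals are an assumption here), the failure can be reproduced without Databricks. `check_int_field` is a hypothetical stand-in for the field, and `1234567890123456` is a made-up ID:

```python
import operator


def check_int_field(value):
    """Hypothetical stand-in for an integer-only proto field.

    operator.index() accepts ints and raises TypeError with the
    "'<type>' object cannot be interpreted as an integer" wording otherwise.
    """
    return operator.index(value)


for candidate in (None, "1234567890123456", 1234567890123456):
    try:
        print(repr(candidate), "->", check_int_field(candidate))
    except TypeError as exc:
        print(repr(candidate), "-> TypeError:", exc)
```

The same check rejects a string, so whatever value is assigned manually should be an `int`, not a `str`.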
Here's how you can address this issue:
  1. Set the Workspace ID Manually:
    • The FeatureEngineeringClient expects workspace_id to be set. If it is not fetched automatically in the local environment, you can assign it manually. Use your numeric workspace ID (it is not necessarily the subdomain of a workspace URL such as https://<workspace-id>.cloud.databricks.com; the reliable source is the value the client reports inside a Databricks notebook) and set it as an integer with code like this:
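      from databricks.feature_engineering import FeatureEngineeringClient

      client = FeatureEngineeringClient()
      # Must be an int: the feature spec proto rejects None and str values.
      workspace_id = int("<your_workspace_id>")  # Replace the placeholder with your numeric workspace ID
      # Private attributes (leading underscore): they may change between library versions.
      client._catalog_client._local_workspace_id = workspace_id
      client._catalog_client._feature_store_workspace_id = workspace_id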
    This method has been used successfully by others encountering similar issues.
  2. Ensure Proper Environment Configuration:
    • Verify that the required environment variables or Spark configurations are set up properly. Common variables to check in your local setup include:
      • WORKSPACE_ID
      • _DATABRICKS_WORKSPACE_HOST
      • _DATABRICKS_WORKSPACE_ID
    Note, however, that in some similar reports setting these variables alone did not resolve the issue.
  3. Check Databricks Connect Integration:
    • If you are using the Databricks extension for Visual Studio Code, ensure that Databricks Connect is configured properly. This includes installing the necessary dependencies and configuring access credentials. Refer to the Databricks Connect documentation for detailed steps.
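Step 1 can be wrapped in a small helper that converts the ID to an `int` before assigning it, so a missing or non-numeric value fails immediately instead of later inside `create_training_set`. `apply_workspace_id` is a hypothetical name; the two private attributes are the ones from the step 1 snippet and are not a public API. The sketch uses a stand-in object instead of a real client so it runs anywhere, and `"1234567890123456"` is a made-up value:

```python
from types import SimpleNamespace


def apply_workspace_id(client, workspace_id):
    """Hypothetical helper: coerce the ID to int and set it on the client.

    _catalog_client and its two attributes are private internals taken
    from the snippet above; they may change between library versions.
    """
    # int() accepts ints and digit strings; it raises TypeError for None
    # and ValueError for text such as a workspace URL.
    workspace_id = int(workspace_id)
    catalog_client = client._catalog_client
    catalog_client._local_workspace_id = workspace_id
    catalog_client._feature_store_workspace_id = workspace_id
    return workspace_id


# Stand-in for FeatureEngineeringClient(), used only for illustration.
fake_client = SimpleNamespace(_catalog_client=SimpleNamespace())
apply_workspace_id(fake_client, "1234567890123456")
print(type(fake_client._catalog_client._local_workspace_id).__name__)
```

With a real client, the call would go right after `FeatureEngineeringClient()` is created and before `create_training_set`.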

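For step 2, a quick way to see which of the listed variables are actually visible to the local Python process (for example, the notebook kernel started from VS Code) is to print them. `report_workspace_env` is a hypothetical helper, and whether the client reads these variables at all is not guaranteed, so treat the output only as a diagnostic:

```python
import os

# Variable names taken from step 2 above.
WORKSPACE_ENV_VARS = ("WORKSPACE_ID", "_DATABRICKS_WORKSPACE_HOST", "_DATABRICKS_WORKSPACE_ID")


def report_workspace_env(environ=None):
    """Return {name: value or None} for each variable listed in step 2."""
    environ = os.environ if environ is None else environ
    return {name: environ.get(name) for name in WORKSPACE_ENV_VARS}


for name, value in report_workspace_env().items():
    print(f"{name}: {'<not set>' if value is None else value}")
```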

Kabi
New Contributor III

I just checked client._catalog_client._local_workspace_id in a Databricks notebook, and the value is actually not the <workspace-id> from https://<workspace-id>.cloud.databricks.com.

I used the value retrieved from the Databricks notebook in my local notebook with your code, and it worked perfectly. Thanks a lot for your help!
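The workflow that resolved this can be summarized as: print the value inside a Databricks notebook, then paste it into the local notebook as an integer. The sketch below covers the local side; `parse_workspace_id` is a hypothetical helper that refuses URLs (this thread showed the URL subdomain is not the right value) and anything else that is not a plain number. `1234567890123456` is a placeholder:

```python
# In a Databricks notebook (private attribute checked in this thread):
#   from databricks.feature_engineering import FeatureEngineeringClient
#   print(FeatureEngineeringClient()._catalog_client._local_workspace_id)


def parse_workspace_id(raw):
    """Hypothetical helper: turn the copied value into an int.

    Rejects URLs and other non-numeric text instead of guessing.
    """
    text = str(raw).strip()
    # isascii() keeps out Unicode digits such as "²" that int() would reject.
    if not (text.isascii() and text.isdigit()):
        raise ValueError(
            f"expected the numeric workspace ID printed in a Databricks notebook, got {raw!r}"
        )
    return int(text)


print(parse_workspace_id(" 1234567890123456 "))
```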


4 REPLIES

Kabi
New Contributor III

Hi, thank you for your response. However, I'm still stuck:

  1. I tried manually setting the workspace_id, but I got the same error on the same line of code. The only difference is that now it says: 'str' object cannot be interpreted as an integer (I used a string for the workspace_id variable).

  2. Could you please share the link where you found all the variables that need to be configured for the extension?

  3. I read the Databricks Connect documentation, but I didn't find any specific configuration information, especially since other Databricks features such as Spark work fine. I'm confused about why only the feature engineering part fails.

If you have any other ideas or suggestions on what I could check, I'd be happy to try them. Thanks again!
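When comparing the local environment with the Databricks notebook where the code works, one option is to record which versions of the relevant packages are installed locally. This sketch uses only the standard library; the three PyPI distribution names are assumptions about which packages matter for this setup, and `installed_versions` is a hypothetical helper:

```python
from importlib import metadata

# Assumed distribution names for the packages involved in this thread.
PACKAGES = ("databricks-connect", "databricks-feature-engineering", "databricks-sdk")


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```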

BigRoux
Databricks Employee

I've done some additional research and found that the FeatureStoreClient is not officially supported when accessing a managed Databricks environment from an external IDE, even with Databricks Connect. The client library is designed to run within the Databricks Runtime and does not currently support direct access to feature tables from external environments.

That said, this limitation may change in the near future. I hope this helps!
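Given that limitation, a notebook shared between the Databricks UI and a local VS Code session could apply the workspace-ID workaround only when it is not running on Databricks Runtime. This sketch assumes that Databricks Runtime sets the `DATABRICKS_RUNTIME_VERSION` environment variable and that a local kernel does not; `running_on_databricks_runtime` is a hypothetical helper:

```python
import os


def running_on_databricks_runtime(environ=None):
    """Assumption: Databricks Runtime exports DATABRICKS_RUNTIME_VERSION,
    while a local kernel started from VS Code normally does not."""
    environ = os.environ if environ is None else environ
    return bool(environ.get("DATABRICKS_RUNTIME_VERSION"))


if running_on_databricks_runtime():
    print("Databricks Runtime: the client should resolve the workspace ID itself")
else:
    print("Local kernel: apply the workspace-ID workaround before create_training_set")
```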
