a month ago
Hi, I am working from a local notebook using the VS Code Databricks extension.
I am trying to use FeatureEngineeringClient. When I create the training set:
training_set = fe.create_training_set(
df=filtered_data_train,
feature_lookups=payments_feature_lookups,
label="churn",
exclude_columns=exclude_columns,
)
I receive this error:
"name": "TypeError",
"message": "'NoneType' object cannot be interpreted as an integer",
but looking at the logs, I see the error is related to:
File ~/projects/databricks_hwm_churn/.venv/lib/python3.12/site-packages/databricks/ml_features_common/entities/feature_spec.py:243, in FeatureSpec.to_proto(self)
241 proto_feature_spec.input_functions.append(function_info.to_proto())
242 proto_feature_spec.serialization_version = self.serialization_version
--> 243 proto_feature_spec.workspace_id = self.workspace_id
244 proto_feature_spec.feature_store_client_version = (
245 self._feature_store_client_version
246 )
247 return proto_feature_spec
Could someone explain how I can fix the problem with self.workspace_id when working from a local notebook?
The same code works if I run it from a Databricks notebook in the browser.
Thank you!
3 weeks ago
The error TypeError: 'NoneType' object cannot be interpreted as an integer arises because the workspace_id is not properly set when running the FeatureEngineeringClient in a local notebook using Visual Studio Code. This issue likely stems from the absence of a correctly initialized workspace context required by the Databricks Feature Engineering Client.
FeatureEngineeringClient expects the workspace_id to be set. If the automatic fetch fails in the local environment, you can assign it manually. To do so, retrieve your workspace ID (available in your Databricks workspace URL, e.g., https://<workspace-id>.cloud.databricks.com) and set it using code like this:
from databricks.feature_engineering import FeatureEngineeringClient
client = FeatureEngineeringClient()
workspace_id = "<your_workspace_id>" # Replace with your actual workspace ID
client._catalog_client._local_workspace_id = workspace_id
client._catalog_client._feature_store_workspace_id = workspace_id
WORKSPACE_ID
_DATABRICKS_WORKSPACE_HOST
_DATABRICKS_WORKSPACE_ID
3 weeks ago
Hi, thank you for your response. However, I'm still stuck with the solution:
I tried manually setting the workspace_id, but I encountered the same error on the same line of code. The only difference is that now it says: 'str' object cannot be interpreted as an integer (I used string for
).
Could you please share the link where you found all the variables that need to be configured for the extension?
I read the Databricks Connect documentation, but I didn't find any specific information about configuration, especially considering that other Databricks tools like Spark are working fine. I'm confused why only the feature engineering part is failing.
If you have any other ideas or suggestions on what I could check, I'd be happy to try them. Thanks again!
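For what it's worth, both messages seen in this thread ('NoneType' and 'str') are what Python raises whenever a value has to be a true integer. Here is a minimal sketch using the standard library's operator.index as a stand-in for the integer field assignment in FeatureSpec.to_proto. That the protobuf setter goes through the same check is an assumption based on the identical wording, not something verified against the library source. The helper name integer_error and the value 1234567890 are made up for illustration.

```python
import operator


def integer_error(value):
    """Return the TypeError message raised when value is used as an integer, or None."""
    try:
        operator.index(value)
    except TypeError as exc:
        return str(exc)
    return None


# None and str are both rejected; only a real int passes.
# 1234567890 is a placeholder, not a real workspace ID.
for candidate in (None, "1234567890", 1234567890):
    print(type(candidate).__name__, "->", integer_error(candidate))
```

So a numeric string still fails the check. The value would need to be converted with int() before it is assigned.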
3 weeks ago
I just checked client._catalog_client._local_workspace_id in a Databricks notebook, and it's actually not equal to https://<workspace-id>.cloud.databricks.com.
I used the value retrieved from the Databricks notebook in my local notebook with your code, and it worked perfectly. Thanks a lot for your help!
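A minimal sketch of the workaround as it ended up working here, assuming the value read from client._catalog_client._local_workspace_id in a Databricks notebook is numeric. The helper names coerce_workspace_id and apply_workspace_id are made up for illustration. _local_workspace_id and _feature_store_workspace_id are private attributes taken from the reply above, so they may change between library versions. A SimpleNamespace stands in for the real client so the snippet runs without a Databricks connection.

```python
from types import SimpleNamespace


def coerce_workspace_id(value):
    """Return the workspace ID as an int, rejecting None and non-numeric text."""
    if value is None:
        raise ValueError("workspace ID is missing")
    text = str(value).strip()
    if not text.isdecimal():
        raise ValueError(f"workspace ID must be numeric, got {value!r}")
    return int(text)


def apply_workspace_id(client, workspace_id):
    """Set the private workspace ID attributes named in this thread on the client."""
    wid = coerce_workspace_id(workspace_id)
    client._catalog_client._local_workspace_id = wid
    client._catalog_client._feature_store_workspace_id = wid
    return wid


# Stand-in for FeatureEngineeringClient(); swap in the real client locally.
fake_client = SimpleNamespace(_catalog_client=SimpleNamespace())
# "1234567890" is a placeholder, not a real workspace ID.
apply_workspace_id(fake_client, "1234567890")
print(fake_client._catalog_client._local_workspace_id)
```

Converting up front means a value pasted in as a string gets stored as an int, which avoids the 'str' variant of the error mentioned earlier in the thread.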
3 weeks ago
I've done some additional research and found that the FeatureStoreClient is not officially supported when accessing a managed Databricks environment from an external IDE, even when using Databricks Connect. The client library is designed to operate within the Databricks Runtime and does not currently support direct access to feature tables from external environments.
That said, this limitation may change in the near future. I hope this helps!