Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
FeatureEngineeringClient and Databricks Connect

antonioferegrin
New Contributor
Hello everyone. I want to use Databricks Connect to connect to my clusters from outside the workspace and run code. Databricks Connect itself works without any issue, like this:

```
from databricks.connect import DatabricksSession
from databricks.sdk.core import Config

config = Config(cluster_id="XXXX")
spark = DatabricksSession.builder.sdkConfig(config).getOrCreate()

catalog = "my_test_catalog"
schema = "my_schema"
table = "my_table"

spark.sql(f"CREATE CATALOG IF NOT EXISTS {catalog}")
spark.sql(f"USE CATALOG {catalog}")
spark.sql(f"CREATE SCHEMA IF NOT EXISTS {schema}")
spark.sql(f"USE SCHEMA {schema}")
```
 
However, when I try to use the `FeatureEngineeringClient` to create a table, like this:
```
from databricks.feature_engineering import FeatureEngineeringClient
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

log_message_schema = StructType(
    [
        StructField("message", StringType(), True),
        StructField("application", StringType(), True),
        StructField("request_id", StringType(), False),
        StructField("timestamp", TimestampType(), False),
        StructField("levelname", StringType(), True),
        StructField("data", StringType(), True),
        StructField("run_id", StringType(), True),
        StructField("model_name", StringType(), True),
    ]
)

feature_engineering_client = FeatureEngineeringClient()

feature_engineering_client.create_table(
    name=f"{catalog}.{schema}.{table}_offline",
    primary_keys=["request_id", "timestamp"],
    timestamp_keys=["timestamp"],
    schema=log_message_schema,
)
```

 

I get an authentication error:

```
Exception: {'error_code': '401', 'message': 'Unauthorized'}
```

 

Do you know why this could be happening?
3 REPLIES

VZLA
Databricks Employee

TL;DR: This sounds like a limitation.

I believe the limitations section of the docs (https://docs.databricks.com/en/machine-learning/feature-store/python-api.html#limitations) implies, implicitly but not very clearly, that a SparkSession with access to Unity Catalog and Databricks resources is required. With Databricks Connect, that authentication context does not seem to be passed to the FeatureEngineeringClient during initialization; otherwise you should not be seeing these authorization errors.

I assume this exception also comes with a full stack trace. You can confirm whether the above is indeed the problem by inspecting it and identifying which line of code raises the exception.
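One quick local check of this theory (a sketch, and an assumption about the mechanism: the FeatureEngineeringClient makes plain REST calls whose credentials are resolved from the standard Databricks unified-auth environment, not from the Spark session that Databricks Connect creates) is to see whether REST-style credentials are visible to the local process at all:

```python
import os

# Sketch, assuming the FE client's REST calls authenticate via the standard
# Databricks environment variables / config profile rather than via the
# Databricks Connect SparkSession. If nothing here is set, the REST layer
# has no token to send, which would explain a 401.
def rest_credentials_present(env: dict) -> bool:
    """Return True if host+token style credentials are visible to REST clients."""
    return bool(env.get("DATABRICKS_HOST")) and bool(
        env.get("DATABRICKS_TOKEN") or env.get("DATABRICKS_CONFIG_PROFILE")
    )

print(rest_credentials_present(os.environ))
```

If this prints `False` while Databricks Connect still works, that would support the theory that the two clients resolve credentials through different paths.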

saurabh18cs
Valued Contributor III

Do you have modify rights on this schema to create a table?
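If it turns out to be a permissions issue rather than a token issue, the grants can be inspected and fixed from the Spark session that already works. A sketch, using the Unity Catalog privileges needed to create a table (USE CATALOG, USE SCHEMA, CREATE TABLE); the principal is a placeholder:

```python
# Sketch: SQL statements to check and grant table-creation rights in Unity
# Catalog. "user@example.com" is a placeholder principal, not from the thread.
catalog, schema = "my_test_catalog", "my_schema"
principal = "user@example.com"

statements = [
    f"SHOW GRANTS ON SCHEMA {catalog}.{schema}",
    f"GRANT USE CATALOG ON CATALOG {catalog} TO `{principal}`",
    f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO `{principal}`",
    f"GRANT CREATE TABLE ON SCHEMA {catalog}.{schema} TO `{principal}`",
]

# On a live session, run each with: spark.sql(s)
for s in statements:
    print(s)
```

Note, though, that a missing grant normally surfaces as a 403 PERMISSION_DENIED with a descriptive message, not a bare 401 Unauthorized, so the token path is the more likely suspect here.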

praful932
New Contributor II

@VZLA
Is there a plan to support this in the future?

At the moment, this limitation makes development through a local IDE with Databricks Connect difficult, as we are unable to use the Feature Engineering SDK.

Sharing the stack trace:


```
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
Cell In[20], line 1
----> 1 fs_client.fe_client.get_table(name = 'some_table')

File site-packages/databricks/feature_engineering/client.py:547, in FeatureEngineeringClient.get_table(self, name)
    541 self._validate_is_uc_table_name(name)
    542 name = uc_utils.get_full_table_name(
    543     name,
    544     self._spark_client.get_current_catalog(),
    545     self._spark_client.get_current_database(),
    546 )
--> 547 return self._compute_client.get_table(
    548     name=name,
    549     req_context=RequestContext(
    550         request_context.GET_TABLE, request_context.FEATURE_ENGINEERING_CLIENT
    551     ),
    552 )

File site-packages/databricks/ml_features/_compute_client/_compute_client.py:776, in ComputeClient.get_table(self, name, req_context, include_producers)
    774 self._spark_client_helper.check_feature_table_exists(name)
    775 if is_uc_table:
--> 776     feature_table = self._catalog_client_helper.get_feature_table_from_uc_and_online_store_from_fs(
    777         name, req_context, include_producers=include_producers
    778     )
    779 else:
    780     feature_table = self._catalog_client.get_feature_table(
    781         name, req_context, include_producers=include_producers
    782     )

File site-packages/databricks/ml_features/_catalog_client/_catalog_client_helper.py:150, in CatalogClientHelper.get_feature_table_from_uc_and_online_store_from_fs(self, table_name, req_context, include_producers)
    144 def get_feature_table_from_uc_and_online_store_from_fs(
    145     self,
    146     table_name: str,
    147     req_context: RequestContext,
    148     include_producers: bool = False,
    149 ):
--> 150     uc_response = self._databricks_client.get_uc_table(table_name)
    151     feature_table_from_uc = FeatureTable.from_uc_get_table_response(uc_response)
    153     if "table_type" in uc_response and uc_response["table_type"] == "VIEW":

File site-packages/databricks/ml_features/_databricks_client/_databricks_client.py:77, in DatabricksClient.get_uc_table(self, table_name)
     71 url = f"/api/2.1/unity-catalog/tables/{quote(table_name)}"
     72 response = http_request(
     73     self._get_host_creds(),
     74     url,
     75     method="GET",
     76 )
---> 77 verify_rest_response(response, url)
     78 resp_json = response.json()
     79 return resp_json

File site-packages/databricks/ml_features/utils/rest_utils.py:165, in verify_rest_response(response, endpoint)
    162 if response.status_code != 200:
    163     if _can_parse_as_json(response.text):
    164         # ToDo(ML-20622): return cleaner error to client, eg: mlflow.exceptions.RestException
--> 165         raise Exception(json.loads(response.text))
    166     else:
    167         base_msg = (
    168             "API request to endpoint %s failed with error code "
    169             "%s != 200"
   (...)
    173             )
    174         )

Exception: {'error_code': 403, 'message': 'Invalid access token. [TraceId: some_trace_id]'}
```
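The frame in `_databricks_client.py` shows exactly which request fails: a GET to the Unity Catalog tables endpoint using host credentials resolved outside the Spark session. Rebuilding that endpoint makes it easy to replay the request with any HTTP client (e.g. `curl`) and a known-good PAT, which separates "invalid token" from "missing grant". A small sketch; the table name below is a placeholder:

```python
from urllib.parse import quote

# Same endpoint construction the stack trace shows in
# DatabricksClient.get_uc_table (url = f"/api/2.1/unity-catalog/tables/...").
def uc_table_endpoint(table_name: str) -> str:
    return f"/api/2.1/unity-catalog/tables/{quote(table_name)}"

print(uc_table_endpoint("my_test_catalog.my_schema.some_table"))

# Then, outside Python, replay with a token you know is valid:
#   curl -H "Authorization: Bearer $DATABRICKS_TOKEN" \
#        "$DATABRICKS_HOST$(python -c '...print the endpoint...')"
# A 200 here but a 403 from the FE client would point at how the client
# resolves its credentials, not at the token or the grants themselves.
```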
