Liquid Clustering on a Feature Store Table Created with FeatureEngineeringClient

Direo · 09-04-2024

Hello everyone,

I'm exploring ways to perform clustering on a feature store table that I've created using the FeatureEngineeringClient in Databricks, and I'm particularly interested in applying liquid clustering to one of the columns.

Here’s the scenario:

I created a feature store table using the following code:

from databricks.feature_engineering import FeatureEngineeringClient, FeatureLookup

# Initialize the FeatureEngineeringClient
fe = FeatureEngineeringClient()

# Define the feature store table with primary key and schema
fe.create_table(
    name=table_name,
    primary_keys=["wine_id"],
    schema=features_df.schema,
    description="wine features"
)

# Write data to the feature store table
fe.write_table(
    name=table_name,
    df=features_df,
    mode="merge"
)

Now that I have the feature store table in place with various features, I'd like to apply liquid clustering to one or more of its columns.

My Question:

How can I implement liquid clustering on this feature store table in Python? I know that I can enable liquid clustering on an existing unpartitioned Delta table using the following syntax:

ALTER TABLE <table_name>
CLUSTER BY (<clustering_columns>)

but that requires SQL.
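
For reference, here is a minimal sketch of how that same statement could be issued from Python rather than a SQL cell. It assumes an active SparkSession (called `spark` here, as in a Databricks notebook) and that the feature table is an ordinary unpartitioned Delta table underneath. The helper names `build_cluster_by_statement` and `apply_liquid_clustering` are hypothetical, not part of FeatureEngineeringClient, and I have not confirmed that this is the recommended approach for feature tables.

```python
def build_cluster_by_statement(table_name, clustering_columns):
    """Build the ALTER TABLE ... CLUSTER BY statement as a string.

    Hypothetical helper: names are interpolated as-is, so they must be
    trusted identifiers (no quoting or escaping is applied here).
    """
    if not clustering_columns:
        raise ValueError("at least one clustering column is required")
    columns = ", ".join(clustering_columns)
    return f"ALTER TABLE {table_name} CLUSTER BY ({columns})"


def apply_liquid_clustering(spark, table_name, clustering_columns):
    """Run the statement through an existing SparkSession.

    Assumes `spark` exposes the standard `sql(str)` method.
    """
    statement = build_cluster_by_statement(table_name, clustering_columns)
    spark.sql(statement)
    return statement


# Example call inside a notebook, reusing table_name from the code above:
# apply_liquid_clustering(spark, table_name, ["wine_id"])
```

Keeping the string construction separate from the `spark.sql` call makes it possible to inspect the statement before running it against the table.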

Any help or code examples on this would be greatly appreciated!

Thank you!
