Liquid Clustering on a Feature Store Table Created with FeatureEngineeringClient
09-04-2024 02:56 AM
Hello everyone,
I'm exploring ways to perform clustering on a feature store table that I've created using the FeatureEngineeringClient in Databricks, and I'm particularly interested in applying liquid clustering to one of the columns.
Here's the scenario:
I created a feature store table using the following code:
from databricks.feature_engineering import FeatureEngineeringClient, FeatureLookup
# Initialize the FeatureEngineeringClient
fe = FeatureEngineeringClient()
# Define the feature store table with primary key and schema
fe.create_table(
    name=table_name,
    primary_keys=["wine_id"],
    schema=features_df.schema,
    description="wine features"
)
# Write data to the feature store table
fe.write_table(
    name=table_name,
    df=features_df,
    mode="merge"
)
Now that I have the feature store table in place with various features, I'd like to apply liquid clustering to one of the columns (or multiple columns).
My Question:
How can I implement liquid clustering on this feature store table in Python? I know that I can enable liquid clustering on an existing unpartitioned Delta table using the following syntax:
ALTER TABLE <table_name>
CLUSTER BY (<clustering_columns>)
but that requires SQL.
Any help or code examples on this would be greatly appreciated!
Thank you!
12-09-2024 12:58 AM
Hi,
You can run the same ALTER TABLE statement from Python by passing it to spark.sql:
# Set the table name and clustering columns
table_name = "feature_store_table"
clustering_columns = ["column1", "column2"]
# Build the SQL command
sql_command = f"ALTER TABLE {table_name} CLUSTER BY ({', '.join(clustering_columns)})"
# Execute the SQL command
spark.sql(sql_command)
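The same idea can be wrapped in a small helper. This is only a sketch: the function names `build_cluster_by_sql` and `apply_liquid_clustering` are made up for illustration, and the backtick quoting assumes the table name is a dot-separated identifier such as `catalog.schema.table`. The optional `OPTIMIZE` step reflects my understanding that data already in the table is only re-clustered when `OPTIMIZE` runs; please check this against the liquid clustering documentation for your runtime.

```python
from typing import Sequence


def _quote(part: str) -> str:
    """Backtick-quote a single identifier part, escaping embedded backticks."""
    return "`" + part.replace("`", "``") + "`"


def build_cluster_by_sql(table_name: str, clustering_columns: Sequence[str]) -> str:
    """Build an ALTER TABLE ... CLUSTER BY statement (hypothetical helper)."""
    if not clustering_columns:
        raise ValueError("at least one clustering column is required")
    # Assumption: table_name is dot-separated, e.g. catalog.schema.table
    table = ".".join(_quote(p) for p in table_name.split("."))
    columns = ", ".join(_quote(c) for c in clustering_columns)
    return f"ALTER TABLE {table} CLUSTER BY ({columns})"


def apply_liquid_clustering(spark, table_name: str,
                            clustering_columns: Sequence[str],
                            optimize: bool = False) -> str:
    """Run the statement through an existing SparkSession (hypothetical helper)."""
    statement = build_cluster_by_sql(table_name, clustering_columns)
    spark.sql(statement)
    if optimize:
        # Assumption: OPTIMIZE triggers clustering of data already in the table
        table = ".".join(_quote(p) for p in table_name.split("."))
        spark.sql(f"OPTIMIZE {table}")
    return statement
```

In a Databricks notebook, where `spark` is already defined, a call could look like `apply_liquid_clustering(spark, table_name, ["wine_id"])`, using the `table_name` from the question.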