Liquid clustering with structured streaming pyspark

Erik
Valued Contributor II

I would like to try out liquid clustering, but all the examples I have seen are SQL tables created by selecting from other tables. Our gold tables are written directly from PySpark with Structured Streaming, e.g. like this:

 

(
    silver_df.writeStream.partitionBy(["year", "month"])
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", calculate_checkpoint_location(gold_path))
    .trigger(once=True)
    .start(gold_path)
    .awaitTermination()
)

Is there any way to enable liquid clustering for a Delta table created like this?

 

2 REPLIES

Kaniz
Community Manager

Hi @Erik, it appears that liquid clustering needs to be enabled when the table is first created, and that it is not compatible with partitioning or ZORDER. In your case, you are writing your data to a Delta table using PySpark's writeStream API with partitionBy, so the resulting table is partitioned.
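
One possible workaround, sketched under assumptions rather than confirmed in this thread: create the table up front with a `CLUSTER BY` clause and then append to it from the stream without `partitionBy`. The table name, the column list, and the helper names below are hypothetical, and the sketch writes to a named table with `toTable` instead of the path-based `start(gold_path)` used above.

```python
def clustered_table_ddl(table_name, columns, cluster_cols):
    """Build a CREATE TABLE statement that enables liquid clustering.

    columns is a mapping of column name -> SQL type (a hypothetical schema);
    cluster_cols lists the clustering keys, which must appear in columns.
    """
    missing = [c for c in cluster_cols if c not in columns]
    if missing:
        raise ValueError(f"clustering columns not in schema: {missing}")
    col_defs = ", ".join(f"`{name}` {sql_type}" for name, sql_type in columns.items())
    keys = ", ".join(f"`{c}`" for c in cluster_cols)
    return (
        f"CREATE TABLE IF NOT EXISTS {table_name} ({col_defs}) "
        f"USING DELTA CLUSTER BY ({keys})"
    )


def stream_into_clustered_table(
    spark, stream_df, table_name, columns, cluster_cols, checkpoint_location
):
    """Create the clustered table if needed, then append the stream to it.

    spark is an active SparkSession and stream_df a streaming DataFrame;
    both are passed in, so this sketch does not import pyspark itself.
    """
    # Clustering is declared once, at table creation time.
    spark.sql(clustered_table_ddl(table_name, columns, cluster_cols))

    # No partitionBy here: partitioning and liquid clustering do not mix.
    query = (
        stream_df.writeStream.format("delta")
        .outputMode("append")
        .option("checkpointLocation", checkpoint_location)
        .trigger(once=True)
        .toTable(table_name)
    )
    query.awaitTermination()
```

A call could look like `stream_into_clustered_table(spark, silver_df, "gold.events", {"timestamp": "TIMESTAMP", "id": "BIGINT"}, ["timestamp", "id"], checkpoint_location)`, where `gold.events` and the two columns are placeholders. Whether appended data is clustered at write time depends on the runtime version, so running `OPTIMIZE` on the table periodically may still be needed.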

Erik
Valued Contributor II

@Kaniz Thanks for your reply. I know I can't use it with partitioning or ZORDER, and that's OK. What I am wondering is whether it's possible to create a liquid clustering Delta table using the PySpark writeStream API. For example, something like this:

(
    silver_df.writeStream.clusterBy(["timestamp", "id"])
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", calculate_checkpoint_location(gold_path))
    .trigger(once=True)
    .start(gold_path)
    .awaitTermination()
)
