Sunday
I would like to try out liquid clustering, but all the examples I see seem to be SQL tables created from selecting from other tables. Our gold tables are pyspark tables written directly to a table, e.g. like this:
(
    silver_df.writeStream
    .partitionBy(["year", "month"])
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", calculate_checkpoint_location(gold_path))
    .trigger(once=True)
    .start(gold_path)
    .awaitTermination()
)
Is there any way to enable liquid clustering for a delta table created like this?
Monday
Hi @Erik, it appears that liquid clustering must be enabled when the table is first created, and that it is not compatible with partitioning or ZORDER. In your case, you are writing your data to a Delta table using PySpark's writeStream API with partitionBy, so the partitioning would have to go before liquid clustering could apply.
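As a rough sketch of what "enabled at creation" could look like: one option is to create the empty Delta table up front with a CLUSTER BY clause and only then point the stream at it. The helper below only builds the DDL string, so it runs without a cluster. The table name gold.events, the column types, and the helper name build_clustered_table_ddl are all hypothetical, and the sketch assumes a Databricks runtime whose SQL supports CLUSTER BY on Delta tables.

```python
def build_clustered_table_ddl(table_name, columns, cluster_by, location=None):
    """Return a CREATE TABLE statement that declares liquid clustering keys.

    columns is a dict of column name -> SQL type; cluster_by is a list of
    column names. All names passed in here are illustrative assumptions.
    """
    if not cluster_by:
        raise ValueError("at least one clustering column is required")
    unknown = [c for c in cluster_by if c not in columns]
    if unknown:
        raise ValueError(f"clustering columns not in schema: {unknown}")

    column_list = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    ddl = (
        f"CREATE TABLE IF NOT EXISTS {table_name} ({column_list}) "
        f"USING DELTA CLUSTER BY ({', '.join(cluster_by)})"
    )
    if location is not None:
        # Optional external location, e.g. the gold_path from the question.
        ddl += f" LOCATION '{location}'"
    return ddl


# Hypothetical table and schema, chosen to match the columns in the question.
ddl = build_clustered_table_ddl(
    "gold.events",
    {"timestamp": "TIMESTAMP", "id": "STRING"},
    ["timestamp", "id"],
)
print(ddl)

# On a cluster (not executed here), the idea would be to run spark.sql(ddl)
# once, then start the stream WITHOUT partitionBy and write into that table.
```

Since the clustering keys live in the table metadata, the streaming write itself would not need to mention them. Whether appended data is clustered on write or only after a later OPTIMIZE depends on the runtime, so that part is worth checking against the docs for your version.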
Tuesday
@Kaniz Thanks for your reply. I know I can't use it with partitioning or ZORDER, and that's OK. What I'm wondering is whether it's possible to create a liquid-clustered Delta table using the PySpark writeStream API. For example, something like this:
(
    silver_df.writeStream
    .clusterBy(["timestamp", "id"])
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", calculate_checkpoint_location(gold_path))
    .trigger(once=True)
    .start(gold_path)
    .awaitTermination()
)