implementing liquid clustering for DataFrames directly

luriveros — Tue, 19 Dec 2023 22:01:51 GMT

Hi! I have a question: is it possible to implement liquid clustering for DataFrames saved directly to Delta files (df.write.format("delta").save("path")), rather than using the conventional approach that involves creating a table?

Re: implementing liquid clustering for DataFrames directly

brockb — Tue, 06 Feb 2024 04:10:30 GMT

Hi,
Hopefully this question is related to testing, and any production data would be persisted to a table. Here is one example that works on a path-based Delta location:

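# Write a DataFrame straight to a Delta path (no catalog table).
# save() returns None, so the result is not assigned to a variable.
(
    spark.range(10)
    .write
    .format("delta")
    .mode("append")
    .save("file:/tmp/data")
)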

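-- Enable liquid clustering on the path-based Delta table, keyed on id
ALTER TABLE delta.`file:/tmp/data` CLUSTER BY (id);

-- Inspect the table details; clusteringColumns should now list id
DESC DETAIL delta.`file:/tmp/data`;

-- Rewrite the data files according to the clustering key
OPTIMIZE delta.`file:/tmp/data`;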

Thanks.
