10-06-2024 07:29 PM
Hi there,
I’m trying to join a small table (a few million records) with a much larger table (around 1 TB in size, containing a few billion records).
The small table isn’t quite small enough to use Broadcast. Additionally, our join clause involves more than four columns. I attempted to enable Liquid Clustering on the large table, but it only supports up to four columns. I experimented with different combinations of four-column sets for Liquid Clustering, but none of them reduced the join time.
Do you have any recommendations for optimizing a query on a table with Liquid Clustering when the join criteria involve more than four columns?
10-06-2024 11:55 PM
Hi @Erfan ,
One option is to create an additional column that concatenates the values of the join columns, and then apply Liquid Clustering on that new column.
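Here is a rough sketch of what that could look like, not a tested recipe. It is a small Python helper that only builds the SQL statements as strings, so you can review them before running anything. The table name `big_table`, the column names `c1` to `c5`, the key column name `join_key`, and the `||` separator are all placeholders I made up for illustration. It assumes the large table is a Delta table you are allowed to alter, and that the small table gets the same derived column so the join can use it.

```python
def composite_key_expr(columns, sep="||"):
    """Build a Spark SQL expression that concatenates the join columns into one string."""
    if not columns:
        raise ValueError("at least one column is required")
    # concat_ws skips NULL arguments, which would shift values between positions.
    # Wrapping each column in coalesce keeps every position present in the key.
    # Caveat: this makes NULL and the empty string look identical.
    parts = ", ".join(f"coalesce(cast({c} AS STRING), '')" for c in columns)
    return f"concat_ws('{sep}', {parts})"


def clustering_statements(table, columns, key_col="join_key"):
    """Return the SQL statements that add, fill, and cluster on the derived key column."""
    expr = composite_key_expr(columns)
    return [
        f"ALTER TABLE {table} ADD COLUMN {key_col} STRING",
        f"UPDATE {table} SET {key_col} = {expr}",
        f"ALTER TABLE {table} CLUSTER BY ({key_col})",
        # OPTIMIZE triggers the actual reclustering of existing data.
        f"OPTIMIZE {table}",
    ]


# Hypothetical table and column names, for illustration only.
statements = clustering_statements("big_table", ["c1", "c2", "c3", "c4", "c5"])
for stmt in statements:
    print(stmt)
```

You would then join on `join_key` (optionally keeping the original column predicates as well). Note that the `UPDATE` rewrites the whole table, which is expensive at 1 TB, so it may be cheaper to populate the column when the table is written.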
10-07-2024 12:02 AM
Hi @filipniziol ,
Good idea. I'll try it and will come back with the result. Thanks!
10-08-2024 06:53 PM
Unfortunately, since I am not the owner of the data, I am not allowed to add an additional column, so I can't test it. But I guess your idea