Hi @LiamS, The performance of your query can be affected by several factors, including the data types of the columns you're joining. Using string data type for the join keys might be hindering the performance. Converting the keys to the integer could improve performance as integer comparisons are generally faster than string comparisons.
Delta Lake provides several optimizations that can help improve the performance of your queries, including:-
- Data integrity constraints: These enforce data consistency and prevent accidental writes with incorrect or invalid schemas.
- Schema evolution: This allows for controlled schema evolution when business needs change.
- Transactional Guarantees: Writes to delta tables are ACID compliant, ensuring data consistency.
- Upserts: This feature updates data if it exists. Otherwise, it inserts new data.
However, as you mentioned, Z-ordering did not improve the performance in your case.
This might be because Z-ordering is most effective when the data is filtered on the Z-ordered columns, which might not be the case in your scenario. In conclusion, you might want to consider changing the data type of your join keys from string to integer to improve the performance of your queries.