Hi @leungi,
Troubleshooting the ANALYZE TABLE Error
Based on the information provided, it seems that you are encountering an issue when trying to run the `ANALYZE TABLE` command to compute statistics for specific columns in your table. The error message indicates an internal error in Spark or the Spark plugins you are using.
Here are some steps you can take to troubleshoot and resolve this issue:
1. Check the Column Statistics
The error message suggests that one of the columns you are trying to collect statistics for (`my_col_1` or `my_col_2`) does not have any statistics collected. This could be the root cause of the issue.
You can check the current statistics for your table by running the following query:
SHOW COLUMN STATS <catalog>.<schema>.<table>;
This will display the current statistics for each column in the table, including the number of distinct values, null count, and other relevant information.
2. Identify the Problematic Column
If the output of the `SHOW COLUMN STATS` query shows that one of the columns you are trying to analyze does not have any statistics collected, this is likely the cause of the error.
3. Collect Statistics for the Problematic Column
To resolve the issue, you can try collecting statistics for the problematic column separately, without including the other columns in the `ANALYZE TABLE` command. This may help bypass the error and allow you to collect the necessary statistics.
Run the following command, replacing `my_col_1` with the name of the column that does not have statistics:
ANALYZE TABLE <catalog>.<schema>.<table> COMPUTE STATISTICS FOR COLUMN my_col_1;
If this command is successful, you can then try running the original `ANALYZE TABLE` command again, including all the columns you need.
4. Optimize the Table
Once you have successfully collected the necessary statistics, you can proceed with the `OPTIMIZE` and `ZORDER` operations on your table.
Optimizing the Table
To optimize the table, you can use the `OPTIMIZE` command:
OPTIMIZE TABLE <catalog>.<schema>.<table>;
This will perform various optimization tasks, such as compacting small files, removing deleted data, and updating statistics.
Applying ZORDER
After optimizing the table, you can apply the `ZORDER` operation to the table:
ALTER TABLE <catalog>.<schema>.<table> ZORDER BY my_col_1, my_col_2;
This will sort the data in the table based on the specified columns, which can improve query performance for certain types of queries.