cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

SELECT issue after an OPTIMIZE operation

alxsbn
New Contributor III

I have a strange issue after an OPTIMIZE, no results are returned anymore.

I can time travel over the version easily but passed this data nothing when I'm doing a simple SELECT *.

But I still got a result when I'm doing a SELECT count(*).

How is this possible?  

1 ACCEPTED SOLUTION

Accepted Solutions

Kaniz
Community Manager
Community Manager

Hi @alxsbnAfter performing an OPTIMIZE operation, youโ€™ve encountered an interesting situation where no results are returned when you execute a simple SELECT * query, but you still get results when you run SELECT COUNT(*).

Letโ€™s explore this further:

  1. SELECT * vs. SELECT COUNT(*):

    • SELECT * retrieves all columns and rows from the specified table.
    • SELECT COUNT(*) calculates the total number of rows in the table without retrieving the actual data.
  2. Possible Explanations:

    • Index Usage: When you perform SELECT COUNT(*), the database engine can use an index (if available) to quickly determine the row count. However, for SELECT *, it needs to retrieve the actual data, which might involve a full table scan.
    • Data Retrieval Cost: Fetching all columns for every row can be expensive in terms of I/O and memory. If thereโ€™s a large amount of data, it could lead to performance issues.
    • Filter Conditions: Check if there are any filter conditions (e.g., WHERE clauses) in your SELECT * query. These conditions might exclude all rows after the OPTIMIZE operation.
    • Data Corruption: Rarely, an OPTIMIZE operation could cause data corruption or inconsistencies. Verify the integrity of your data.
  3. Optimization Strategies:

    • Index Optimization: Ensure that relevant columns have appropriate indexes. Indexes can significantly improve query performance.
    • Analyze Execution Plans: Examine the execution plans for both queries. This will help identify bottlenecks and potential areas for optimization.
    • Statistics Update: Update table statistics to help the query optimizer make better decisions.
    • Check for Locks: Ensure there are no locks preventing data retrieval.
  4. Next Steps:

    • Review your query execution plans and analyze any differences between the two queries.
    • Consider using specific column names instead of SELECT * to retrieve only necessary data.
    • Verify the integrity of your data after the OPTIMIZE operation.
 

View solution in original post

1 REPLY 1

Kaniz
Community Manager
Community Manager

Hi @alxsbnAfter performing an OPTIMIZE operation, youโ€™ve encountered an interesting situation where no results are returned when you execute a simple SELECT * query, but you still get results when you run SELECT COUNT(*).

Letโ€™s explore this further:

  1. SELECT * vs. SELECT COUNT(*):

    • SELECT * retrieves all columns and rows from the specified table.
    • SELECT COUNT(*) calculates the total number of rows in the table without retrieving the actual data.
  2. Possible Explanations:

    • Index Usage: When you perform SELECT COUNT(*), the database engine can use an index (if available) to quickly determine the row count. However, for SELECT *, it needs to retrieve the actual data, which might involve a full table scan.
    • Data Retrieval Cost: Fetching all columns for every row can be expensive in terms of I/O and memory. If thereโ€™s a large amount of data, it could lead to performance issues.
    • Filter Conditions: Check if there are any filter conditions (e.g., WHERE clauses) in your SELECT * query. These conditions might exclude all rows after the OPTIMIZE operation.
    • Data Corruption: Rarely, an OPTIMIZE operation could cause data corruption or inconsistencies. Verify the integrity of your data.
  3. Optimization Strategies:

    • Index Optimization: Ensure that relevant columns have appropriate indexes. Indexes can significantly improve query performance.
    • Analyze Execution Plans: Examine the execution plans for both queries. This will help identify bottlenecks and potential areas for optimization.
    • Statistics Update: Update table statistics to help the query optimizer make better decisions.
    • Check for Locks: Ensure there are no locks preventing data retrieval.
  4. Next Steps:

    • Review your query execution plans and analyze any differences between the two queries.
    • Consider using specific column names instead of SELECT * to retrieve only necessary data.
    • Verify the integrity of your data after the OPTIMIZE operation.
 
Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.