Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Delta Table Optimize Error

Dean_Lovelace
New Contributor III

I have started getting an error message when running the following optimize command:

deltaTable.optimize().executeCompaction()

Error:

java.util.concurrent.ExecutionException: java.lang.IllegalStateException: Number of records changed after Optimize. NumRecordsCheckInfo(OPTIMIZE,394,1058,2554337689,2600474509,0,0,Map(predicate -> "[]", zOrderBy -> "[]", batchId -> "0", auto -> false))

What is the cause of this? It has been running fine for months.

This is on Databricks Runtime 11.3 using PySpark.
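
For reference, a minimal, self-contained version of the call (the path below is a placeholder, not the actual table location):

from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Load the Delta table by path (placeholder path for illustration)
deltaTable = DeltaTable.forPath(spark, "/mnt/datalake/my_table")

# Bin-packing compaction: rewrites many small files into fewer, larger ones
deltaTable.optimize().executeCompaction()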

3 REPLIES

Debayan
Databricks Employee

Hi, it looks like the command needs to be changed; please see the Delta Lake utility documentation:

https://docs.delta.io/latest/delta-utility.html

Also, could you please recheck the command against the OPTIMIZE reference: https://docs.databricks.com/sql/language-manual/delta-optimize.html

Also, please tag @Debayan in your next response, which will notify me. Thank you!

Anonymous
Not applicable

@Dean Lovelace:

The error message indicates that the number of records in the Delta table changed while the optimize() command was running. The optimize() command improves the performance of Delta tables by compacting many small files into fewer, larger ones, which can improve query performance and reduce storage costs. However, if concurrent write operations happen on the table while optimize() is running, the record count can change, which can lead to this error.
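
One way to check for a concurrent writer is to look at the table history around the time of the failed OPTIMIZE. A minimal sketch, assuming the table is loaded from a placeholder path:

from delta.tables import DeltaTable

deltaTable = DeltaTable.forPath(spark, "/mnt/datalake/my_table")  # placeholder path

# List recent commits; WRITE/MERGE/DELETE operations overlapping the failed
# OPTIMIZE point to a concurrent writer
deltaTable.history(20).select("version", "timestamp", "operation").show(truncate=False)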

To resolve this issue, you may want to consider the following steps:

  1. Check for concurrent write operations: Check if there are any other processes or jobs that are writing to the Delta table while the optimize() command is running. If there are, you may need to temporarily stop these operations to avoid conflicts.
  2. Retry the optimize() command: If you're sure there are no concurrent write operations, you can try running the optimize() command again to see if the issue persists. Sometimes, the error message may be due to a transient issue that is resolved when the command is retried.
  3. Use a different optimize() configuration: You can try using a different configuration for the optimize() command, such as increasing the minFileSize or maxFileSize parameters (see the sketch after this list). This may help reduce the likelihood of conflicts with concurrent write operations.
  4. Perform a full compaction: If the issue persists, you can try running a full compaction instead of an optimized compaction. This will merge all the Delta table files into a single file, which can reduce the likelihood of conflicts with concurrent write operations. However, a full compaction can be more resource-intensive and may take longer to complete.
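
For step 3, here is a minimal sketch of adjusting the OPTIMIZE file-size targets before retrying. The spark.databricks.delta.optimize.minFileSize and maxFileSize keys are assumed here to be the Databricks Runtime settings behind those parameters, and the byte values are examples only:

from delta.tables import DeltaTable

# Raise the file-size thresholds used by OPTIMIZE (example values, in bytes)
spark.conf.set("spark.databricks.delta.optimize.minFileSize", str(64 * 1024 * 1024))
spark.conf.set("spark.databricks.delta.optimize.maxFileSize", str(256 * 1024 * 1024))

# Retry the compaction with the new settings (placeholder path)
deltaTable = DeltaTable.forPath(spark, "/mnt/datalake/my_table")
deltaTable.optimize().executeCompaction()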

How can I perform a full compaction?
