We are encountering an issue in our Spark Structured Streaming pipeline when writing RocksDB state-store checkpoint data to S3. The error we are seeing is as follows:
25/08/12 13:35:40 ERROR RocksDBFileManager : Error zipping to s3://xxx-datalake-binary/event-types/checkpoint/eventlog.e682ad_batch_pool_generator_input_pool_reduced/sources/0/rocksdb/0.zip
java.nio.file.FileAlreadyExistsException: s3://xxx-datalake-binary/event-types/checkpoint/eventlog.e682ad_batch_pool_generator_input_pool_reduced/sources/0/rocksdb/0.zip
The upload fails because the target file (0.zip) already exists at the destination path in S3, so the checkpoint attempt is aborted and the streaming query cannot make progress.
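For context, this is roughly how the job is configured; the checkpoint path is abbreviated and the exact settings here are a sketch of our setup, not the full job. The state store provider is the standard RocksDB one shipped with Spark 3.2+:

```
# Spark conf (illustrative sketch; not our complete configuration)
spark.sql.streaming.stateStore.providerClass=org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreProvider

# The checkpoint location is set per query in the application code, e.g.:
#   df.writeStream
#     .option("checkpointLocation", "s3://xxx-datalake-binary/event-types/checkpoint/...")
#     .start()
```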
Could you please advise on how best to address this, or share best practices for handling such file conflicts in streaming jobs? If there are configuration adjustments or features that could help us mitigate this error, that would be greatly appreciated.
Looking forward to your guidance.