Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Autoloader: Read old version of file. Read modification time is X, latest modification time is X

stevenayers-bge
Contributor

I'm receiving this error from Auto Loader. It seems to be stuck on this one file. I don't care when the file was read or last modified; I just want to ingest it. Any ideas?

java.io.IOException: Read old version of file s3a://<file-path>.json. Read modification time is 1713910814000, latest modification time is 1713925112000

at com.databricks.sql.io.StalenessChecker$Impl.check(StalenessChecker.java:223)

at com.databricks.photon.NativeIOBroker.lambda$new$0(NativeIOBroker.java:374)

at org.apache.spark.TaskContextImpl.$anonfun$invokeTaskCompletionListeners$1(TaskContextImpl.scala:173)

at org.apache.spark.TaskContextImpl.$anonfun$invokeTaskCompletionListeners$1$adapted(TaskContextImpl.scala:173)

at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:228)
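For reference, the two values in the message are epoch-millisecond timestamps; decoding them (stdlib only) shows the object was rewritten roughly four hours after Auto Loader first saw it:

```python
from datetime import datetime, timezone

def ms_to_utc(ms: int) -> str:
    """Render an epoch-milliseconds timestamp (as in the error) as a UTC string."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).strftime(
        "%Y-%m-%d %H:%M:%S UTC"
    )

print(ms_to_utc(1713910814000))  # read modification time
print(ms_to_utc(1713925112000))  # latest modification time
```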

1 REPLY 1

Kaniz_Fatma
Community Manager

Hi @stevenayers-bge, the error message indicates that the file was modified after Auto Loader planned the read: the modification time recorded when the file was discovered is older than the file's current modification time, so the staleness check rejects the read. This usually means the object was overwritten in place in S3.
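In other words, the check fails whenever the object's current modification time is newer than the one recorded at read planning. Schematically, it behaves like this (a simplified Python illustration, not Databricks' actual `StalenessChecker` code):

```python
def check_staleness(read_mtime_ms: int, latest_mtime_ms: int) -> None:
    """Raise if the object changed after the read was planned.

    Mirrors the IOException in the stack trace above; the real check
    lives in com.databricks.sql.io.StalenessChecker on the JVM side.
    """
    if read_mtime_ms < latest_mtime_ms:
        raise IOError(
            "Read old version of file. "
            f"Read modification time is {read_mtime_ms}, "
            f"latest modification time is {latest_mtime_ms}"
        )

# Same version on both sides: the check passes silently.
check_staleness(1713910814000, 1713910814000)
```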

Let’s explore some potential solutions based on similar issues reported by other developers:

  1. Check Dependencies:

    • Ensure that you have the correct dependencies for reading files from S3. The s3a:// client is recommended over the older s3n:// client. Make sure you’re using the right configuration.
    • If you’re using Spark, add the necessary dependencies to your Spark job. For example, if you’re submitting your job using spark-submit, include the required S3-related dependencies.
    • If you’re using JupyterHub, make sure the dependencies are distributed to the entire cluster. You can specify dependencies in the kernel.json file located in /usr/local/share/jupyter/kernels/pyspark/kernel.json.
  2. Update Hadoop Version:

    • Older versions of the hadoop-aws S3A connector have known issues with overwritten objects and consistency. If you manage your own Hadoop dependencies, upgrading to a recent hadoop-aws release can help.

  3. Check File Versions:

    • Verify that the file you’re trying to read is indeed the latest version. Sometimes, older versions can cause issues.
    • If possible, check the file directly in the S3 bucket to ensure it matches the expected version.
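If the file is legitimately being overwritten in place, one setting worth trying is Auto Loader's `cloudFiles.allowOverwrites` option, which tells Auto Loader to reprocess a file whose modification time changes after discovery rather than failing. Below is a configuration sketch (it assumes a Databricks runtime where `spark` is defined, and the bucket/path placeholder is illustrative); note this option can produce duplicate records downstream:

```python
# Configuration sketch for a Databricks notebook (assumes `spark` exists).
stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    # Reprocess files that are overwritten after discovery instead of
    # failing with "Read old version of file". May emit duplicates.
    .option("cloudFiles.allowOverwrites", "true")
    .load("s3a://<bucket>/<path>/")
)
```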

Hopefully, one of these approaches will help you resolve the issue! 😊

If you need further assistance or have additional details, feel free to share them, and I’ll be happy to assist! 🚀

 
