Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Autoloader: Read old version of file. Read modification time is X, latest modification time is X

stevenayers-bge
Contributor

I'm receiving this error from Auto Loader. It seems to be stuck on this one file. I don't care when the file was read or last modified; I just want to ingest it. Any ideas?

java.io.IOException: Read old version of file s3a://<file-path>.json. Read modification time is 1713910814000, latest modification time is 1713925112000

at com.databricks.sql.io.StalenessChecker$Impl.check(StalenessChecker.java:223)

at com.databricks.photon.NativeIOBroker.lambda$new$0(NativeIOBroker.java:374)

at org.apache.spark.TaskContextImpl.$anonfun$invokeTaskCompletionListeners$1(TaskContextImpl.scala:173)

at org.apache.spark.TaskContextImpl.$anonfun$invokeTaskCompletionListeners$1$adapted(TaskContextImpl.scala:173)

at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:228)
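For reference, the two values in the message are epoch-millisecond timestamps; decoding them (stdlib only) shows the object was rewritten roughly four hours after Auto Loader first saw it:

```python
from datetime import datetime, timezone

def ms_to_utc(ms: int) -> str:
    """Render an epoch-milliseconds timestamp (as in the error) as a UTC string."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).strftime(
        "%Y-%m-%d %H:%M:%S UTC"
    )

print(ms_to_utc(1713910814000))  # read modification time
print(ms_to_utc(1713925112000))  # latest modification time
```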

1 REPLY 1

Kaniz_Fatma
Community Manager

Hi @stevenayers-bge, the error message indicates that the file was modified after Auto Loader planned the read: the modification time recorded when the file was discovered is older than the file's current modification time, so the staleness check rejects the read. This usually means the object was overwritten in place in S3.
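In other words, the check fails whenever the object's current modification time is newer than the one recorded at read planning. Schematically, it behaves like this (a simplified Python illustration, not Databricks' actual `StalenessChecker` code):

```python
def check_staleness(read_mtime_ms: int, latest_mtime_ms: int) -> None:
    """Raise if the object changed after the read was planned.

    Mirrors the IOException in the stack trace above; the real check
    lives in com.databricks.sql.io.StalenessChecker on the JVM side.
    """
    if read_mtime_ms < latest_mtime_ms:
        raise IOError(
            "Read old version of file. "
            f"Read modification time is {read_mtime_ms}, "
            f"latest modification time is {latest_mtime_ms}"
        )

# Same version on both sides: the check passes silently.
check_staleness(1713910814000, 1713910814000)
```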

Let’s explore some potential solutions based on similar issues reported by other developers:

  1. Check Dependencies:

    • Ensure that you have the correct dependencies for reading files from S3. The s3a:// client is recommended over the older s3n:// client. Make sure you’re using the right configuration.
    • If you’re using Spark, add the necessary dependencies to your Spark job. For example, if you’re submitting your job using spark-submit, include the required S3-related dependencies.
    • If you’re using JupyterHub, make sure the dependencies are distributed to the entire cluster. You can specify dependencies in the kernel.json file located in /usr/local/share/jupyter/kernels/pyspark/kernel.json.
  2. Update Hadoop Version:

    • Older versions of the hadoop-aws S3A connector have known issues with overwritten objects and consistency. If you manage your own Hadoop dependencies, upgrading to a recent hadoop-aws release can help.

  3. Check File Versions:

    • Verify that the file you’re trying to read is indeed the latest version. Sometimes, older versions can cause issues.
    • If possible, check the file directly in the S3 bucket to ensure it matches the expected version.
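If the file is legitimately being overwritten in place, one setting worth trying is Auto Loader's `cloudFiles.allowOverwrites` option, which tells Auto Loader to reprocess a file whose modification time changes after discovery rather than failing. Below is a configuration sketch (it assumes a Databricks runtime where `spark` is defined, and the bucket/path placeholder is illustrative); note this option can produce duplicate records downstream:

```python
# Configuration sketch for a Databricks notebook (assumes `spark` exists).
stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    # Reprocess files that are overwritten after discovery instead of
    # failing with "Read old version of file". May emit duplicates.
    .option("cloudFiles.allowOverwrites", "true")
    .load("s3a://<bucket>/<path>/")
)
```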

Hopefully, one of these approaches will help you resolve the issue! 😊

If you need further assistance or have additional details, feel free to share them, and I’ll be happy to assist! 🚀

 
