Parquet file for delta streaming live table with pipeline
Tuesday
I am having an issue with Parquet files: I'm getting an Illegal Parquet type: INT64 (TIMESTAMP(NANOS,false)) error while trying to read a Parquet file (generated outside of Databricks).

I am using a Delta streaming live table with a pipeline. If I remove the one offending file, the pipeline works fine, and Pandas in Python can open the file without issue. Any ideas on how to address this in a pipeline based on a notebook with a single "create streaming live table ..." cell? I've seen that I can add spark.conf.set("spark.sql.legacy.parquet.nanosAsLong", "true") when using Spark directly, but I'm not sure how to apply that setting through the abstraction of the Delta streaming live table.
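For reference, Delta Live Tables pipelines accept Spark configurations in the pipeline settings, so a flag like this can be applied at the pipeline level rather than via spark.conf.set in the notebook. A minimal sketch of the relevant piece of the pipeline settings JSON (the "configuration" block is standard DLT pipeline settings; whether this particular legacy flag resolves the NANOS timestamp error in a given Spark/DLT runtime version is an assumption):

```json
{
  "configuration": {
    "spark.sql.legacy.parquet.nanosAsLong": "true"
  }
}
```

The same key/value pair can also be entered under the pipeline's Configuration section in the UI. Note that with this flag enabled, Spark reads nanosecond-precision timestamp columns as plain long values rather than timestamps, so downstream casts may be needed.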
Any help appreciated

