df.printSchema()
root
|-- Device_ID: string (nullable = true)
|-- Location: string (nullable = true)
|-- Latitude: double (nullable = true)
|-- Longitude: double (nullable = true)
|-- DateTime: timestamp (nullable = true)
|-- Rainfall_Value: double (nullable = true)
|-- year: integer (nullable = true)
|-- month: integer (nullable = true)
|-- day: integer (nullable = true)
|-- hour: integer (nullable = true)
|-- minute: integer (nullable = true)
df.write.partitionBy("year","month").mode("overwrite").parquet("/home/rainfall/parquet/rainfall.parquet")
org.apache.spark.SparkException: Job aborted due to stage failure: Task 29 in stage 675.0 failed 1 times, most recent failure: Lost task 29.0 in stage 675.0 (TID 5311) (ip-10-175-235-230.ap-southeast-1.compute.internal executor driver): com.databricks.sql.io.FileReadException: Error while reading file dbfs:REDACTED_LOCAL_PART@xyz**.com.sg/weather123-lakehouse/delta/2022-10-13.parquet/part-00003-tid-7527434428502281281-b966b165-5e61-4ba0-a6ca-cea51e5acdf2-3762-1-c000.snappy.parquet. Parquet column cannot be converted. Column: [Rainfall_Value], Expected: DoubleType, Found: INT64
Since the schema above already shows Rainfall_Value as DoubleType, why does the write complain that it found INT64? I am not sure how to debug this.
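One approach I was considering, to at least locate the offending file: read the schema of each source part file individually and diff them, since Spark infers the read schema from only some files and a mismatch surfaces only when the task actually reads the bad file. A minimal sketch of the comparison step, with illustrative per-file schemas standing in for the real footers (in practice I would collect each one with spark.read.parquet(single_file).schema):

```python
def find_schema_conflicts(file_schemas):
    """Given {file_path: {column: physical_type}}, return the columns
    whose physical type differs across files, with the types seen."""
    seen = {}
    for schema in file_schemas.values():
        for col, typ in schema.items():
            seen.setdefault(col, set()).add(typ)
    return {col: types for col, types in seen.items() if len(types) > 1}

# Illustrative stand-in data, not my real footers: one part file
# apparently wrote Rainfall_Value as INT64 while the others use DOUBLE.
schemas = {
    "part-00001.snappy.parquet": {"Rainfall_Value": "DOUBLE"},
    "part-00003.snappy.parquet": {"Rainfall_Value": "INT64"},
}
print(find_schema_conflicts(schemas))
```

Is something along these lines the right way to track it down, or is there a built-in option I am missing?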
Thanks in advance.