Unable to infer schema for Parquet at

bciampa

I have this code in a notebook:

val streamingDataFrame = incomingStream
  .selectExpr("cast (body as string) AS Content")
  .withColumn("Sentiment", toSentiment($"Content"))

import org.apache.spark.sql.streaming.Trigger.ProcessingTime

val result = streamingDataFrame
  .writeStream.format("parquet")
  .option("path", "/mnt/TwitterSentiment")
  .option("checkpointLocation", "/mnt/temp/check")
  .start()

...and it always results in this error. I'm stumped; any advice?

org.apache.spark.sql.AnalysisException: Unable to infer schema for Parquet at . It must be specified manually;
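The error text says the schema "must be specified manually". If the exception is raised when /mnt/TwitterSentiment is read back as a batch DataFrame (the post does not show where it is thrown), a minimal sketch of supplying the schema could look like this. It assumes toSentiment returns a String; the post does not show its return type, so adjust the Sentiment field to match. SentimentOutput and readWithSchema are hypothetical names.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object SentimentOutput {
  // Assumed schema of the sink's output: Content is cast to string in the query;
  // Sentiment's type is a guess (assumes the toSentiment UDF returns a String).
  val outputSchema: StructType = StructType(Seq(
    StructField("Content", StringType, nullable = true),
    StructField("Sentiment", StringType, nullable = true)
  ))

  // Reading with an explicit schema skips Parquet schema inference, so a
  // directory that exists but holds no Parquet files yields an empty DataFrame
  // instead of "Unable to infer schema for Parquet".
  def readWithSchema(spark: SparkSession, path: String): DataFrame =
    spark.read.schema(outputSchema).parquet(path)
}
```

For example, `SentimentOutput.readWithSchema(spark, "/mnt/TwitterSentiment")`. Supplying the schema only sidesteps inference; if the directory holds files that are not valid Parquet, the read can still fail once the data is scanned.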

-werners-

Seems like an invalid Parquet file. My guess is that the incoming data has mixed types for the same column, or a different/invalid structure.
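One way to test that guess is to read each Parquet file in the output directory on its own and compare the results: a file that cannot be read, or whose schema differs from the others, points at the problem. A rough diagnostic sketch, assuming the data files sit directly under the directory (not in partition subfolders); InspectParquetDir and inspect are hypothetical names.

```scala
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.SparkSession
import scala.util.{Failure, Success, Try}

object InspectParquetDir {
  // For each *.parquet file directly under `dir`, returns the file name paired with
  // Right(schema as a simple string) if Spark can read it, or Left(the error).
  def inspect(spark: SparkSession, dir: String): Seq[(String, Either[String, String])] = {
    val root = new Path(dir)
    val fs = root.getFileSystem(spark.sparkContext.hadoopConfiguration)
    val files = fs.listStatus(root).toSeq
      .filter(_.isFile)
      .map(_.getPath)
      .filter(_.getName.endsWith(".parquet")) // skips _SUCCESS, .crc files, _spark_metadata
    files.map { p =>
      // Reading a single file forces Spark to parse that file's Parquet footer.
      val result = Try(spark.read.parquet(p.toString).schema.simpleString) match {
        case Success(schema) => Right(schema)
        case Failure(e)      => Left(e.toString)
      }
      p.getName -> result
    }
  }
}
```

A Left entry, or a schema string that differs from the rest, marks a file worth a closer look.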