Databricks Community

The_Demigorgan · 11-21-2023

I'm trying to ingest data from Parquet files using Auto Loader. I have my own custom schema, and I don't want the schema to be inferred from the Parquet files.

During readStream everything is fine. But during writeStream, it somehow infers the schema from the files, and I get a schema mismatch error.

Any idea why this is happening? Any help would be appreciated.

cgrant · 3 weeks ago

In this case, please make sure you specify the schema explicitly when reading the Parquet files, and do not set any schema inference options.

Something like:

spark.readStream.format("cloudFiles").schema(schema)...
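To make the snippet above a bit more concrete, here is a minimal sketch of an Auto Loader read with an explicit schema and no inference options. The column names, the `CUSTOM_SCHEMA` string, the `read_parquet_stream` helper, and the path are all hypothetical placeholders, and `spark` is assumed to be an existing SparkSession (as in a Databricks notebook).

```python
# Hypothetical schema as a DDL string; replace the columns with your own.
# DataStreamReader.schema() accepts either a StructType or a DDL string.
CUSTOM_SCHEMA = "id BIGINT, name STRING, event_ts TIMESTAMP"


def read_parquet_stream(spark, source_path):
    """Return a streaming DataFrame that reads Parquet via Auto Loader
    using the explicit schema above (no schema inference options set)."""
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "parquet")  # tell Auto Loader the file format
        .schema(CUSTOM_SCHEMA)                   # explicit schema instead of inference
        .load(source_path)
    )
```

In a notebook this might be called as `df = read_parquet_stream(spark, "/your/path/here")`, after which `df.printSchema()` can be used to check that the stream carries the schema you expect before writing it out.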

If you want to more easily grab the schema, you can read with the batch reader and capture the schema:

schema = spark.read.parquet("/your/path/here").schema
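The two steps can also be chained, so that the schema captured by the batch reader is handed straight to the streaming reader. This is only a sketch: the helper name and both path arguments are hypothetical, and `spark` is assumed to be an existing SparkSession.

```python
def read_stream_with_sampled_schema(spark, sample_path, source_path):
    """Capture a schema from existing Parquet files with the batch reader,
    then pass it explicitly to an Auto Loader stream."""
    # Batch read: the schema is taken from the sample files once, up front.
    schema = spark.read.parquet(sample_path).schema

    # Streaming read: the captured schema is supplied explicitly.
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "parquet")
        .schema(schema)
        .load(source_path)
    )
```

Note that the captured schema reflects whatever files sit under the sample path at the time of the batch read, so it is worth inspecting it before relying on it for the stream.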

Autoloader issue