SIRIGIRI
Databricks Partner

Spark Dataframes Schema

Schema inference is not reliable.

Schema inference has the following problems:

  1. The automatically inferred schema is often incorrect.
  2. Inferring the schema is additional work for Spark and takes extra time.
  3. Schema inference conflicts with schema validation.
  4. It might also change the column order.

Instead, we can define the schema explicitly. There are two approaches to do it:

  1. Schema DDL String
  2. Struct Type Object

For a more detailed description, please refer to this link:

https://sharikrishna26.medium.com/spark-dataframes-schema-6fe1f90a56c

Please like, share, and comment.

Happy New Year 2023!

Rishabh-Pandey
Databricks MVP

Thanks for sharing

Rishabh Pandey

Aviral-Bhardwaj
Esteemed Contributor III

good post thanks

AviralBhardwaj

Varshith
New Contributor III

One other difference between those two approaches is that in the Schema DDL String approach we use SQL type names such as STRING and INT, whereas in the Struct Type Object approach we can only use Spark data type objects such as StringType(), IntegerType(), etc.