Databricks Community

SIRIGIRI · ‎12-31-2022

Spark Dataframes Schema

Schema inference is not reliable.

Schema inference has the following problems:

1. The automatically inferred schema is often incorrect.
2. Inferring the schema is additional work for Spark, so it takes extra time.
3. Schema inference conflicts with schema validation.
4. It might also change the column order.

Instead of relying on inference, there are two approaches to defining the schema explicitly:

1. Schema DDL string
2. StructType object

For a more detailed description, please refer to this link:

https://sharikrishna26.medium.com/spark-dataframes-schema-6fe1f90a56c

Please like, share, and comment.

Happy New Year 2023!

Rishabh-Pandey · ‎12-31-2022

Thanks for sharing

Rishabh Pandey

Aviral-Bhardwaj · ‎01-01-2023

good post thanks

AviralBhardwaj

Varshith · ‎01-01-2023

One other difference between those two approaches is that in the Schema DDL string approach we use SQL type names such as STRING and INT, whereas in the StructType object approach we can only use Spark data type classes such as StringType() and IntegerType().
