sharikrishna26.medium.com
12-31-2022 05:38 AM
Spark Dataframes Schema
Schema inference is not reliable.
Relying on schema inference causes the following problems:
- The automatically inferred schema is often incorrect
- Inferring the schema is additional work for Spark and takes extra time
- Schema inference conflicts with schema validation
- It might also change the column order
Instead, we can define the schema explicitly. There are two approaches:
- Schema DDL String
- Struct Type Object
For a more detailed description, please refer to this link:
https://sharikrishna26.medium.com/spark-dataframes-schema-6fe1f90a56c
Please like, share, and comment.
Happy New Year 2023!
Labels: Dataframe, Dataframes, Schema
12-31-2022 08:26 AM
Thanks for sharing
01-01-2023 12:09 AM
good post thanks
01-01-2023 07:05 PM
One other difference between the two approaches: in the Schema DDL String approach we use SQL type names such as STRING and INT, whereas in the Struct Type Object approach we can only use Spark data type objects such as StringType() and IntegerType().