02-07-2023 06:29 AM
I want to define a column with null values in my DataFrame using PySpark. This column will later be used for other calculations.
What is the difference between creating it in these two ways?
- `df.withColumn("New_Column", lit(None))`
- `df.withColumn("New_Column", lit(None).cast('string'))`
Can both be used? Is one of them wrong?
Thank you so much!
- Labels: Cast, New Column, Null Values, String
Accepted Solutions
02-07-2023 08:06 AM
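Hello!
An elegant way to define an empty (all-null) column in a DataFrame is:
`df.withColumn("New_Column", lit(None).cast(StringType()))`
If you are only working with DataFrames (and no file formats are involved), you can also work with `NullType()`, which is the type `lit(None)` gets without a cast. Once you write to a file format such as Parquet, a typed column is the safer choice, because those formats do not support a `NullType` column.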
04-08-2023 12:22 AM
Hi @Sara Corral
Thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs.
Please help us select the best solution by clicking on "Select As Best" if it does.
Your feedback will help us ensure that we are providing the best possible service to you. Thank you!
2 weeks ago
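For me,
`df.withColumn("New_Column", lit(None).cast(StringType()))`
didn't work.
I used this instead (Scala syntax, where `null` and `StringType` take the place of Python's `None` and `StringType()`):
`df.withColumn("New_Column", lit(null).cast(StringType))`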