02-07-2023 06:29 AM
I want to define a column with null values in my dataframe using pyspark. This column will later be used for other calculations.
What is the difference between creating it in these two different ways?
Can both be used? Is there a wrong one?
Thank you so much!
02-07-2023 08:06 AM
Hello!
An elegant way to define an empty (all-null) column in a dataframe is:
df.withColumn("New_Column", lit(None).cast(StringType()))
If you are only working with dataframes (and no file formats are involved), you can also leave the column as NullType() instead of casting it.
01-31-2025 04:33 AM
For me, this did not work:
df.withColumn("New_Column", lit(None).cast(StringType()))
I used this instead:
df.withColumn("New_Column", lit(null).cast(StringType))