Spark CSV file read option to read blank/empty values from a file as empty values instead of null

RakeshRakesh_De
New Contributor III

Hi,

I am trying to read a file that has some blank values in a column. Spark converts blank values to null while reading, so how can I read blank/empty values as empty values instead? Tried on DBR 13.2 and 14.3.

I have tried every option I could think of, but none of them work:

 

display(spark.read.option("emptyValue", "").csv('/FileStore/tables/test2.csv',header=True,inferSchema=True))
display(spark.read.option("emptyValue","None").csv('/FileStore/tables/test2.csv',header=True,inferSchema=True))
spark.read.option("nullValue", "None").csv('/FileStore/tables/test2.csv',header=True,inferSchema=False)
 
Sample input CSV (attached as an image: RakeshRakesh_De_0-1713431921922.png).

 

7 REPLIES

-werners-
Esteemed Contributor III

May I ask why you do not want null?  It is THE way to indicate a value is missing (and gives you filtering possibilities using isNull/isNotNull).
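
For example, a minimal sketch of that filtering (the column name "comment" is made up here, since the sample file is only available as an image):

from pyspark.sql import functions as F

df = spark.read.csv('/FileStore/tables/test2.csv', header=True)
display(df.filter(F.col("comment").isNull()))     # rows where the value is missing
display(df.filter(F.col("comment").isNotNull()))  # rows where a value is present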

RakeshRakesh_De
New Contributor III

Hi @-werners-, the user wants the data in the landing table exactly as it is in the file, and some of the values are literally "None" as well. The next layer may then apply CASE WHEN logic that treats blank values and null values differently, roughly as sketched below.
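
A hypothetical sketch of that downstream logic (assuming a DataFrame df read as above, with a made-up string column named "comment"):

from pyspark.sql import functions as F

df = df.withColumn(
    "comment_type",
    F.when(F.col("comment").isNull(), "missing")   # value absent in the file
     .when(F.col("comment") == "", "blank")        # value present but empty
     .otherwise("populated")
)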

-werners-
Esteemed Contributor III

.option("nullValue", "") should do the trick.

Riyakh
New Contributor II

.option("nullValue", "")
Empty strings are interpreted as null values by default. If you set nullValue to anything but "", such as "null" or "none", empty strings will be read as empty strings and no longer as null values.

Please check: dataframe - Read spark csv with empty values without converting to null - Stack Overflow
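
A sketch of that suggestion, reusing the path from the question (whether it actually behaves this way on Spark 3 is exactly what the next replies test):

df = (spark.read
      .option("nullValue", "none")   # any non-empty marker, so "" is no longer mapped to null
      .csv('/FileStore/tables/test2.csv', header=True, inferSchema=True))
display(df)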

 

RakeshRakesh_De
New Contributor III

Please don't just quote something from Stack Overflow; those answers were for old Spark versions and have already been tried. Have you verified yourself whether this actually works in Spark 3?

-werners-
Esteemed Contributor III

AFAIK nullValue, "" should do the trick, but I tested it myself on your example and indeed it does not work.
Going to do some checking...

-werners-
Esteemed Contributor III

OK, after some tests:
The trick is to surround the text values in your CSV with quotes. That way Spark can actually tell the difference between a missing value and an empty value. Missing values are null and can only be converted to something else afterwards (for example with coalesce).
When a column contains '', nullValue = "''" will create an empty value and not null.
The same goes for emptyValue if you want.
Not sure if that is workable for you, though.
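
For example, one reading of that approach as a rough sketch (the file contents, path, and column names are illustrative rather than taken from the original post, and the exact option values may need adjusting):

# test3.csv -- text values are quoted; empty values appear as "", missing values are simply absent:
#   id,comment
#   1,""
#   2,
df = (spark.read
      .option("nullValue", "''")   # a non-empty marker, so the quoted "" is no longer mapped to null
      .csv('/FileStore/tables/test3.csv', header=True))
display(df)   # expected per the reply above: row 1 -> empty string, row 2 -> null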
