Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Spark CSV read option to read blank/empty values from a file as empty strings instead of null

RakeshRakesh_De
New Contributor III

Hi,

I am trying to read a file that has some blank values in a column. As we know, Spark converts blank values to null when reading. How can I read blank/empty values as empty strings instead? I tried DBR 13.2 and 14.3.

I have tried every option I could think of, but nothing works.

 

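# Attempt 1: emptyValue set to an empty string
display(spark.read.option("emptyValue", "").csv('/FileStore/tables/test2.csv', header=True, inferSchema=True))
# Attempt 2: emptyValue set to the string "None"
display(spark.read.option("emptyValue", "None").csv('/FileStore/tables/test2.csv', header=True, inferSchema=True))
# Attempt 3: nullValue set to "None", without schema inference
spark.read.option("nullValue", "None").csv('/FileStore/tables/test2.csv', header=True, inferSchema=False)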
 
Sample input CSV file:
[screenshot of the sample input CSV]

 


-werners-
Esteemed Contributor III

May I ask why you do not want null? It is THE way to indicate that a value is missing, and it lets you filter with isNull/isNotNull.

RakeshRakesh_De
New Contributor III

Hi @-werners-, the users want the data in the landing table exactly as it is in the file. Some values are the literal string None as well, and in the next layer they may write CASE WHEN statements that treat blank values and null values differently.

-werners-
Esteemed Contributor III

.option("nullValue", "") should do the trick.

Riyakh
New Contributor II

.option("nullValue", "")
Empty strings are interpreted as null values by default. If you set nullValue to anything other than "", such as "null" or "none", empty strings will be read as empty strings and no longer as null values.

Please check:
dataframe - Read spark csv with empty values without converting to null - Stack Overflow

 

RakeshRakesh_De
New Contributor III

Don't just quote Stack Overflow; those answers were tested on older Spark versions. Have you tried this yourself to verify whether it really works in Spark 3?

-werners-
Esteemed Contributor III

AFAIK .option("nullValue", "") should do the trick, but I tested it myself on your example and indeed it does not work.
Going to do some more checking...

-werners-
Esteemed Contributor III

OK, after some tests:
The trick is to surround the text in your CSV with quotes. That way Spark can actually distinguish between a missing value and an empty value. Missing values are null and can only be converted to something else explicitly (for example with coalesce).
When a column contains a quoted empty value (""), setting nullValue to "''" will produce an empty value and not null.
The same works with emptyValue if you prefer.
Not sure if that is workable for you, though.
