Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Spark CSV read option to read blank/empty values from a file as empty values instead of null

RakeshRakesh_De
New Contributor III

Hi,

I am trying to read a file that has some blank values in a column. Spark converts blank values to null while reading. How can I read blank/empty values as empty values instead? I tried DBR 13.2 and 14.3.

I have tried every option I could think of, but none of them works:

 

display(spark.read.option("emptyValue", "").csv('/FileStore/tables/test2.csv',header=True,inferSchema=True))
display(spark.read.option("emptyValue","None").csv('/FileStore/tables/test2.csv',header=True,inferSchema=True))
spark.read.option("nullValue", "None").csv('/FileStore/tables/test2.csv',header=True,inferSchema=False)
 
Sample input CSV file (attached as a screenshot: RakeshRakesh_De_0-1713431921922.png)

 

7 REPLIES

-werners-
Esteemed Contributor III

May I ask why you do not want null?  It is THE way to indicate a value is missing (and gives you filtering possibilities using isNull/isNotNull).

RakeshRakesh_De
New Contributor III

Hi @-werners-, the user wants the data in the landing table exactly as it is in the file. They also have some values like None in the data, and the next layer may have CASE WHEN statements that treat blank values and null values differently.

-werners-
Esteemed Contributor III

.option("nullValue", "") should do the trick.

Riyakh
New Contributor II

.option("nullValue", "")
Empty strings are interpreted as null values by default. If you set nullValue to anything but "", like "null" or "none", empty strings will be read as empty strings and no longer as null values.

Please check-
dataframe - Read spark csv with empty values without converting to null - Stack Overflow

 

RakeshRakesh_De
New Contributor III

Please don't quote Stack Overflow, because those answers were tested on older Spark versions. Have you tried this yourself to verify whether it really works in Spark 3?

-werners-
Esteemed Contributor III

AFAIK .option("nullValue", "") should do the trick. But I tested it myself on your example and indeed it does not work.
I'm going to do some checking...

-werners-
Esteemed Contributor III

OK, after some tests:
The trick is to surround the text in your CSV with quotes. That way Spark can actually tell the difference between a missing value and an empty value. Missing values are null and can only be converted to something else explicitly (for example, by using coalesce).
When a column contains '', nullValue = "''" will create an empty value and not null.
The same for emptyValue if you want.
Not sure if it is workable for you though.
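
A minimal sketch of the quoting idea above, in case it helps: it rewrites CSV text so that every field is wrapped in double quotes, which turns a blank field into an explicit "". It uses only Python's standard csv module. The helper name quote_all_fields and the sample data are hypothetical and not taken from the original file.

```python
import csv
import io


def quote_all_fields(csv_text: str) -> str:
    """Return csv_text with every field wrapped in double quotes.

    Hypothetical helper for illustration: a blank field becomes "".
    """
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    out = io.StringIO(newline="")
    # QUOTE_ALL quotes every field, including empty ones.
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


# Assumed sample data: the second data row has a blank middle column.
sample = "id,name,city\n1,Alice,Paris\n2,,Berlin\n"
print(quote_all_fields(sample))
```

The rewritten file can then be read with the same spark.read ... csv call as in the question. Whether the quoted fields actually come back as empty strings rather than null is what the tests above report, so it is worth verifying on your own DBR version.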
