Spark Handling White Space as NULL

Chrispy — Mon, 08 Apr 2024 13:41:57 GMT

I have a very strange thing happening. I'm importing a csv file and nulls and blanks are being interpreted correctly. What is strange is that a column that regularly has a single space character value is having the single space converted to null.

I'm using this to import the file data:

df = spark.read.format("csv").options(mode='FAILFAST', multiLine=True, escape='"').csv(path=source_path, header=True, inferSchema=False).select("*", "_metadata.file_name").withColumns({"date1": lit(currentdt), "GUID": lit(guid)})

I've tried including option nullValue=" " but then all blanks are being imported as blank.

I'm new to this and sure I'm doing something wrong.

topic Spark Handling White Space as NULL in Get Started Discussions

Spark Handling White Space as NULL