Data Engineering

Uploading Data into Databricks from CSV - Puts Amount column as a string

Ethn
New Contributor

I am trying to ingest data into Databricks from a CSV file using the statement below, which brings in my data looking perfect:

Ethn_1-1696445292620.png

The problem is that I have to group this data and sum the highlighted amt field. Because it is a string, the sum returns all kinds of nulls and zeros. When I then try to change it to an integer I get the following; it seems that any value containing a comma is nulled out (the sum should display the amount of qty * UP):

Ethn_4-1696445675049.png

What statements would help me change this column from string to integer without messing up any values?

1 REPLY

Kaniz
Community Manager

Hi @Ethn, the issue you are facing is likely because the field you are trying to convert to an integer contains commas. In many locales, a comma is used as a thousands separator. When you cast a string containing a comma to an integer, the result is null because the comma is not a valid character in an integer.

You can solve this problem by replacing the commas in your data with nothing, effectively removing them. Then, you can convert the resulting string to an integer. In PySpark, you can use the withColumn function to create a new column based on an existing one, with transformations applied. You can use this function together with the expr function to replace the commas and convert the string to an integer.
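The approach above could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the column name `amt` is taken from the question, while `amt_int`, `some_key`, and the two helper functions are hypothetical names introduced only for illustration. It also assumes the commas are purely thousands separators. If the amounts can carry a fractional part, a target type such as `decimal(18,2)` may be a better fit than `int`.

```python
def amount_expr(column: str = "amt", target_type: str = "int") -> str:
    """Build a Spark SQL expression that strips commas, then casts.

    `column` and `target_type` are assumptions; adjust them to your schema.
    """
    return f"CAST(regexp_replace(`{column}`, ',', '') AS {target_type})"


def add_numeric_amount(df, column: str = "amt", new_column: str = "amt_int",
                       target_type: str = "int"):
    """Return `df` with an extra numeric column derived from the string column."""
    # Imported here so that amount_expr can be inspected without a Spark install.
    from pyspark.sql.functions import expr

    return df.withColumn(new_column, expr(amount_expr(column, target_type)))


# Hypothetical usage in a notebook where `df` is the DataFrame read from the CSV:
#   df = add_numeric_amount(df)
#   df.groupBy("some_key").sum("amt_int").show()
```

Writing the result to a new column rather than overwriting `amt` keeps the original strings available, so any rows that still come out null after the cast can be compared against their source values.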