Issue with round off value while loading to delta table

DMehmuda
New Contributor

I have a float datatype column in a Delta table, and the data to be loaded should be rounded to 2 decimal places. I'm casting the column to DECIMAL(18,10) and then using the round function from pyspark.sql.functions to round the values to 2 decimal places. When I display the DataFrame before loading it into the Delta table, I get the desired 2-decimal-place values, but after loading into the table, the values in that column have up to 15 decimal places. Is this expected behavior for a Delta table, or does some cluster configuration need to be changed?

Retired_mod
Esteemed Contributor III

Hi @DMehmuda, the issue arises because the target column is a floating-point type, and binary floating point cannot represent most two-decimal values exactly, so the stored values can show more decimal places than expected. Rounding the DataFrame column is not enough if the table column itself is still a float. To ensure values are stored with the correct precision, explicitly cast the column to `DECIMAL(18,2)` before writing to the Delta table and define the table column with the same type. For example, use `df.withColumn("your_column", round(col("your_column").cast("decimal(18,2)"), 2))` (with `round` and `col` imported from `pyspark.sql.functions`) before saving, and specify `DecimalType(18, 2)` for that column in your Delta table schema. This enforces the scale both when the data is written and when it is stored.
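To illustrate why a float column shows extra digits, here is a minimal sketch in plain Python (standard library only, no Spark). It assumes the table column is a 32-bit float and mimics that storage with `struct`. The sample value `123.456789` and the helper name `as_float32` are made up for the illustration, and this is not what Delta does internally, only the same arithmetic.

```python
import struct
from decimal import Decimal, ROUND_HALF_UP


def as_float32(x: float) -> float:
    """Round-trip a Python float through 4-byte IEEE 754 storage (hypothetical helper)."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Hypothetical sample value, rounded to 2 places as a Python (64-bit) float.
rounded = round(123.456789, 2)

# What survives if that value is stored in a 32-bit float: the nearest
# representable binary value, which is close to 123.46 but not equal to it.
stored_float = as_float32(rounded)

# A fixed-scale decimal keeps exactly two fractional digits, which is the
# behavior a DECIMAL(18,2) column is meant to give.
stored_decimal = Decimal("123.456789").quantize(
    Decimal("0.01"), rounding=ROUND_HALF_UP
)

print(repr(stored_float))  # prints a value with many trailing digits
print(stored_decimal)      # prints 123.46
```

The float result differs from 123.46 only by a tiny amount, but that difference is what appears as the long tail of decimals when the table is read back. Storing the column as a decimal type avoids it.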

Is there anything else you’d like to know or any other issues you’re facing with your setup?