Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Cannot cast Decimal to Double

Splush_
New Contributor II

Hey,

I'm trying to save the contents of a database table to a Databricks Delta table. The schema read straight from the database returns the number fields as decimal(38, 10), and at least one of the values is too large for this data type. So I try to convert it using:

from pyspark.sql.functions import col
from pyspark.sql.types import DecimalType

for field in df.schema.fields:
    if isinstance(field.dataType, DecimalType):
        df = df.withColumn(field.name, col(field.name).cast('double'))

When I print the schema, every decimal is indeed cast to a double. Yet I always seem to get this error:

[NUMERIC_VALUE_OUT_OF_RANGE.WITHOUT_SUGGESTION] The 339600000000000000000000000000000000000.0000000000 rounded half up from 339600000000000000000000000000000000000 cannot be represented as Decimal(38, 10). SQLSTATE: 22003

which does not make sense at all - is there anything I'm missing?

Best regards

Samuel

5 REPLIES

VZLA
Databricks Employee

Hello, thank you for your question.

The issue you’re facing stems from a value exceeding the range allowed by Decimal(38,10) before it can be successfully cast to Double. This happens because the value is already invalid for the Decimal type, causing an overflow during Spark's processing.

Suggested Solution

  1. Convert to String First: By converting the Decimal column to String before casting to Double, you can avoid the overflow issue. Once converted to a string, Spark can then interpret the value as a Double.

  2. Use Conditional Replacement: Safeguard against uncastable values by using a condition to replace problematic values with null or a default fallback. This avoids runtime errors while maintaining data integrity.

  3. Filter Out Extreme Values: Before performing the cast, filter rows with values that exceed the allowable range for Decimal(38,10). This ensures only valid values are processed.

  4. Inspect the Data: Examine the range of values in the source table to confirm the scope of the issue (a quick inspection sketch follows the examples below). This can guide decisions on whether filtering or another approach is most appropriate.

Hope it helps!

Example:

 

from pyspark.sql.functions import col, when
from pyspark.sql.types import DecimalType

# Iterate through schema fields and process DecimalType columns
for field in df.schema.fields:
    if isinstance(field.dataType, DecimalType):
        # Filter out extreme values
        df = df.filter(col(field.name) <= 10**38)
        # Handle invalid values by first casting to String and then to Double
        df = df.withColumn(
            field.name,
            when(
                col(field.name).cast("string").cast("double").isNotNull(),
                col(field.name).cast("string").cast("double")
            ).otherwise(None)
        )

We can still combine the filtering and transformation into a single pass to reduce computation:

from pyspark.sql.functions import col, when
from pyspark.sql.types import DecimalType

# Process DecimalType columns in one step
for field in df.schema.fields:
    if isinstance(field.dataType, DecimalType):
        df = df.withColumn(
            field.name,
            when(
                (col(field.name) <= 10**38) & col(field.name).cast("string").cast("double").isNotNull(),
                col(field.name).cast("string").cast("double")
            ).otherwise(None)  # Replace invalid or out-of-range values with null
        )
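
For step 4 above, a minimal inspection sketch might look like the one below. It assumes the DataFrame can at least be scanned, routes values through a string-to-double cast so the out-of-range decimals themselves do not trigger the overflow, and uses 1e28 as the threshold because a Decimal(38,10) leaves room for 28 integer digits:

from pyspark.sql.functions import col, count, when, min as spark_min, max as spark_max
from pyspark.sql.types import DecimalType

# Report the value range of each DecimalType column and count rows that
# exceed what Decimal(38,10) can hold (28 integer digits, i.e. |x| >= 1e28)
for field in df.schema.fields:
    if isinstance(field.dataType, DecimalType):
        as_double = col(field.name).cast("string").cast("double")
        df.select(
            spark_min(as_double).alias(f"{field.name}_min"),
            spark_max(as_double).alias(f"{field.name}_max"),
            count(when((as_double >= 1e28) | (as_double <= -1e28), True)).alias(f"{field.name}_overflow_rows"),
        ).show(truncate=False)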

 

Splush_
New Contributor II

Hey VZLA,

thank you so much for your answer and help!

Sadly, this code did not resolve my issue. First, I had to change your code to wrap the 10**38 in a Decimal to get it to run.

Executing this did indeed change the schema. But after running

df.where(col('PART_NR') == '6016-HP SIEGEL      ').show()

I get the exact same error after around 6 minutes of this thing running 😞

filipniziol

Hi @Splush_ ,

What is the source of the data? How did you end up with a DataFrame whose decimal(38,10) field exceeds that range?
Is there any chance you could keep this field as a string?

The code below starts with string values and successfully converts them to double:

from pyspark.sql.types import StructType, StructField, StringType
from pyspark.sql.functions import col, when, isnan

# Define schema with String type for large values
schema = StructType([
    StructField("id", StringType(), True),
    StructField("value_str", StringType(), True)  # Store as string to avoid immediate precision errors
])

# Test data with values that exceed Decimal(38,10)
data = [
    ("valid_1", "123.4567890123"),  # Normal
    ("valid_2", "9876543210.1234567890"),  # Large but valid
    ("invalid_1", "9" * 38 + ".9999999999"),  # 99999999999999999999999999999999999999.9999999999 (Exceeds Decimal(38,10))
    ("invalid_2", "1E40"),  # Exceeds max precision
    ("invalid_3", "-1E40"),  # Negative overflow
    ("null_case", None)  # Null case
]

# Create DataFrame
df = spark.createDataFrame(data, schema=schema)

# Show initial data
display(df)

df_fixed = df.withColumn(
    "decimal_value",
    when(
        ~isnan(col("value_str").cast("double")),  # Ensure numeric format
        col("value_str").cast("double")
    ).otherwise(None)  # Replace invalid values with null
)

# Show processed data
display(df_fixed)

The result:

[Screenshot: the resulting df_fixed DataFrame with decimal_value shown as double]

 




Splush_
New Contributor II

Hey @filipniziol,

Thank you so much for your response and the code snippet you provided! You are right - it should not be possible for this number to be inside this DataFrame! The problem seems to be the Spark JDBC connector. The data source is an Oracle database. The column in question is of type FLOAT with a scale of 127 and a precision of 126.

However, Spark translates this column directly to a Decimal(38, 10) type, automatically inferring it from the source. Once it tries to cast the value, it apparently fails because the number is obviously too large for the type it is supposed to fit into.

I have found out that this FLOAT type is stored in the database as type "NUMBER" (Data Types). So it is apparently default behavior for Spark to translate this into a Decimal(38,10) (Solved: Re: Spark error - Decimal precision exceeds max pr... - Cloudera Community - 392840).

Sadly this info doesn't really help to solve my problem - I'm just unable to change it to anything else. Maybe there is an option t...
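
One documented knob that might address this (not tested in this thread) is the Spark JDBC reader's customSchema option, which overrides the inferred type for individual columns at read time. A minimal sketch, with placeholder connection details and a placeholder column name (WEIGHT_G):

# Sketch only: the URL, table, credentials and the WEIGHT_G column name are
# placeholders, not values from this thread.
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//dbhost:1521/service")
    .option("dbtable", "SCHEMA.PRODUCTS")
    .option("user", "username")
    .option("password", "password")
    # Override the inferred Decimal(38,10) for this column; "WEIGHT_G STRING"
    # followed by a later cast to double is the more defensive variant.
    .option("customSchema", "WEIGHT_G DOUBLE")
    .load()
)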

Best regards

Samuel

Splush_
New Contributor II

Hey guys,

Thank you a lot for your help. Since this has been taking days already, I have asked the application owners of the database to delete these values for me. Apparently they are weights in grams for some products - so the problematic rows are heavier than the whole solar system 😄

Since the rows are now removed I cannot test any further.

Best regards Samuel
