Data Engineering
Cast string to decimal

_Raju
New Contributor II

Hello, can anyone help me with the error below?

I'm trying to cast a string column to decimal. When I do, I get the error "Py4JJavaError: An error occurred while calling t.addCustomDisplayData. : java.sql.SQLException: Status of query associated with resultSet is FAILED_WITH_ERROR. Numeric value 'WVL383' is not recognized Results not generated."

The dataframe and error details are attached.

2 REPLIES

Kaniz
Community Manager

Hi @_Raju, the error message you provided indicates that there’s an issue with the value 'WVL383', which is not recognized as a valid numeric value.

Here are some steps to troubleshoot and resolve this issue:

  • First, examine the data in the column you’re trying to cast. Look for any unexpected values or non-numeric entries.
  • Specifically, check if there are any non-numeric characters (such as letters or symbols) in the column.
  • If you find non-numeric values, consider cleaning the data. You can filter out rows with invalid entries or replace them with appropriate numeric values.
  • For example, if 'WVL383' is an outlier, you might want to handle it separately (e.g., impute with a default value or remove the row).
  • Ensure that your conversion logic is correct. When casting a string column to decimal, make sure that all values in the column can be successfully converted.
  • Double-check the casting operation you’re using (e.g., cast("decimal") or cast("double")).
  • The error message also mentions a version mismatch between Python in the worker and the driver. Ensure that both Python versions match.
  • Verify that the environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set to the same Python version.
  • Here’s an example of how you can cast a string column to decimal in PySpark:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col
    
    # Create a Spark session
    spark = SparkSession.builder.appName("DataFrame").getOrCreate()
    
    # Sample DataFrame with a string column
    data = [("Java", "20000"), ("Python", "100000"), ("Scala", "3000")]
    df = spark.createDataFrame(data, ["language", "salary"])
    
    # Cast the salary column to decimal
    # (note: a plain "decimal" cast defaults to decimal(10,0), i.e. no fractional digits)
    df = df.withColumn("salary_decimal", col("salary").cast("decimal"))
    
    # Show the resulting DataFrame
    df.show()
    
  • If you encounter any further issues, feel free to share additional details or code snippets, and we’ll continue troubleshooting! 😊

_Raju
New Contributor II

Hi Kaniz,

Thank you for your response.

I'm expecting the new column to be populated as NULL after the decimal conversion if the value contains any non-numeric characters, as in the example below. But that is not happening when I use the same cast on my original file.
