topic Re: Pyspark cast error in Get Started Discussions

Pyspark cast error

anandreddy23 — Wed, 16 Aug 2023 13:12:04 GMT

Hi All,

hive> create table UK ( a decimal(10,2)) ;

hive> create table IN ( a decimal(10,5)) ;

hive> create view T as select a from UK union all select a from IN ;

above all statements executes successfully in Hive and return results when select statement is executed. However, when select statement executed from python using pyspark I get error saying "Cannot up cast a from decimal(10,2) to decimal(10,5)".

Ideally view looks for same datatype and also this work fine in its source data db(hive). This has become a show stopper and Cculd you please help me with a possible solution to fix this in pyspark please ?

Thanks in advance,

Anand.

#pyspark

Re: Pyspark cast error

anandreddy23 — Wed, 16 Aug 2023 15:03:11 GMT

Hi @Retired_mod , Many thanks for your quick response.

I am sorry as I can't change the datatype or match the decimals. However, my doubt is again Hive is the DB where the view is created and queries without any errors, why should spark have to look into scale & precision if datatypes match ? We were told spark is framework that speeds up reading and processing data using multiple nodes within cluster, but wasn't aware that it would use its own SQL execution plan and its rules are different from underlying database

Re: Pyspark cast error

NandiniN — Sat, 01 Feb 2025 07:14:57 GMT

Spark SQL enforces stricter type casting rules compared to Hive, which is why you are encountering the "Cannot up cast a from decimal(10,2) to decimal(10,5)" error in PySpark. While Hive allows combining columns with different decimal scales in a union operation without issue, Spark SQL requires the scales to match exactly.

To resolve this issue in PySpark, you can explicitly cast the columns to a common decimal type before performing the union operation.

However you can try to temporarily disabling ANSI mode can help bypass strict type casting rules.

spark.conf.set("spark.sql.ansi.enabled", "false")

Re: Pyspark cast error

anandreddy23 — Wed, 05 Feb 2025 12:21:20 GMT

Hi Nandini,

Thanks for sharing the above solution. To be sure my understanding is correct, could you confirm below please ?

hive> create table test.UK ( a decimal(10,2)) ;

hive> create table test.IN ( a decimal(10,5)) ;

hive> create view test.T as select a from UK union all select a from IN ;

from pyspark.sql import SparkSession,SQLContext
from pyspark import SparkContext, SparkConf
from pyspark.storagelevel import StorageLevel
spark = SparkSession.builder.appName('ABC').config('spark.ui.port','3124').master("yarn").enableHiveSupport().getOrCreate()

spark.conf.set("spark.sql.ansi.enabled", "false")

df4 = spark.sql(' select * from test.T ')

Re: Pyspark cast error

anandreddy23 — Wed, 05 Feb 2025 12:30:33 GMT

Also, Ideally Hive is my database where the view is getting created. It does create view without any cast errors . It is spark that looks for precision to be same in the view definition.

Ideally spark is a framework which should not have any role (??) on how users have created view and database that where it is created is also fine ?