08-16-2023 06:12 AM
Hi All,
hive> create table UK ( a decimal(10,2)) ;
hive> create table IN ( a decimal(10,5)) ;
hive> create view T as select a from UK union all select a from IN ;
All of the above statements execute successfully in Hive, and the view returns results when a select statement is run. However, when the same select statement is executed from Python using PySpark, I get an error saying "Cannot up cast a from decimal(10,2) to decimal(10,5)".
Ideally the view only requires matching data types, and it works fine in its source database (Hive). This has become a show stopper. Could you please help me with a possible solution to fix this in PySpark?
Thanks in advance,
Anand.
#pyspark
08-16-2023 08:03 AM
Hi @Retired_mod , Many thanks for your quick response.
I am sorry, but I can't change the data types or match the decimal scales. My doubt remains: Hive is the database where the view is created, and it queries without any errors, so why does Spark have to check scale and precision when the base data types match? We were told Spark is a framework that speeds up reading and processing data across multiple nodes in a cluster, but I wasn't aware it would use its own SQL execution plan whose rules differ from those of the underlying database.
01-31-2025 11:14 PM
Spark SQL enforces stricter type casting rules compared to Hive, which is why you are encountering the "Cannot up cast a from decimal(10,2) to decimal(10,5)" error in PySpark. While Hive allows combining columns with different decimal scales in a union operation without issue, Spark SQL requires the scales to match exactly.
To resolve this issue in PySpark, you can explicitly cast the columns to a common decimal type before performing the union operation.
Alternatively, you can try temporarily disabling ANSI mode to bypass the strict type casting rules:
spark.conf.set("spark.sql.ansi.enabled", "false")
02-05-2025 04:21 AM
Hi Nandini,
Thanks for sharing the above solution. To be sure my understanding is correct, could you confirm below please ?
hive> create table test.UK ( a decimal(10,2)) ;
hive> create table test.IN ( a decimal(10,5)) ;
hive> create view test.T as select a from UK union all select a from IN ;
from pyspark.sql import SparkSession

# Create a Hive-enabled session, disable ANSI mode, then query the view
spark = (SparkSession.builder
         .appName('ABC')
         .config('spark.ui.port', '3124')
         .master("yarn")
         .enableHiveSupport()
         .getOrCreate())
spark.conf.set("spark.sql.ansi.enabled", "false")
df4 = spark.sql('select * from test.T')
02-05-2025 04:30 AM
Also, Hive is my database where the view is created, and it creates the view without any cast errors. It is Spark that requires precision and scale to match in the view definition. Ideally Spark is a framework that should have no say in how users have created a view, and the database where it is created accepts it as fine, right?