08-16-2023 06:12 AM
Hi All,
hive> create table UK ( a decimal(10,2)) ;
hive> create table IN ( a decimal(10,5)) ;
hive> create view T as select a from UK union all select a from IN ;
All of the above statements execute successfully in Hive, and the view returns results when a select statement is run. However, when the same select statement is executed from Python using PySpark, I get an error saying "Cannot up cast a from decimal(10,2) to decimal(10,5)".
Ideally the view only requires matching data types, and it works fine in its source database (Hive). This has become a show stopper. Could you please help me with a possible solution to fix this in PySpark?
Thanks in advance,
Anand.
#pyspark
08-16-2023 08:03 AM
Hi @Retired_mod , Many thanks for your quick response.
I am sorry, but I can't change the data types or match the decimal scales. My doubt remains: Hive is the database where the view is created, and it queries without any errors, so why does Spark have to check scale and precision when the base data types match? We were told Spark is a framework that speeds up reading and processing data across multiple nodes in a cluster, but I wasn't aware it would use its own SQL execution plan whose rules differ from those of the underlying database.
01-31-2025 11:14 PM
Spark SQL enforces stricter type casting rules compared to Hive, which is why you are encountering the "Cannot up cast a from decimal(10,2) to decimal(10,5)" error in PySpark. While Hive allows combining columns with different decimal scales in a union operation without issue, Spark SQL requires the scales to match exactly.
To resolve this issue in PySpark, you can explicitly cast the columns to a common decimal type before performing the union operation.
Alternatively, you can try temporarily disabling ANSI mode to bypass the strict type casting rules:
spark.conf.set("spark.sql.ansi.enabled", "false")
02-05-2025 04:21 AM
Hi Nandini,
Thanks for sharing the above solution. To be sure my understanding is correct, could you confirm below please ?
hive> create table test.UK ( a decimal(10,2)) ;
hive> create table test.IN ( a decimal(10,5)) ;
hive> create view test.T as select a from UK union all select a from IN ;
from pyspark.sql import SparkSession

# Create a Hive-enabled session, disable ANSI mode, then query the view
spark = (SparkSession.builder
         .appName('ABC')
         .config('spark.ui.port', '3124')
         .master("yarn")
         .enableHiveSupport()
         .getOrCreate())
spark.conf.set("spark.sql.ansi.enabled", "false")
df4 = spark.sql('select * from test.T')
02-05-2025 04:30 AM
Also, Hive is my database where the view is created, and it creates the view without any cast errors. It is Spark that requires precision and scale to match in the view definition. Ideally Spark is a framework that should have no say in how users have created a view, and the database where it is created accepts it as fine, right?