Get Started Discussions
Start your journey with Databricks by joining discussions on getting started guides, tutorials, and introductory topics. Connect with beginners and experts alike to kickstart your Databricks experience.

Pyspark cast error

anandreddy23
New Contributor III

Hi All,

hive> create table UK ( a decimal(10,2)) ;

hive> create table IN ( a decimal(10,5)) ;

hive> create view T as select a from UK union all select a from IN ;

All of the above statements execute successfully in Hive, and the view returns results when a SELECT statement is run. However, when the SELECT statement is executed from Python using PySpark, I get an error saying "Cannot up cast a from decimal(10,2) to decimal(10,5)".

Ideally the view expects the same datatype, and this works fine in its source database (Hive). This has become a showstopper. Could you please help me with a possible solution to fix this in PySpark?

Thanks in advance,

Anand.

#pyspark

4 REPLIES

Hi @Retired_mod , Many thanks for your quick response.

I am sorry, but I can't change the datatype or match the decimals. However, my doubt remains: Hive is the database where the view is created, and it queries without any errors, so why should Spark have to look at scale and precision if the datatypes match? We were told Spark is a framework that speeds up reading and processing data using multiple nodes within a cluster, but I wasn't aware that it would use its own SQL execution plan, with rules different from the underlying database's.

NandiniN
Databricks Employee

Spark SQL enforces stricter type casting rules compared to Hive, which is why you are encountering the "Cannot up cast a from decimal(10,2) to decimal(10,5)" error in PySpark. While Hive allows combining columns with different decimal scales in a union operation without issue, Spark SQL requires the scales to match exactly.

To resolve this issue in PySpark, you can explicitly cast the columns to a common decimal type before performing the union operation.

Alternatively, temporarily disabling ANSI mode can help bypass strict type casting rules:

spark.conf.set("spark.sql.ansi.enabled", "false")

anandreddy23
New Contributor III

Hi Nandini,

Thanks for sharing the above solution. To be sure my understanding is correct, could you confirm the below please?

hive> create table test.UK ( a decimal(10,2)) ;

hive> create table test.IN ( a decimal(10,5)) ;

hive> create view test.T as select a from UK union all select a from IN ;

from pyspark.sql import SparkSession,SQLContext
from pyspark import SparkContext, SparkConf
from pyspark.storagelevel import StorageLevel
spark = SparkSession.builder.appName('ABC').config('spark.ui.port','3124').master("yarn").enableHiveSupport().getOrCreate() 

spark.conf.set("spark.sql.ansi.enabled", "false")

df4 = spark.sql(' select * from test.T ')

Also, Hive is my database where the view is created, and it creates the view without any cast errors. It is Spark that requires the precision to be the same in the view definition.

Ideally, Spark is a framework that should not have any say in how users have created a view, and the database where the view is created should also be fine, shouldn't it?