<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Pyspark cast error in Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/pyspark-cast-error/m-p/40076#M9750</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;, many thanks for your quick response.&lt;/P&gt;&lt;P&gt;I am sorry, but I can't change the datatype or match the decimals. However, my doubt remains: Hive is the database where the view is created and queried without any errors, so why should Spark have to look at scale &amp;amp; precision if the datatypes match? We were told Spark is a framework that speeds up reading and processing data using multiple nodes within a cluster, but we weren't aware that it would use its own SQL execution plan and that its rules differ from those of the underlying database.&lt;/P&gt;</description>
    <pubDate>Wed, 16 Aug 2023 15:03:11 GMT</pubDate>
    <dc:creator>anandreddy23</dc:creator>
    <dc:date>2023-08-16T15:03:11Z</dc:date>
    <item>
      <title>Pyspark cast error</title>
      <link>https://community.databricks.com/t5/get-started-discussions/pyspark-cast-error/m-p/40050#M9748</link>
      <description>&lt;P&gt;Hi All,&lt;/P&gt;&lt;P&gt;hive&amp;gt; create table UK ( a decimal(10,2)) ;&lt;/P&gt;&lt;P&gt;hive&amp;gt; create table IN ( a decimal(10,5)) ;&lt;/P&gt;&lt;P&gt;hive&amp;gt; create view T as select a from UK union all select a from IN ;&lt;/P&gt;&lt;P&gt;All of the above statements execute successfully in Hive, and a select statement on the view returns results. However, when the same select statement is executed from Python using PySpark, I get an error saying "Cannot up cast a from decimal(10,2) to decimal(10,5)".&lt;/P&gt;&lt;P&gt;Ideally, a view only needs the datatypes to match, and this works fine in its source database (Hive). This has become a showstopper. Could you please help me with a possible solution to fix this in PySpark?&lt;/P&gt;&lt;P&gt;Thanks in advance,&lt;/P&gt;&lt;P&gt;Anand.&lt;/P&gt;&lt;P&gt;#pyspark&lt;/P&gt;</description>
      <pubDate>Wed, 16 Aug 2023 13:12:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/pyspark-cast-error/m-p/40050#M9748</guid>
      <dc:creator>anandreddy23</dc:creator>
      <dc:date>2023-08-16T13:12:04Z</dc:date>
    </item>
    <item>
      <title>Re: Pyspark cast error</title>
      <link>https://community.databricks.com/t5/get-started-discussions/pyspark-cast-error/m-p/40076#M9750</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;, many thanks for your quick response.&lt;/P&gt;&lt;P&gt;I am sorry, but I can't change the datatype or match the decimals. However, my doubt remains: Hive is the database where the view is created and queried without any errors, so why should Spark have to look at scale &amp;amp; precision if the datatypes match? We were told Spark is a framework that speeds up reading and processing data using multiple nodes within a cluster, but we weren't aware that it would use its own SQL execution plan and that its rules differ from those of the underlying database.&lt;/P&gt;</description>
      <pubDate>Wed, 16 Aug 2023 15:03:11 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/pyspark-cast-error/m-p/40076#M9750</guid>
      <dc:creator>anandreddy23</dc:creator>
      <dc:date>2023-08-16T15:03:11Z</dc:date>
    </item>
    <item>
      <title>Re: Pyspark cast error</title>
      <link>https://community.databricks.com/t5/get-started-discussions/pyspark-cast-error/m-p/108269#M9751</link>
      <description>&lt;P class="_1t7bu9h1 paragraph"&gt;Spark SQL enforces stricter type casting rules compared to Hive, which is why you are encountering the "Cannot up cast a from decimal(10,2) to decimal(10,5)" error in PySpark. While Hive allows combining columns with different decimal scales in a union operation without issue, Spark SQL requires the scales to match exactly.&lt;/P&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;To resolve this issue in PySpark, you can explicitly cast the columns to a common decimal type before performing the union operation.&lt;/P&gt;
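&lt;P class="_1t7bu9h1 paragraph"&gt;As a rough sketch of how such a common type could be chosen (not an official API): a decimal type that loses no digits from either side keeps the larger scale and the larger count of integer digits. The helper name common_decimal_type below is hypothetical, the cap of 38 assumes Spark's maximum decimal precision, and the snippet is plain Python, so it runs without a Spark session.&lt;/P&gt;

```python
# Sketch only: pick a decimal type wide enough for both sides of a union.
# common_decimal_type is a hypothetical helper, not part of PySpark.
MAX_PRECISION = 38  # assumed upper bound on decimal precision in Spark


def common_decimal_type(p1, s1, p2, s2):
    """Return (precision, scale) able to hold decimal(p1,s1) and decimal(p2,s2)."""
    scale = max(s1, s2)                     # keep every fractional digit
    integer_digits = max(p1 - s1, p2 - s2)  # keep every integer digit
    precision = min(integer_digits + scale, MAX_PRECISION)
    return precision, scale


# The two column types from this thread: decimal(10,2) and decimal(10,5).
precision, scale = common_decimal_type(10, 2, 10, 5)
target = f"decimal({precision},{scale})"
print(target)

# With a live session, each side could then be cast before the union, e.g.:
# spark.sql(f"select cast(a as {target}) as a from UK "
#           f"union all select cast(a as {target}) as a from IN")
```

&lt;P class="_1t7bu9h1 paragraph"&gt;For the two tables in this thread the sketch yields decimal(13,5): eight integer digits from decimal(10,2) plus five fractional digits from decimal(10,5). The commented-out query assumes the casts are applied in a Spark-side query or view rather than in the existing Hive view.&lt;/P&gt;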
&lt;P class="_1t7bu9h1 paragraph"&gt;Alternatively, you can try temporarily disabling ANSI mode, which may help bypass the strict type casting rules.&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;spark.conf.set("spark.sql.ansi.enabled", "false")
&lt;/LI-CODE&gt;</description>
      <pubDate>Sat, 01 Feb 2025 07:14:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/pyspark-cast-error/m-p/108269#M9751</guid>
      <dc:creator>NandiniN</dc:creator>
      <dc:date>2025-02-01T07:14:57Z</dc:date>
    </item>
    <item>
      <title>Re: Pyspark cast error</title>
      <link>https://community.databricks.com/t5/get-started-discussions/pyspark-cast-error/m-p/108940#M9752</link>
      <description>&lt;P&gt;Hi Nandini,&lt;/P&gt;&lt;P&gt;Thanks for sharing the above solution. To be sure my understanding is correct, could you please confirm the steps below?&lt;/P&gt;&lt;P&gt;hive&amp;gt; create table test.UK ( a decimal(10,2)) ;&lt;/P&gt;&lt;P&gt;hive&amp;gt; create table test.IN ( a decimal(10,5)) ;&lt;/P&gt;&lt;P&gt;hive&amp;gt; create view test.T as select a from UK union all select a from IN ;&lt;/P&gt;&lt;P&gt;from pyspark.sql import SparkSession,SQLContext&lt;BR /&gt;from pyspark import SparkContext, SparkConf&lt;BR /&gt;from pyspark.storagelevel import StorageLevel&lt;BR /&gt;spark = SparkSession.builder.appName('ABC').config('spark.ui.port','3124').master("yarn").enableHiveSupport().getOrCreate()&lt;/P&gt;&lt;P&gt;spark.conf.set("spark.sql.ansi.enabled", "false")&lt;/P&gt;&lt;P&gt;df4 = spark.sql(' select * from test.T ')&lt;/P&gt;</description>
      <pubDate>Wed, 05 Feb 2025 12:21:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/pyspark-cast-error/m-p/108940#M9752</guid>
      <dc:creator>anandreddy23</dc:creator>
      <dc:date>2025-02-05T12:21:20Z</dc:date>
    </item>
    <item>
      <title>Re: Pyspark cast error</title>
      <link>https://community.databricks.com/t5/get-started-discussions/pyspark-cast-error/m-p/108941#M9753</link>
      <description>&lt;P&gt;Also, Hive is the database where the view is created, and it creates the view without any cast errors. It is Spark that requires the precision to be the same in the view definition.&lt;/P&gt;&lt;P&gt;Ideally, Spark is a framework that should not have any role (??) in how users have created a view, and the database where the view was created is fine with it?&lt;/P&gt;</description>
      <pubDate>Wed, 05 Feb 2025 12:30:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/pyspark-cast-error/m-p/108941#M9753</guid>
      <dc:creator>anandreddy23</dc:creator>
      <dc:date>2025-02-05T12:30:33Z</dc:date>
    </item>
  </channel>
</rss>

