<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Parquet column cannot be converted. Column: [Rainfall_Value], Expected: DoubleType, Found: INT64 in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/parquet-column-cannot-be-converted-column-rainfall-value/m-p/2847#M119</link>
    <description>&lt;P&gt;Hi @THIAM HUAT TAN, the error occurs because the schema defined for the column "Rainfall_Value" is DoubleType, while the values stored in the Parquet file being read are of integer type (INT64). This may affect one file or several. Depending on the data, you need to update either the schema or the data.&lt;/P&gt;</description>
    <pubDate>Tue, 20 Jun 2023 12:59:55 GMT</pubDate>
    <dc:creator>Lakshay</dc:creator>
    <dc:date>2023-06-20T12:59:55Z</dc:date>
    <item>
      <title>Parquet column cannot be converted. Column: [Rainfall_Value], Expected: DoubleType, Found: INT64</title>
      <link>https://community.databricks.com/t5/data-engineering/parquet-column-cannot-be-converted-column-rainfall-value/m-p/2845#M117</link>
      <description>&lt;P&gt;df.printSchema()&lt;/P&gt;&lt;P&gt;root&lt;/P&gt;&lt;P&gt; |-- Device_ID: string (nullable = true)&lt;/P&gt;&lt;P&gt; |-- Location: string (nullable = true)&lt;/P&gt;&lt;P&gt; |-- Latitude: double (nullable = true)&lt;/P&gt;&lt;P&gt; |-- Longitude: double (nullable = true)&lt;/P&gt;&lt;P&gt; |-- DateTime: timestamp (nullable = true)&lt;/P&gt;&lt;P&gt; |-- Rainfall_Value: double (nullable = true)&lt;/P&gt;&lt;P&gt; |-- year: integer (nullable = true)&lt;/P&gt;&lt;P&gt; |-- month: integer (nullable = true)&lt;/P&gt;&lt;P&gt; |-- day: integer (nullable = true)&lt;/P&gt;&lt;P&gt; |-- hour: integer (nullable = true)&lt;/P&gt;&lt;P&gt; |-- minute: integer (nullable = true)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;df.write.partitionBy("year","month").mode("overwrite").parquet("/home/rainfall/parquet/rainfall.parquet")&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;org.apache.spark.SparkException: Job aborted due to stage failure: Task 29 in stage 675.0 failed 1 times, most recent failure: Lost task 29.0 in stage 675.0 (TID 5311) (ip-10-175-235-230.ap-southeast-1.compute.internal executor driver): com.databricks.sql.io.FileReadException: Error while reading file dbfs:REDACTED_LOCAL_PART@xyz**.com.sg/weather123-lakehouse/delta/2022-10-13.parquet/part-00003-tid-7527434428502281281-b966b165-5e61-4ba0-a6ca-cea51e5acdf2-3762-1-c000.snappy.parquet. Parquet column cannot be converted. Column: [Rainfall_Value], Expected: DoubleType, Found: INT64&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Since the schema above already shows Rainfall_Value as DoubleType, why does it complain that it found INT64? I am not sure how to debug this.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks in advance.&lt;/P&gt;</description>
      <pubDate>Mon, 19 Jun 2023 13:41:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/parquet-column-cannot-be-converted-column-rainfall-value/m-p/2845#M117</guid>
      <dc:creator>THIAM_HUATTAN</dc:creator>
      <dc:date>2023-06-19T13:41:58Z</dc:date>
    </item>
    <item>
      <title>Re: Parquet column cannot be converted. Column: [Rainfall_Value], Expected: DoubleType, Found: INT64</title>
      <link>https://community.databricks.com/t5/data-engineering/parquet-column-cannot-be-converted-column-rainfall-value/m-p/2846#M118</link>
      <description>&lt;P&gt;Hi @THIAM HUAT TAN&lt;/P&gt;&lt;P&gt;Great to meet you, and thanks for your question!&lt;/P&gt;&lt;P&gt;Let's see if your peers in the community have an answer to your question. Thanks.&lt;/P&gt;</description>
      <pubDate>Tue, 20 Jun 2023 04:36:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/parquet-column-cannot-be-converted-column-rainfall-value/m-p/2846#M118</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2023-06-20T04:36:57Z</dc:date>
    </item>
    <item>
      <title>Re: Parquet column cannot be converted. Column: [Rainfall_Value], Expected: DoubleType, Found: INT64</title>
      <link>https://community.databricks.com/t5/data-engineering/parquet-column-cannot-be-converted-column-rainfall-value/m-p/2847#M119</link>
      <description>&lt;P&gt;Hi @THIAM HUAT TAN, the error occurs because the schema defined for the column "Rainfall_Value" is DoubleType, while the values stored in the Parquet file being read are of integer type (INT64). This may affect one file or several. Depending on the data, you need to update either the schema or the data.&lt;/P&gt;</description>
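      <content:encoded><![CDATA[
<P>One way to act on the advice above is to check, file by file, how each Parquet file stores the column, and then cast the column to double while reading. The following is only a rough sketch under assumptions: it expects an active SparkSession passed in as spark and a list of Parquet file paths passed in as paths, and the helper names column_type_per_file, mismatched_files and read_as_double are hypothetical, not part of Spark or Databricks.</P>

```python
from functools import reduce
from typing import Dict, List


def column_type_per_file(spark, paths: List[str], column: str) -> Dict[str, str]:
    """Map each Parquet path to the Spark type name of `column` in that file."""
    types = {}
    for path in paths:
        # Reading one path at a time lets Spark infer that file's own schema
        # instead of applying a single expected schema to every file.
        field = spark.read.parquet(path).schema[column]
        types[path] = field.dataType.simpleString()  # e.g. "double" or "bigint"
    return types


def mismatched_files(types_by_file: Dict[str, str], expected: str) -> List[str]:
    """Return the paths whose column type differs from the expected type name."""
    return sorted(p for p, t in types_by_file.items() if t != expected)


def read_as_double(spark, paths: List[str], column: str):
    """Read each path separately, cast `column` to double, and union the results."""
    # Imported lazily so the pure-Python helper above works without PySpark.
    from pyspark.sql.functions import col

    frames = [
        spark.read.parquet(p).withColumn(column, col(column).cast("double"))
        for p in paths
    ]
    return reduce(lambda left, right: left.unionByName(right), frames)
```

<P>For example, mismatched_files(column_type_per_file(spark, paths, "Rainfall_Value"), "double") would list the files that store the column as something other than double (a Parquet INT64 column is reported by Spark as "bigint"). Those files can then be rewritten, or read_as_double can be used to load everything with a consistent type before writing the data out again.</P>
]]></content:encoded>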
      <pubDate>Tue, 20 Jun 2023 12:59:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/parquet-column-cannot-be-converted-column-rainfall-value/m-p/2847#M119</guid>
      <dc:creator>Lakshay</dc:creator>
      <dc:date>2023-06-20T12:59:55Z</dc:date>
    </item>
    <item>
      <title>Re: Parquet column cannot be converted. Column: [Rainfall_Value], Expected: DoubleType, Found: INT64</title>
      <link>https://community.databricks.com/t5/data-engineering/parquet-column-cannot-be-converted-column-rainfall-value/m-p/2848#M120</link>
      <description>&lt;P&gt;Yes, they have answered, thanks for checking.&lt;/P&gt;</description>
      <pubDate>Tue, 20 Jun 2023 23:00:32 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/parquet-column-cannot-be-converted-column-rainfall-value/m-p/2848#M120</guid>
      <dc:creator>THIAM_HUATTAN</dc:creator>
      <dc:date>2023-06-20T23:00:32Z</dc:date>
    </item>
  </channel>
</rss>

