<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Date field getting changed when reading from excel file to dataframe in pyspark in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/date-field-getting-changed-when-reading-from-excel-file-to/m-p/20785#M14073</link>
    <description>&lt;P&gt;The date field changes when I read data from the source Excel file into a DataFrame. In the source file all columns are strings, so I am not sure why only the date column behaves differently.&lt;/P&gt;&lt;P&gt;In the source file the date is 1/24/1947.&lt;/P&gt;&lt;P&gt;In the PySpark DataFrame it is 1/24/47.&lt;/P&gt;&lt;P&gt;Code used:&lt;/P&gt;&lt;P&gt;df=spark.read.format("com.crealytics.spark.excel").option("header","true").load("/mnt/dataplatform/Tenant_PK/Results.xlsx")&lt;/P&gt;&lt;P&gt;If I use option("inferSchema","true") the data comes through correctly, but I don't want to use inferSchema. Can anyone suggest a solution?&lt;/P&gt;&lt;P&gt;Thanks in advance.&lt;/P&gt;</description>
    <pubDate>Thu, 24 Nov 2022 06:40:57 GMT</pubDate>
    <dc:creator>Pradeep_Namani</dc:creator>
    <dc:date>2022-11-24T06:40:57Z</dc:date>
    <item>
      <title>Date field getting changed when reading from excel file to dataframe in pyspark</title>
      <link>https://community.databricks.com/t5/data-engineering/date-field-getting-changed-when-reading-from-excel-file-to/m-p/20785#M14073</link>
      <description>&lt;P&gt;The date field changes when I read data from the source Excel file into a DataFrame. In the source file all columns are strings, so I am not sure why only the date column behaves differently.&lt;/P&gt;&lt;P&gt;In the source file the date is 1/24/1947.&lt;/P&gt;&lt;P&gt;In the PySpark DataFrame it is 1/24/47.&lt;/P&gt;&lt;P&gt;Code used:&lt;/P&gt;&lt;P&gt;df=spark.read.format("com.crealytics.spark.excel").option("header","true").load("/mnt/dataplatform/Tenant_PK/Results.xlsx")&lt;/P&gt;&lt;P&gt;If I use option("inferSchema","true") the data comes through correctly, but I don't want to use inferSchema. Can anyone suggest a solution?&lt;/P&gt;&lt;P&gt;Thanks in advance.&lt;/P&gt;</description>
      <pubDate>Thu, 24 Nov 2022 06:40:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/date-field-getting-changed-when-reading-from-excel-file-to/m-p/20785#M14073</guid>
      <dc:creator>Pradeep_Namani</dc:creator>
      <dc:date>2022-11-24T06:40:57Z</dc:date>
    </item>
    <item>
      <title>Re: Date field getting changed when reading from excel file to dataframe in pyspark</title>
      <link>https://community.databricks.com/t5/data-engineering/date-field-getting-changed-when-reading-from-excel-file-to/m-p/20786#M14074</link>
      <description>&lt;P&gt;Hi @Pradeep Namani,&lt;/P&gt;&lt;P&gt;Could you please try running the code below? I hope it will work without inferSchema.&lt;/P&gt;&lt;P&gt;df=spark.read.format("csv").option("header","true").load("/mnt/dataplatform/Tenant_PK/Results.xlsx")&lt;/P&gt;</description>
      <pubDate>Thu, 24 Nov 2022 06:52:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/date-field-getting-changed-when-reading-from-excel-file-to/m-p/20786#M14074</guid>
      <dc:creator>yogu</dc:creator>
      <dc:date>2022-11-24T06:52:24Z</dc:date>
    </item>
    <item>
      <title>Re: Date field getting changed when reading from excel file to dataframe in pyspark</title>
      <link>https://community.databricks.com/t5/data-engineering/date-field-getting-changed-when-reading-from-excel-file-to/m-p/20787#M14075</link>
      <description>&lt;P&gt;Thank you @Yogita Chavan for replying, but when I read the file as CSV, all the data shows up in a different format. I am attaching a screenshot.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="ADB issue"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1117iF6766EE6BA75F9FB/image-size/large?v=v2&amp;amp;px=999" role="button" title="ADB issue" alt="ADB issue" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 24 Nov 2022 07:12:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/date-field-getting-changed-when-reading-from-excel-file-to/m-p/20787#M14075</guid>
      <dc:creator>Pradeep_Namani</dc:creator>
      <dc:date>2022-11-24T07:12:54Z</dc:date>
    </item>
    <item>
      <title>Re: Date field getting changed when reading from excel file to dataframe in pyspark</title>
      <link>https://community.databricks.com/t5/data-engineering/date-field-getting-changed-when-reading-from-excel-file-to/m-p/20788#M14076</link>
      <description>&lt;P&gt;You can also refer to the article below:&lt;/P&gt;&lt;P&gt;&lt;A href="https://mayur-saparia7.medium.com/reading-excel-file-in-pyspark-databricks-notebook-c75a63181548" target="test_blank"&gt;https://mayur-saparia7.medium.com/reading-excel-file-in-pyspark-databricks-notebook-c75a63181548&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 24 Nov 2022 07:13:17 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/date-field-getting-changed-when-reading-from-excel-file-to/m-p/20788#M14076</guid>
      <dc:creator>yogu</dc:creator>
      <dc:date>2022-11-24T07:13:17Z</dc:date>
    </item>
    <item>
      <title>Re: Date field getting changed when reading from excel file to dataframe in pyspark</title>
      <link>https://community.databricks.com/t5/data-engineering/date-field-getting-changed-when-reading-from-excel-file-to/m-p/20789#M14077</link>
      <description>&lt;P&gt;How about using inferSchema a single time to create a correct DataFrame, and then creating a schema from that DataFrame's schema?&lt;/P&gt;&lt;P&gt;Something like this, for example:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;import json

from pyspark.sql.types import StructType

# df is the DataFrame that was read once with inferSchema enabled.
# Save the schema from the original DataFrame as JSON:
schema_json = df.schema.json()

# Restore the schema from JSON:
new_schema = StructType.fromJson(json.loads(schema_json))&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 24 Nov 2022 10:37:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/date-field-getting-changed-when-reading-from-excel-file-to/m-p/20789#M14077</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2022-11-24T10:37:36Z</dc:date>
    </item>
    <item>
      <title>Re: Date field getting changed when reading from excel file to dataframe in pyspark</title>
      <link>https://community.databricks.com/t5/data-engineering/date-field-getting-changed-when-reading-from-excel-file-to/m-p/20790#M14078</link>
      <description>&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="ADB issue1"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1137i54C1106FB06D306C/image-size/large?v=v2&amp;amp;px=999" role="button" title="ADB issue1" alt="ADB issue1" /&gt;&lt;/span&gt;I tried the option given in the URL above, but it did not help; I am still facing the same issue.&lt;/P&gt;</description>
      <pubDate>Thu, 24 Nov 2022 11:08:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/date-field-getting-changed-when-reading-from-excel-file-to/m-p/20790#M14078</guid>
      <dc:creator>Pradeep_Namani</dc:creator>
      <dc:date>2022-11-24T11:08:37Z</dc:date>
    </item>
  </channel>
</rss>

