Date field getting changed when reading from Excel file to dataframe

sreedata
New Contributor III

The date field changes when I read data from a source .xls file into a dataframe. In the source Excel file all columns are strings, but I am not sure why the date column alone behaves differently.

In the source file the date is 1/24/2022.

In the dataframe it is 1/24/22.

Code used:

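from pyspark.sql.functions import *
import pyspark.sql.functions as sf
import pyspark.sql.types
import pandas as pd
import os
import glob

# PathSource is the folder that holds the .xls files (defined earlier in the notebook)
filenames = glob.glob(PathSource + "/*.xls")

dfs = []

# Read Sheet1 of every workbook and collect the resulting dataframes
for filename in filenames:
  xl_file = pd.ExcelFile(filename)
  dfs.append(xl_file.parse('Sheet1'))

# Stack all sheets into a single dataframe
df = pd.concat(dfs, ignore_index=True)

display(df)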
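A possible workaround, offered only as a sketch: rewrite the date values to M/D/YYYY after loading, whatever form they arrive in. The helper name `normalize_date` and the list of formats it tries are assumptions for illustration and are not part of the code above. The sketch also does not explain why the year is shortened in the first place.

```python
from datetime import date, datetime

# Formats to try, in order. This list is an assumption about what the
# column may contain; extend it to match the real data.
_KNOWN_FORMATS = ("%m/%d/%Y", "%m/%d/%y", "%Y-%m-%d %H:%M:%S", "%Y-%m-%d")


def normalize_date(value):
    """Return value as an M/D/YYYY string, or unchanged if it cannot be parsed."""
    # None, NaN and NaT are passed through (NaN and NaT compare unequal to themselves).
    if value is None or value != value:
        return value
    if isinstance(value, (date, datetime)):
        # pandas.Timestamp subclasses datetime, so parsed date cells land here.
        parsed = value
    else:
        text = str(value).strip()
        parsed = None
        for fmt in _KNOWN_FORMATS:
            try:
                parsed = datetime.strptime(text, fmt)
                break
            except ValueError:
                continue
        if parsed is None:
            return value
    # Build the string by hand to avoid platform-specific strftime flags.
    return f"{parsed.month}/{parsed.day}/{parsed.year}"
```

It could then be applied to the loaded dataframe with something like `df["Date"] = df["Date"].map(normalize_date)`, where `Date` is a hypothetical column name standing in for the real one.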

Thanks in advance for any help or guidance.