The date field changes when I read data from a source .xls file into a DataFrame. In the source Excel file all columns are strings, but I am not sure why the date column alone behaves differently.
In the source file the date is 1/24/2022.
In the DataFrame it is 1/24/22.
Code used:
from pyspark.sql.functions import *
import pyspark.sql.functions as sf
import pyspark.sql.types
import pandas as pd
import os
import glob
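# PathSource is the folder that holds the source .xls files (defined elsewhere)
filenames = glob.glob(PathSource + "/*.xls")
dfs = []
for filename in filenames:
    # Open each workbook and read its 'Sheet1' into a DataFrame
    xl_file = pd.ExcelFile(filename)
    dfs.append(xl_file.parse('Sheet1'))
# Combine the per-file DataFrames into one
df = pd.concat(dfs, ignore_index=True)
display(df)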
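For reference, here is a minimal sketch of how the column could be inspected and normalized. It assumes the dates reach the DataFrame as strings with a two-digit year, which has not been confirmed as the cause. The column name "date" and the sample values are hypothetical stand-ins, not taken from the source file.

```python
import pandas as pd

# Hypothetical stand-in for the parsed sheet; the column name "date"
# and both sample values are assumptions made for illustration.
df = pd.DataFrame({"date": ["1/24/22", "12/5/21"]})

# Show what pandas inferred for each column (object means plain strings).
print(df.dtypes)

# Parse the two-digit-year strings, then rebuild M/D/YYYY without zero padding.
parsed = pd.to_datetime(df["date"], format="%m/%d/%y")
df["date_full"] = [f"{d.month}/{d.day}/{d.year}" for d in parsed]

print(df)
```

If df.dtypes reports a datetime type for the real column instead of object, the values were already parsed as dates and only the display format differs, so the parsing step would not be needed.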
Thanks in advance for any help or guidance.