03-31-2022 06:47 AM
The date field changes when I read data from the source .xls file into a dataframe. In the source Excel file all columns are strings, but I am not sure why the date column alone behaves differently.
In the source file the date is 1/24/2022.
In the dataframe it is 1/24/22.
Code used:
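from pyspark.sql.functions import *
import pyspark.sql.functions as sf
import pyspark.sql.types
import pandas as pd
import os
import glob
# PathSource is the source directory, defined elsewhere in the notebook
filenames = glob.glob(PathSource + "/*.xls")
dfs = []
for filename in filenames:
    # Read Sheet1 of each workbook into a pandas DataFrame
    xl_file = pd.ExcelFile(filename)
    dfs.append(xl_file.parse('Sheet1'))
# Combine the sheets from all files into one DataFrame
df = pd.concat(dfs, ignore_index=True)
display(df)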
Thanks in advance for any help or guidance.
04-02-2022 10:31 AM
@srikanth nair, have you checked the output in pandas? You could also pass parse_dates=False to skip date parsing. By default, pandas uses dateutil.parser.parser to convert dates.
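One way to check the pandas output is to look at the column dtype and, if the column arrived as a datetime, render it back to text in the format you want. The following is only a minimal sketch under assumptions: the column name "date", the helper format_date_column, and the in-memory sample frame are hypothetical stand-ins for what xl_file.parse('Sheet1') may return when Excel stores the cell as a date.

```python
import pandas as pd


def format_date_column(df: pd.DataFrame, column: str, fmt: str = "%m/%d/%Y") -> pd.DataFrame:
    """Return a copy of df with `column` rendered as text using `fmt`."""
    out = df.copy()
    # errors="coerce" turns unparseable values into NaT instead of raising
    parsed = pd.to_datetime(out[column], errors="coerce")
    out[column] = parsed.dt.strftime(fmt)
    return out


# Hypothetical stand-in for a sheet whose date cell was read as a datetime
sample = pd.DataFrame({"date": [pd.Timestamp(2022, 1, 24)], "name": ["a"]})

# Inspect what pandas actually produced before converting anything
print(sample.dtypes)

result = format_date_column(sample, "date")
print(result["date"].iloc[0])
```

Note that "%m/%d/%Y" zero-pads the month and day, so the value prints with a four-digit year but as 01/24/2022 rather than 1/24/2022.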
05-18-2022 11:16 PM
05-19-2022 06:45 AM
Working fine now, thanks.
11-17-2022 06:56 AM
Hi Team, @Merca Ovnerud
I am facing the same issue. Below is the code snippet I am using:
df=spark.read.format("com.crealytics.spark.excel").option("header","true").load("/mnt/dataplatform/Tenant_PK/Results.xlsx")
I have a couple of date columns. All of them show up in dd/mm/yy format, but they need to come through as dd/mm/yyyy.
Source file has: 26-03-1950
Dataframe has: 26-03-50
I have tried parse_dates=False, but it does not work. Can anyone help with this?