Databricks Community

sreedata · 03-31-2022

The date field changes when I read data from a source .xls file into a DataFrame. In the source Excel file all columns are strings, so I am not sure why the date column alone behaves differently.

In the source file the date is 1/24/2022.

In the DataFrame it is 1/24/22.
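A likely explanation, though the thread does not confirm it, is that the cell holds a real Excel date rather than text. In that case the reader turns it into a date value, and whatever displays it chooses the output format. The minimal sketch below uses only the Python standard library to show that 1/24/2022 and 1/24/22 can be the same underlying date rendered with a four-digit versus a two-digit year pattern.

```python
from datetime import date

# Same underlying value in both cases; only the rendering differs.
d = date(2022, 1, 24)

four_digit = f"{d.month}/{d.day}/{d.year}"  # month/day/four-digit year
two_digit = f"{d.month}/{d.day}/{d:%y}"     # month/day/two-digit year (%y)

print(four_digit)  # 1/24/2022
print(two_digit)   # 1/24/22
```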

Code used:

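import glob

import pandas as pd

# PathSource is defined elsewhere in the notebook (folder holding the .xls files)
filenames = glob.glob(PathSource + "/*.xls")

dfs = []

# Read Sheet1 from every workbook and collect the resulting DataFrames
for filename in filenames:
    xl_file = pd.ExcelFile(filename)
    df = xl_file.parse('Sheet1')
    dfs.append(df)

# Combine all sheets into a single DataFrame
df = pd.concat(dfs, ignore_index=True)

# display() is the Databricks notebook helper
display(df)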

Thanks in advance for any help or guidance.
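If pandas returns the date column as datetime64, one way to control the output is to format it explicitly with a four-digit year before displaying it. This is a minimal sketch, not a confirmed fix for this thread: the in-memory DataFrame stands in for the result of xl_file.parse('Sheet1'), and the column name "Date" is hypothetical.

```python
import pandas as pd

# Stand-in for the DataFrame returned by xl_file.parse('Sheet1');
# the "Date" column name is hypothetical.
df = pd.DataFrame({"Date": pd.to_datetime(["2022-01-24", "1950-03-26"])})

# Only reformat when pandas actually parsed the column as datetimes;
# a plain string column is left untouched.
if pd.api.types.is_datetime64_any_dtype(df["Date"]):
    df["Date"] = df["Date"].dt.strftime("%m/%d/%Y")

print(df["Date"].tolist())  # ['01/24/2022', '03/26/1950']
```

Note that %m and %d zero-pad, so the result is 01/24/2022 rather than 1/24/2022.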

merca · 04-02-2022

@srikanth nair, have you checked the output in pandas? If needed, pass parse_dates=False to ignore dates. Pandas uses dateutil.parser.parser by default.
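To illustrate why date parsing changes what you see: once a date-like string goes through a parser such as dateutil's (python-dateutil is a pandas dependency), the result is a datetime object, and how it is rendered later no longer has to match the source text. A minimal sketch, assuming python-dateutil is installed:

```python
from dateutil import parser

# A date-like string as it appears in the sheet
raw = "1/24/2022"

# The parser returns a datetime object (month-first by default),
# so the original text is gone and only the value remains.
parsed = parser.parse(raw)

print(parsed)  # 2022-01-24 00:00:00
print(raw)     # 1/24/2022
```

If the goal is to see the source text unchanged, the column has to stay a string rather than pass through a date parser.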

Anonymous · 05-18-2022

Hi @sreedata (Customer), just a friendly follow-up. Do you still need help, or did @merca (Customer)'s response help you find the solution? Please let us know.

sreedata · 05-19-2022

Working fine now, thanks.

Pradeep_Namani · 11-17-2022

Hi Team, @Merca Ovnerud

I am also facing the same issue. Below is the code snippet I am using:

df = (
    spark.read.format("com.crealytics.spark.excel")
    .option("header", "true")
    .load("/mnt/dataplatform/Tenant_PK/Results.xlsx")
)

I have a couple of date columns. They all show dd/mm/yy format, but they need to come through as dd/mm/yyyy.

Source file has: 26-03-1950

DataFrame has: 26-03-50

I have tried parse_dates=False, but it does not work. Can anyone help with this?
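Two hedged observations. First, parse_dates is a keyword argument of the pandas Excel readers, and as far as I know it is not an option of the com.crealytics.spark.excel reader, which would explain why it has no effect here. Second, once the year has been shortened to two digits, the century cannot be recovered reliably, so the four-digit year has to come from the real date value. The sketch below uses only the Python standard library; to_ddmmyyyy is a hypothetical helper, not part of any library.

```python
from datetime import date, datetime


def to_ddmmyyyy(value: date) -> str:
    """Hypothetical helper: render a real date value with a four-digit year."""
    return value.strftime("%d-%m-%Y")


print(to_ddmmyyyy(date(1950, 3, 26)))  # 26-03-1950

# Re-parsing the already truncated string cannot restore the century:
# Python's %y maps 00-68 to 2000-2068, so "50" becomes 2050, not 1950.
reparsed = datetime.strptime("26-03-50", "%d-%m-%y")
print(reparsed.year)  # 2050
```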

Date field getting changed when reading from excel file to dataframe