Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Date field getting changed when reading from excel file to dataframe in pyspark

Pradeep_Namani
New Contributor III

The date field changes when I read data from the source .xls file into a DataFrame. In the source Excel file all columns are strings, but I am not sure why the date column alone behaves differently.

In the source file the date is 1/24/1947.

In the PySpark DataFrame it is 1/24/47.

Code used:

df=spark.read.format("com.crealytics.spark.excel").option("header","true").load("/mnt/dataplatform/Tenant_PK/Results.xlsx")

If I use option("inferSchema","true") the data comes through properly, but I don't want to use inferSchema. Can anyone suggest a solution?

Thanks in advance

5 REPLIES

yogu
Honored Contributor III

Hi @Pradeep Namani,

Could you please try running the code below? I hope it will work without inferSchema.

df=spark.read.format("csv").option("header","true").load("/mnt/dataplatform/Tenant_PK/Results.xlsx")

Pradeep_Namani
New Contributor III

Thank you @Yogita Chavan for replying, but when I read the file as CSV, all the data shows up in a different format. I am attaching a screenshot.

[Screenshot: ADB issue]

yogu
Honored Contributor III

Pradeep_Namani
New Contributor III

[Screenshot: ADB issue1]

I tried the option given in the URL above, but it did not help. I am still facing the same issue.

-werners-
Esteemed Contributor III

How about using inferSchema a single time to create a correct DataFrame, and then creating a schema from that DataFrame's schema?

Something like this, for example:

import json

from pyspark.sql.types import StructType

# Save the schema of the original DataFrame (read once with inferSchema) as JSON:
schema_json = df.schema.json()

# Restore the schema from JSON:
new_schema = StructType.fromJson(json.loads(schema_json))
