11-09-2021 10:32 PM
11-15-2021 01:03 AM
11-10-2021 01:23 AM
The trick is to make sure your columns are recognized as dates, otherwise they will be evaluated as strings.
"to_date(<datecol>, <date_format>) " does exactly this.
datecol is your column containing the date values, and date_format is the format in which your existing date is formatted. In your case "MM/dd/yyyy".
This will return values of spark dateType.
When they are recognized as dates, you can start calculating with them.
11-14-2021 11:11 PM
i tried all way but it is still showing null
11-14-2021 11:19 PM
I notice you do not have leading zeroes in your data format,
try M/d/yyyy instead.
https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html
Maybe you have leading spaces too, so trim() might also help.
For sure this is not a bug because the date functions in Spark are rock solid.
11-14-2021 11:54 PM
11-15-2021 01:03 AM
11-15-2021 03:19 AM
thank u thank u soooo much it works thank u😊
11-10-2021 03:17 AM
Exactly as @Werner Stinckens said. Additionally you can share your file and script so we can help better.
Your screenshot looks like excel. If it is excel format please check is all fields a data format (you can change also to number as every date is number of days from 31st December 1899). If it is csv format as werners said you need to specify format because for example 6/5/2021 can crush as it can be 5th June or 6th May.
11-12-2021 05:39 AM
@ahana ahana try this out:
%python
from pyspark.sql import functions as F
res = df.withColumn("start_date", F.to_date("start", "mm/dd/yyyy")).withColumn("end_date", F.to_date("end", "mm/dd/yyyy")).withColumn("date_diff", F.datediff("end_date","start_date"))
It works:
11-15-2021 12:17 AM
no its nt working i tried below function
datediff(concat(
split(start_date, '/') [2],
'-',case
when split(start_date, '/') [0] < 10 then concat('0', split(start_date, '/') [0])
else split(start_date, '/') [0]
end,
'-',
case
when split(start_date, '/') [1] < 10 then concat('0', split(start_date, '/') [1])
else split(start_date, '/') [1]
end), concat(split(end_date, '/') [2],
'-',case
when split(end_date, '/') [0] < 10 then concat('0', split(end_date, '/') [0])
else split(end, '/') [0]
end,
'-',
case
when split(end_date, '/') [1] < 10 then concat('0', split(end_date, '/') [1])
else split(end_date, '/') [1]
end
) )as diff
11-12-2021 03:49 PM
Hi @ahana ahana ,
Did any of the replies helped you solve this issue? would you be happy to mark their answer as best so that others can quickly find the solution?
Thank you
11-15-2021 12:19 AM
no i am not satisfied with the given answer
Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.
If there isn’t a group near you, start one and help create a community that brings people together.
Request a New Group