Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

I am trying to find the difference between two dates, but I am getting a null value in the new column. The dates below are in the same format; I tried changing the format, but it is still not working in Databricks.

ahana
New Contributor III

These are the dates in SQL: [screenshot of the date columns]

1 ACCEPTED SOLUTION


-werners-
Esteemed Contributor III

You have to use the exact format M/d/yyyy:

[screenshot of the working query]
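A sketch of the accepted fix in code (the screenshot itself is not preserved; the view name dates is an assumption for illustration, while start_date and end_date match the column names used elsewhere in this thread):

%python

# Parse each string column with the exact pattern M/d/yyyy, then take the
# difference in days; "dates" is a hypothetical table/view name.
res = spark.sql("""
    SELECT datediff(
             to_date(end_date, 'M/d/yyyy'),
             to_date(start_date, 'M/d/yyyy')
           ) AS diff
    FROM dates
""")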


11 REPLIES

-werners-
Esteemed Contributor III

The trick is to make sure your columns are recognized as dates; otherwise they will be evaluated as strings.

to_date(<datecol>, <date_format>) does exactly this.

datecol is your column containing the date values, and date_format is the format in which your existing date is formatted. In your case: "MM/dd/yyyy".

This will return values of Spark DateType.

Once they are recognized as dates, you can start calculating with them.
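For example, a minimal sketch (the sample value and column name are made up):

%python

from pyspark.sql import functions as F

# One made-up row with a date stored as a string in MM/dd/yyyy format.
df = spark.createDataFrame([("04/12/2021",)], ["date_str"])

# to_date turns the string column into a DateType column.
df = df.withColumn("parsed", F.to_date("date_str", "MM/dd/yyyy"))
df.printSchema()  # parsed: date
df.show()         # parsed = 2021-04-12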

ahana
New Contributor III

I tried every way, but it is still showing null.

-werners-
Esteemed Contributor III

I notice you do not have leading zeroes in your date format, so try M/d/yyyy instead.

https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html

Maybe you have leading spaces too, so trim() might also help.

This is certainly not a bug; the date functions in Spark are rock solid.
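For example, a small sketch of the difference (a made-up value with a leading space and no leading zeros; the null-vs-success behavior shown assumes Spark 3's default parser):

%python

from pyspark.sql import functions as F

# Made-up value: a leading space, and no leading zeros in month/day.
df = spark.createDataFrame([(" 4/1/2021",)], ["d"])

df.select(
    F.to_date(F.trim("d"), "MM/dd/yyyy").alias("strict"),   # null: MM/dd expects two digits
    F.to_date(F.trim("d"), "M/d/yyyy").alias("flexible"),   # 2021-04-01
).show()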

ahana
New Contributor III

The problem I am facing: [screenshots of the null results]

-werners-
Esteemed Contributor III

You have to use the exact format M/d/yyyy:

[screenshot of the working query]

ahana
New Contributor III

Thank you, thank you so much! It works. 😊

Hubert-Dudek
Esteemed Contributor III

Exactly as @Werner Stinckens​ said. Additionally, you can share your file and script so we can help better.

Your screenshot looks like Excel. If it is Excel format, please check that all fields have a date format (you can also change them to numbers, as every date is the number of days since 31st December 1899). If it is CSV format then, as werners said, you need to specify the format, because a value like 6/5/2021 is ambiguous: it can be 5th June or 6th May.
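For example, a quick sketch of that ambiguity:

%python

from pyspark.sql import functions as F

# The same string parses to two different dates depending on the declared pattern.
spark.range(1).select(
    F.to_date(F.lit("6/5/2021"), "M/d/yyyy").alias("june_5th"),  # 2021-06-05
    F.to_date(F.lit("6/5/2021"), "d/M/yyyy").alias("may_6th"),   # 2021-05-06
).show()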

Sandeep
Contributor III

@ahana ahana​  try this out:

%python

from pyspark.sql import functions as F

# Note: the month pattern must be uppercase M (lowercase mm means minutes),
# and single M/d handles values without leading zeros.
res = (
    df
    .withColumn("start_date", F.to_date("start", "M/d/yyyy"))
    .withColumn("end_date", F.to_date("end", "M/d/yyyy"))
    .withColumn("date_diff", F.datediff("end_date", "start_date"))
)

It works:

[screenshot of the result]

ahana
New Contributor III

No, it's not working. I tried the function below:

datediff(
  concat(
    split(start_date, '/')[2], '-',
    case
      when split(start_date, '/')[0] < 10 then concat('0', split(start_date, '/')[0])
      else split(start_date, '/')[0]
    end, '-',
    case
      when split(start_date, '/')[1] < 10 then concat('0', split(start_date, '/')[1])
      else split(start_date, '/')[1]
    end
  ),
  concat(
    split(end_date, '/')[2], '-',
    case
      when split(end_date, '/')[0] < 10 then concat('0', split(end_date, '/')[0])
      else split(end_date, '/')[0]
    end, '-',
    case
      when split(end_date, '/')[1] < 10 then concat('0', split(end_date, '/')[1])
      else split(end_date, '/')[1]
    end
  )
) as diff

jose_gonzalez
Databricks Employee

Hi @ahana ahana​ ,

Did any of the replies help you solve this issue? If so, would you be happy to mark that answer as best so that others can quickly find the solution?

Thank you

ahana
New Contributor III

No, I am not satisfied with the given answer.
