cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

Determine what is the interval in a timestamp column

dzlab
New Contributor

OK so I'm trying to determine if a timestamp column has a regular interval or not, i.e. the difference between each consecutive value is the same across the entire column.

I tried something like this

val timeColumn: String =
 
val groupByColumn: String =
 
val window = Window.orderBy(col(timeColumn).asc).partitionBy(groupByColumn)
val intervalsDF = inputDF.select(col(timeColumn) - lag(col(timeColumn), 1).over(window))
val intervals = intervalsDF.distinct().collect().filterNot(_.isNullAt(0))
 
val isRegular = intervals.length == 1

This works only if the timestamps are in the hours, minutes, or seconds. But won't work for instance, if I have monthly timestamps like this

|time|c1|c2|c3| ...|

|-|-|-|-|-|

|2020-01-01 00:00:00|||| ...|

|2020-02-01 00:00:00|||| ...|

|2020-03-01 00:00:00|||| ...|

|2020-04-01 00:00:00|||| ...|

The logic above won't work because the number of days in January is different from the one in February, so it will think the timestamps are not of the same interval.

What other approach should I try to check if the timestamps follow a regular interval or not?

0 REPLIES 0
Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.