cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Determine what is the interval in a timestamp column

dzlab
New Contributor

OK so I'm trying to determine if a timestamp column has a regular interval or not, i.e. the difference between each consecutive value is the same across the entire column.

I tried something like this

val timeColumn: String =
 
val groupByColumn: String =
 
val window = Window.orderBy(col(timeColumn).asc).partitionBy(groupByColumn)
val intervalsDF = inputDF.select(col(timeColumn) - lag(col(timeColumn), 1).over(window))
val intervals = intervalsDF.distinct().collect().filterNot(_.isNullAt(0))
 
val isRegular = intervals.length == 1

This works only if the timestamps are in the hours, minutes, or seconds. But won't work for instance, if I have monthly timestamps like this

|time|c1|c2|c3| ...|

|-|-|-|-|-|

|2020-01-01 00:00:00|||| ...|

|2020-02-01 00:00:00|||| ...|

|2020-03-01 00:00:00|||| ...|

|2020-04-01 00:00:00|||| ...|

The logic above won't work because the number of days in January is different from the one in February, so it will think the timestamps are not of the same interval.

What other approach should I try to check if the timestamps follow a regular interval or not?

0 REPLIES 0