Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Is there a way in Azure to compare data in one field?

CBull
New Contributor III

Is there a way to compare timestamps within one field/column for an individual ID? For example, if I have two records for an ID and the timestamps are within 5 minutes of each other, I just want to keep the latest. But if they were an hour apart, I would keep both records.

3 REPLIES

merca
Valued Contributor II

A windowing function may be what you need.

from pyspark.sql import functions as F
df.groupBy(F.window("event_time","5 minutes"))
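
One caveat with the `F.window` call above: it assigns rows to fixed, clock-aligned 5-minute buckets, so two records only 2 minutes apart (say 10:04 and 10:06) can land in different buckets. If the goal is "drop a record when a later one for the same ID follows within 5 minutes", comparing each row with its next neighbor may fit better. Below is a minimal plain-Python sketch of that rule, so the logic is clear independent of Spark. The function name `keep_latest_within`, the `(id, timestamp)` tuple layout, and the sample rows are all hypothetical, and treating a gap of exactly 5 minutes as "within" is an assumption.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def keep_latest_within(records, gap=timedelta(minutes=5)):
    """Keep a record unless a later record for the same ID follows within `gap`.

    `records` is an iterable of (record_id, timestamp) tuples (hypothetical layout).
    Returns the kept tuples, sorted by ID and then by timestamp.
    """
    # Group timestamps by ID.
    by_id = defaultdict(list)
    for rec_id, ts in records:
        by_id[rec_id].append(ts)

    kept = []
    for rec_id in sorted(by_id):
        stamps = sorted(by_id[rec_id])
        for i, ts in enumerate(stamps):
            nxt = stamps[i + 1] if i + 1 < len(stamps) else None
            # Drop a row only when the next row for the same ID is within `gap`.
            if nxt is None or nxt - ts > gap:
                kept.append((rec_id, ts))
    return kept


# Made-up sample data for illustration only.
rows = [
    ("A", datetime(2022, 1, 1, 10, 4)),
    ("A", datetime(2022, 1, 1, 10, 6)),  # 2 minutes after the previous row
    ("B", datetime(2022, 1, 1, 9, 0)),
    ("B", datetime(2022, 1, 1, 10, 0)),  # 1 hour after the previous row
]
result = keep_latest_within(rows)
```

Here ID "A" keeps only the 10:06 record, while ID "B" keeps both. Note that under this rule a chain of records each within 5 minutes of the next collapses to the last one. In PySpark the same neighbor comparison is usually expressed with `F.lead` over a `Window.partitionBy(...).orderBy(...)` spec, followed by a filter on the difference between the two timestamps.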

CBull
New Contributor III

So, is this done something like this?

SELECT
    r.patientmedicalrecordnumber,
    r.callreceiveddatetime as date
FROM table r
    LEFT OUTER JOIN table p
        ON r.pageid = p.pageid
WHERE p.pagetype = 6
    and cast(r.callreceiveddatetime as date) = current_date() - 1
df.groupBy (r.window("event_time","5 minutes"))
ORDER BY r.callreceiveddatetime

merca
Valued Contributor II

Since you are trying to do this in SQL, I hope someone else can write you the correct answer. The above example is for PySpark. You can check the SQL syntax in the Databricks documentation.
