Azure Databricks: Failed to extract data which is between two timestamps within those same dates using Pyspark

Dicer
Valued Contributor

Data type:

AAPL_Time: timestamp

AAPL_Close: float

Raw Data:

AAPL_Time                                          AAPL_Close
2015-05-11T08:00:00.000+0000   29.0344
2015-05-11T08:30:00.000+0000   29.0187
2015-05-11T09:00:00.000+0000   29.0346
2015-05-11T09:30:00.000+0000   28.763
2015-05-11T10:00:00.000+0000   28.6768
2015-05-11T10:30:00.000+0000   28.7464
2015-05-12T12:30:00.000+0000   28.7915
2015-05-12T13:00:00.000+0000   28.8763
2015-05-12T13:30:00.000+0000   28.8316
2015-05-12T14:00:00.000+0000   28.8418
2015-05-12T14:30:00.000+0000   28.7703

I am trying to extract the data with:

spark.sql("SELECT AAPL_Time, AAPL_Close FROM aapl_table where AAPL_Time between '%09:30:00%' and '%16:30:00%'")

I expect the result to be this:

AAPL_Time                                          AAPL_Close
2015-05-11T09:30:00.000+0000   28.763
2015-05-11T10:00:00.000+0000   28.6768
2015-05-11T10:30:00.000+0000   28.7464
2015-05-12T12:30:00.000+0000   28.7915
2015-05-12T13:00:00.000+0000   28.8763
2015-05-12T13:30:00.000+0000   28.8316
2015-05-12T14:00:00.000+0000   28.8418
2015-05-12T14:30:00.000+0000   28.7703

Yet the actual result is this:

Query returned no results

Can anyone help me?

Same question in https://stackoverflow.com/questions/73326977/azure-databricks-failed-to-extract-data-which-is-betwee...

1 ACCEPTED SOLUTION


Pholo
Contributor

Hi, maybe you can use the date_format function to retrieve the hour and minutes as a string:

import pyspark.sql.functions as F

aapl_table.filter(
    F.date_format(F.col('AAPL_Time'), 'HHmm').between('0930', '1630')
)

This is the PySpark version; if you want to do it in Spark SQL, you can try:

display(spark.sql("SELECT Aapl_Time, AApl_Close FROM aapl_table where DATE_FORMAT(Aapl_Time,'HHmm') between '0930' and '1630'"))

Let me know if it helped.
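For intuition, here is a minimal plain-Python sketch (not Spark; the sample values and the `in_session` helper are made up) of why this works: formatting the time of day as a zero-padded 'HHmm' string makes lexicographic comparison match chronological order, which is exactly what `DATE_FORMAT(..., 'HHmm') BETWEEN '0930' AND '1630'` relies on.

```python
from datetime import datetime

# Hypothetical timestamps shaped like the AAPL_Time column in the question.
timestamps = [
    "2015-05-11T08:00:00",  # before 09:30 -> excluded
    "2015-05-11T09:30:00",  # boundary -> included
    "2015-05-12T14:30:00",  # inside the window -> included
    "2015-05-12T16:45:00",  # after 16:30 -> excluded
]

def in_session(ts: str, start: str = "0930", end: str = "1630") -> bool:
    """Render the time of day as a zero-padded HHmm string and compare
    lexicographically -- the same idea as DATE_FORMAT(ts, 'HHmm') BETWEEN."""
    hhmm = datetime.fromisoformat(ts).strftime("%H%M")
    return start <= hhmm <= end

kept = [ts for ts in timestamps if in_session(ts)]
```

The zero padding is what makes string order agree with time order; without it, "930" would sort after "1630".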


4 REPLIES


Dicer
Valued Contributor

It works. Thank you!

I'm glad that it helped 😃

Anonymous
Not applicable

Another thing to try is the hour() and minute() functions, which return integers.
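As a sketch of that integer-based approach (plain Python rather than Spark; the `in_session` helper is made up for illustration), converting hour and minute to minutes-since-midnight gives a single integer comparison for the 09:30-16:30 window:

```python
from datetime import datetime

def in_session(ts: str) -> bool:
    """Keep timestamps whose time of day lies in [09:30, 16:30], using integer
    arithmetic analogous to Spark's hour() and minute() functions."""
    t = datetime.fromisoformat(ts)
    minutes_since_midnight = t.hour * 60 + t.minute
    return 9 * 60 + 30 <= minutes_since_midnight <= 16 * 60 + 30
```

In Spark SQL this would translate to a predicate on `hour(AAPL_Time) * 60 + minute(AAPL_Time)`, avoiding string formatting entirely.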
