cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Azure Databricks: Failed to extract data which is between two timestamps within those same dates using Pyspark

Dicer
Valued Contributor

Data type:

AAPL_Time: timestamp

AAPL_Close: float

Raw Data:

AAPL_Time                                          AAPL_Close
2015-05-11T08:00:00.000+0000   29.0344
2015-05-11T08:30:00.000+0000   29.0187
2015-05-11T09:00:00.000+0000   29.0346
2015-05-11T09:30:00.000+0000   28.763
2015-05-11T10:00:00.000+0000   28.6768
2015-05-11T10:30:00.000+0000   28.7464
2015-05-12T12:30:00.000+0000   28.7915
2015-05-12T13:00:00.000+0000   28.8763
2015-05-12T13:30:00.000+0000   28.8316
2015-05-12T14:00:00.000+0000   28.8418
2015-05-12T14:30:00.000+0000   28.7703

I am trying to extract data by 

spark.sql("SELECT AAPL_Time, AAPL_Close FROM aapl_table where AAPL_Time between '%09:30:00%' and '%16:30:00%'")

I expect the result should be this:

AAPL_Time                                          AAPL_Close
 
2015-05-11T09:30:00.000+0000   28.763
 
2015-05-11T10:00:00.000+0000   28.6768
 
2015-05-11T10:30:00.000+0000   28.7464
 
2015-05-12T12:30:00.000+0000   28.7915
 
2015-05-12T13:00:00.000+0000   28.8763
 
2015-05-12T13:30:00.000+0000   28.8316
 
2015-05-12T14:00:00.000+0000   28.8418
 
2015-05-12T14:30:00.000+0000   28.7703

Yet, the final result is this:

Query returned no results

Can anyone help me?

Same question in https://stackoverflow.com/questions/73326977/azure-databricks-failed-to-extract-data-which-is-betwee...

1 ACCEPTED SOLUTION

Accepted Solutions

Pholo
Contributor

Hi, maybe you can use the date_format function and retrieve hour and minutes as string

import  pyspark.sql.functions as F
aapl_table.filter(
    F.date_format(F.col('AAPL_Time '),'HHmm').between("0930","1630")
  )

This is the version in pyspark, if you want to do in spark sql you can try

display(spark.sql("SELECT Aapl_Time, AApl_Close FROM aapl_table where DATE_FORMAT(Aapl_Time,'HHmm') between '0930' and '1630'"))

Let me know if it helped.

View solution in original post

4 REPLIES 4

Pholo
Contributor

Hi, maybe you can use the date_format function and retrieve hour and minutes as string

import  pyspark.sql.functions as F
aapl_table.filter(
    F.date_format(F.col('AAPL_Time '),'HHmm').between("0930","1630")
  )

This is the version in pyspark, if you want to do in spark sql you can try

display(spark.sql("SELECT Aapl_Time, AApl_Close FROM aapl_table where DATE_FORMAT(Aapl_Time,'HHmm') between '0930' and '1630'"))

Let me know if it helped.

Dicer
Valued Contributor

It works. Thank you!

I'm glad that It helped 😃

Anonymous
Not applicable

Another thing to try is the hour() and minute() functions will return integers.

Join 100K+ Data Experts: Register Now & Grow with Us!

Excited to expand your horizons with us? Click here to Register and begin your journey to success!

Already a member? Login and join your local regional user group! If there isn’t one near you, fill out this form and we’ll create one for you to join!