Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Azure Databricks: Failed to extract data between two timestamps on the same dates using PySpark

Dicer
Valued Contributor

Data type:

AAPL_Time: timestamp

AAPL_Close: float

Raw Data:

AAPL_Time                      AAPL_Close
2015-05-11T08:00:00.000+0000   29.0344
2015-05-11T08:30:00.000+0000   29.0187
2015-05-11T09:00:00.000+0000   29.0346
2015-05-11T09:30:00.000+0000   28.763
2015-05-11T10:00:00.000+0000   28.6768
2015-05-11T10:30:00.000+0000   28.7464
2015-05-12T12:30:00.000+0000   28.7915
2015-05-12T13:00:00.000+0000   28.8763
2015-05-12T13:30:00.000+0000   28.8316
2015-05-12T14:00:00.000+0000   28.8418
2015-05-12T14:30:00.000+0000   28.7703

I am trying to extract the data with:

spark.sql("SELECT AAPL_Time, AAPL_Close FROM aapl_table where AAPL_Time between '%09:30:00%' and '%16:30:00%'")

I expect the result to be this:

AAPL_Time                      AAPL_Close
2015-05-11T09:30:00.000+0000   28.763
2015-05-11T10:00:00.000+0000   28.6768
2015-05-11T10:30:00.000+0000   28.7464
2015-05-12T12:30:00.000+0000   28.7915
2015-05-12T13:00:00.000+0000   28.8763
2015-05-12T13:30:00.000+0000   28.8316
2015-05-12T14:00:00.000+0000   28.8418
2015-05-12T14:30:00.000+0000   28.7703

Yet the actual result is:

Query returned no results

Can anyone help me?

Same question in https://stackoverflow.com/questions/73326977/azure-databricks-failed-to-extract-data-which-is-betwee...

1 ACCEPTED SOLUTION

Accepted Solutions

Pholo
Contributor

Hi, maybe you can use the date_format function to retrieve the hour and minutes as a string:

import pyspark.sql.functions as F

# Keep only rows whose time of day falls between 09:30 and 16:30 inclusive.
# Note: the column name must not contain a trailing space.
aapl_table.filter(
    F.date_format(F.col('AAPL_Time'), 'HHmm').between('0930', '1630')
)

This is the PySpark version; if you want to do it in Spark SQL, you can try

display(spark.sql("SELECT AAPL_Time, AAPL_Close FROM aapl_table WHERE DATE_FORMAT(AAPL_Time, 'HHmm') BETWEEN '0930' AND '1630'"))

Let me know if it helped.


4 REPLIES 4


Dicer
Valued Contributor

It works. Thank you!

I'm glad it helped 😃

Anonymous
Not applicable

Another thing to try: the hour() and minute() functions, which return integers.
