I'm using the Redshift data source to load data into spark SQL data frames. However, I'm not seeing predicate push down for my queries ran on Redshift - is that expected?

sajith_appukutt
Databricks Employee
Databricks Employee

I was expecting filter operations to be pushed down to Redshift by the optimizer. However, the entire dataset is getting loaded from Redshift.

sajith_appukutt
Databricks Employee
Databricks Employee

The Spark driver for Redshift pushes the following operators down into Redshift:

  • Filter
  • Project
  • Sort
  • Limit
  • Aggregation
  • Join

However, it does not support expressions operating on dates and timestamps today. If you have a similar requirement, please add a feature request via https://docs.databricks.com/resources/ideas.html

View solution in original post