<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic filter push down into redis when querying using spark connector in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/filter-push-down-into-redis-when-querying-using-spark-connector/m-p/17520#M11537</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I'm loading a DataFrame from Redis using this code:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;df = (spark.read.format("org.apache.spark.sql.redis")
        .option("table", "state_store_ready_to_sell")
        .option("key.column", "msid")
        .option("infer.schema", "true")
        .load())&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;and then I'm running a filter, for example:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;ready_to_sell = df.filter("msid in ('12321','12432')")&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;I looked at the Spark plan, and Spark does not push the msid filter down to Redis.&lt;/P&gt;&lt;P&gt;This means that all Redis records are loaded and then filtered in Spark memory (according to the SQL tab in the Spark UI).&lt;/P&gt;&lt;P&gt;msid is the key.column in Redis, of course.&lt;/P&gt;&lt;P&gt;How do I make Spark push down the filter so it fetches only the relevant records?&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;&lt;P&gt;Almog&lt;/P&gt;</description>
    <pubDate>Tue, 20 Jul 2021 09:41:50 GMT</pubDate>
    <dc:creator>almogg</dc:creator>
    <dc:date>2021-07-20T09:41:50Z</dc:date>
    <item>
      <title>filter push down into redis when querying using spark connector</title>
      <link>https://community.databricks.com/t5/data-engineering/filter-push-down-into-redis-when-querying-using-spark-connector/m-p/17520#M11537</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I'm loading a DataFrame from Redis using this code:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;df = (spark.read.format("org.apache.spark.sql.redis")
        .option("table", "state_store_ready_to_sell")
        .option("key.column", "msid")
        .option("infer.schema", "true")
        .load())&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;and then I'm running a filter, for example:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;ready_to_sell = df.filter("msid in ('12321','12432')")&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;I looked at the Spark plan, and Spark does not push the msid filter down to Redis.&lt;/P&gt;&lt;P&gt;This means that all Redis records are loaded and then filtered in Spark memory (according to the SQL tab in the Spark UI).&lt;/P&gt;&lt;P&gt;msid is the key.column in Redis, of course.&lt;/P&gt;&lt;P&gt;How do I make Spark push down the filter so it fetches only the relevant records?&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;&lt;P&gt;Almog&lt;/P&gt;</description>
      <pubDate>Tue, 20 Jul 2021 09:41:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/filter-push-down-into-redis-when-querying-using-spark-connector/m-p/17520#M11537</guid>
      <dc:creator>almogg</dc:creator>
      <dc:date>2021-07-20T09:41:50Z</dc:date>
    </item>
  </channel>
</rss>

