<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Delta sharing json predicate doesn't work in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/delta-sharing-json-predicate-doesn-t-work/m-p/127125#M47864</link>
    <description>&lt;P&gt;I'm trying to push predicates down via the Python &lt;STRONG&gt;delta_sharing package:&lt;/STRONG&gt;&amp;nbsp;&lt;A href="https://github.com/delta-io/delta-sharing/tree/main/python/delta_sharing" target="_blank"&gt;https://github.com/delta-io/delta-sharing/tree/main/python/delta_sharing&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;and the Delta Sharing protocol:&amp;nbsp;&lt;/STRONG&gt;&lt;A href="https://github.com/delta-io/delta-sharing/blob/main/PROTOCOL.md#json-predicates-for-filtering" target="_blank"&gt;https://github.com/delta-io/delta-sharing/blob/main/PROTOCOL.md#json-predicates-for-filtering&lt;/A&gt;&lt;BR /&gt;so that I don't fetch all rows from the table. As far as I understand, my Delta table consists of 8 Parquet files, all of which are listed in the JSON Delta log with the needed statistics.&lt;BR /&gt;Of those 8 files, only 1 contains the data that matches the predicate I push down:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;Min Values: {'PropertyID': '1072897', 'RespondentID': 60481680, 'RespondentName': '', 'RespondentEmail': '', 'DateResponded': '2021-02-02T10:30:21.000Z', 'Type': 'Design', 'QuestionID': 839, 'QuestionType': 'Categorical', 'ResponseValue': '0', 'VerbatimText': ''}
▪️ Max Values: {'PropertyID': '1203209', 'RespondentID': 68543688, 'RespondentName': 'QWERT', 'RespondentEmail': 'qqq@qqq.com', 'DateResponded': '2021-11-03T12:17:57.000Z', 'Type': 'Work Order', 'QuestionID': 12383, 'QuestionType': 'Text', 'ResponseValue': '95',  'VerbatimText': 'wouldnt really respond to us '}&lt;/LI-CODE&gt;&lt;P&gt;Here is my code with json predicate:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;import delta_sharing
import json

date_responded = json.dumps([{
        "op": "equal",
        "children": [
            {"op": "column", "name": "DateResponded", "valueType": "timestamp"},
            {"op": "literal", "value": "2021-08-05T19:20:02.000Z", "valueType": "timestamp"}
        ]
    }])

config_file = r"D:\config.share"  # raw string keeps the backslash literal
client = delta_sharing.SharingClient(config_file)

shares = client.list_shares()
share = shares[0]
schemas = client.list_schemas(share)
schema = schemas[0]
tables = client.list_tables(schema)
table = tables[0]
table_url = f"{config_file}#{share.name}.{schema.name}.{table.name}"

df = delta_sharing.load_as_pandas(table_url, use_delta_format=True, jsonPredicateHints=date_responded)

print(f"Table count: {len(df.index)}")&lt;/LI-CODE&gt;&lt;P&gt;I also tried passing the predicate as in the example at:&amp;nbsp;&lt;A href="https://github.com/delta-io/delta-sharing" target="_blank"&gt;https://github.com/delta-io/delta-sharing&lt;/A&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;    date_responded = '''{
      "op": "equal",
      "children": [
        {"op": "column", "name":"DateResponded", "valueType":"timestamp"},
        {"op":"literal","value":"2021-08-05T19:20:02.000Z","valueType":"timestamp"}
      ]
    }'''&lt;/LI-CODE&gt;&lt;P&gt;But the result is always the same. The total count I printed is:&lt;BR /&gt;Table count: 275744 rows, which is the row count of the whole table (all 8 Parquet files).&lt;BR /&gt;It should be 11 rows; or at least, if the Delta Sharing server returns only 1 file, it should be the 33354 rows of the file that contains my data:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;File: part-00001-3762ca0c-0967-438b-9af2-58acdc1d35ca-c000.snappy.parquet
   ▪️ Num Records: 33354&lt;/LI-CODE&gt;&lt;P&gt;I also ran it in debug mode, but I don't see a request body with my JSON predicate, just the POST request to the Delta Sharing server URL:&lt;BR /&gt;&lt;STRONG&gt;DEBUG:urllib3.connectionpool:&lt;/STRONG&gt;&lt;EM&gt;&lt;A href="https://ireland.cloud.databricks.com:443" target="_blank"&gt;https://ireland.cloud.databricks.com:443&lt;/A&gt; "POST /api/2.0/delta-sharing/metastores/f7391b87-5c08-478b-97ec-5328e578iu89/shares/my_test_share/schemas/deltashares/tables/my_test_table/query HTTP/1.1" 200 None&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;What am I doing wrong, and how do predicates work internally when they are passed?&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;Is there any other way to avoid loading the whole table's data into a pandas DataFrame?&lt;/STRONG&gt;&lt;/P&gt;</description>
    <pubDate>Thu, 31 Jul 2025 18:34:03 GMT</pubDate>
    <dc:creator>drag7ter</dc:creator>
    <dc:date>2025-07-31T18:34:03Z</dc:date>
    <item>
      <title>Delta sharing json predicate doesn't work</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-sharing-json-predicate-doesn-t-work/m-p/127125#M47864</link>
      <description>&lt;P&gt;I'm trying to push predicates down via the Python &lt;STRONG&gt;delta_sharing package:&lt;/STRONG&gt;&amp;nbsp;&lt;A href="https://github.com/delta-io/delta-sharing/tree/main/python/delta_sharing" target="_blank"&gt;https://github.com/delta-io/delta-sharing/tree/main/python/delta_sharing&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;and the Delta Sharing protocol:&amp;nbsp;&lt;/STRONG&gt;&lt;A href="https://github.com/delta-io/delta-sharing/blob/main/PROTOCOL.md#json-predicates-for-filtering" target="_blank"&gt;https://github.com/delta-io/delta-sharing/blob/main/PROTOCOL.md#json-predicates-for-filtering&lt;/A&gt;&lt;BR /&gt;so that I don't fetch all rows from the table. As far as I understand, my Delta table consists of 8 Parquet files, all of which are listed in the JSON Delta log with the needed statistics.&lt;BR /&gt;Of those 8 files, only 1 contains the data that matches the predicate I push down:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;Min Values: {'PropertyID': '1072897', 'RespondentID': 60481680, 'RespondentName': '', 'RespondentEmail': '', 'DateResponded': '2021-02-02T10:30:21.000Z', 'Type': 'Design', 'QuestionID': 839, 'QuestionType': 'Categorical', 'ResponseValue': '0', 'VerbatimText': ''}
▪️ Max Values: {'PropertyID': '1203209', 'RespondentID': 68543688, 'RespondentName': 'QWERT', 'RespondentEmail': 'qqq@qqq.com', 'DateResponded': '2021-11-03T12:17:57.000Z', 'Type': 'Work Order', 'QuestionID': 12383, 'QuestionType': 'Text', 'ResponseValue': '95',  'VerbatimText': 'wouldnt really respond to us '}&lt;/LI-CODE&gt;&lt;P&gt;Here is my code with json predicate:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;import delta_sharing
import json

date_responded = json.dumps([{
        "op": "equal",
        "children": [
            {"op": "column", "name": "DateResponded", "valueType": "timestamp"},
            {"op": "literal", "value": "2021-08-05T19:20:02.000Z", "valueType": "timestamp"}
        ]
    }])

config_file = r"D:\config.share"  # raw string keeps the backslash literal
client = delta_sharing.SharingClient(config_file)

shares = client.list_shares()
share = shares[0]
schemas = client.list_schemas(share)
schema = schemas[0]
tables = client.list_tables(schema)
table = tables[0]
table_url = f"{config_file}#{share.name}.{schema.name}.{table.name}"

df = delta_sharing.load_as_pandas(table_url, use_delta_format=True, jsonPredicateHints=date_responded)

print(f"Table count: {len(df.index)}")&lt;/LI-CODE&gt;&lt;P&gt;I also tried passing the predicate as in the example at:&amp;nbsp;&lt;A href="https://github.com/delta-io/delta-sharing" target="_blank"&gt;https://github.com/delta-io/delta-sharing&lt;/A&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;    date_responded = '''{
      "op": "equal",
      "children": [
        {"op": "column", "name":"DateResponded", "valueType":"timestamp"},
        {"op":"literal","value":"2021-08-05T19:20:02.000Z","valueType":"timestamp"}
      ]
    }'''&lt;/LI-CODE&gt;&lt;P&gt;But the result is always the same. The total count I printed is:&lt;BR /&gt;Table count: 275744 rows, which is the row count of the whole table (all 8 Parquet files).&lt;BR /&gt;It should be 11 rows; or at least, if the Delta Sharing server returns only 1 file, it should be the 33354 rows of the file that contains my data:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;File: part-00001-3762ca0c-0967-438b-9af2-58acdc1d35ca-c000.snappy.parquet
   ▪️ Num Records: 33354&lt;/LI-CODE&gt;&lt;P&gt;I also ran it in debug mode, but I don't see a request body with my JSON predicate, just the POST request to the Delta Sharing server URL:&lt;BR /&gt;&lt;STRONG&gt;DEBUG:urllib3.connectionpool:&lt;/STRONG&gt;&lt;EM&gt;&lt;A href="https://ireland.cloud.databricks.com:443" target="_blank"&gt;https://ireland.cloud.databricks.com:443&lt;/A&gt; "POST /api/2.0/delta-sharing/metastores/f7391b87-5c08-478b-97ec-5328e578iu89/shares/my_test_share/schemas/deltashares/tables/my_test_table/query HTTP/1.1" 200 None&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;What am I doing wrong, and how do predicates work internally when they are passed?&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;Is there any other way to avoid loading the whole table's data into a pandas DataFrame?&lt;/STRONG&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 31 Jul 2025 18:34:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-sharing-json-predicate-doesn-t-work/m-p/127125#M47864</guid>
      <dc:creator>drag7ter</dc:creator>
      <dc:date>2025-07-31T18:34:03Z</dc:date>
    </item>
    <item>
      <title>Re: Delta sharing json predicate doesn't work</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-sharing-json-predicate-doesn-t-work/m-p/127129#M47866</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/103503"&gt;@drag7ter&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;PredicateHints are just hints, not enforced filters. They&amp;nbsp;do not guarantee that the returned data will be filtered.&lt;/P&gt;&lt;P&gt;See the thread below for a detailed discussion. Also check whether your Delta Sharing server has the evaluatePredicateHints flag set to true.&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.databricks.com/t5/data-governance/filtering-partitioned-data-in-databricks-delta-share/td-p/126466" target="_blank"&gt;https://community.databricks.com/t5/data-governance/filtering-partitioned-data-in-databricks-delta-share/td-p/126466&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 31 Jul 2025 19:28:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-sharing-json-predicate-doesn-t-work/m-p/127129#M47866</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2025-07-31T19:28:06Z</dc:date>
    </item>
    <item>
      <title>Re: Delta sharing json predicate doesn't work</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-sharing-json-predicate-doesn-t-work/m-p/127138#M47868</link>
      <description>&lt;P&gt;Thanks for the explanation, but I still don't understand why we need&amp;nbsp;&lt;STRONG&gt;jsonPredicateHints&lt;/STRONG&gt; if there is no guarantee that the Delta Sharing server returns fewer files; it seems like a random decision made by the server. In my case I want to reduce the number of files transferred over the network. But this doesn't work on a huge table, and loading it into a pandas DataFrame fails with an OOM error.&lt;/P&gt;&lt;P&gt;Also, I'm using the built-in Delta Sharing server in my Databricks account, not one I deployed myself. As far as I understand, its configuration is not visible to me, so am I unable to set the&amp;nbsp;&lt;SPAN&gt;&lt;STRONG&gt;evaluatePredicateHints&lt;/STRONG&gt; option to &lt;STRONG&gt;true&lt;/STRONG&gt; in the &lt;STRONG&gt;Databricks UI&lt;/STRONG&gt;?&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 31 Jul 2025 21:09:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-sharing-json-predicate-doesn-t-work/m-p/127138#M47868</guid>
      <dc:creator>drag7ter</dc:creator>
      <dc:date>2025-07-31T21:09:23Z</dc:date>
    </item>
  </channel>
</rss>

