Data Governance
Join discussions on data governance practices, compliance, and security within the Databricks Community. Exchange strategies and insights to ensure data integrity and regulatory compliance.

Filtering Partitioned Data in Databricks Delta Share

Col1ns
New Contributor

We have a Delta Share that includes a partitioned table, and we want our recipient to be able to retrieve data only from a specific partition.

I reviewed the Delta Sharing server documentation and found that it's possible to use SQL expressions for filtering. I tested this via the API using the following request body, but the response still returns all data, not just the data from the specified partition.

My question is: Are SQL expressions for filtering enabled in Databricks Delta Sharing? If so, why does my request return all the data instead of applying the filter?

Are there any other supported ways to filter data retrieved from a Delta Share?

{
  "predicateHints": [
    "PartitionColumn = 'partitionName'"
  ],
  "limitHint": 100
}

2 REPLIES

lingareddy_Alva
Honored Contributor III

Hi @Col1ns 

Predicate hints are not filters - they are optimization hints only. The predicateHints field you're using tells the Delta Sharing server about likely filter conditions so it can optimize data transfer, but it doesn't actually filter the data.
This is why you're still receiving all the data despite specifying the predicate hint.

1. predicateHints are just hints, not enforced filters:
- These do not guarantee that the returned data will be filtered.
- They're passed along to the data provider to optimize data transmission, but it's ultimately up to the provider's server (in this case, Databricks) whether or how strictly to apply them.
- If the Delta table is not partitioned on the column you’re filtering, the hint may be ignored.
2. Filtering is applied at file level, not row level:
- Delta Sharing shares parquet files, and predicateHints only help the server select relevant files to send.
- If a file contains both relevant and irrelevant rows, the entire file is still shared. The recipient must apply filtering on their end.

3. Databricks Delta Sharing server doesn’t enforce strict predicate filtering:
- This is by design for performance and security reasons.
- Filtering is a best-effort optimization, not an access control feature.
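The file-level versus row-level distinction above can be illustrated with a small in-memory sketch. This is not the Databricks server's logic: the `rows` key and both helper functions are hypothetical stand-ins (real shared files are Parquet files referenced by URL), and only the `partitionValues` field name mirrors the file metadata described in the protocol. It assumes the server prunes only files whose partition value is known and does not match, and that the recipient then filters the rows itself.

```python
from typing import Any, Dict, List

# Hypothetical in-memory stand-in for a shared file: real files are Parquet
# files; "rows" exists here only so the example is self-contained.
File = Dict[str, Any]


def prune_files(files: List[File], column: str, value: str) -> List[File]:
    """Best-effort, file-level pruning (what a hint can achieve at most).

    A file is dropped only when its partition value is known and differs.
    Files without a partition value for the column are kept whole.
    """
    kept = []
    for f in files:
        part = f.get("partitionValues", {}).get(column)
        if part is None or part == value:
            kept.append(f)
    return kept


def filter_rows(files: List[File], column: str, value: str) -> List[Dict[str, Any]]:
    """Row-level filtering that the recipient applies after download."""
    return [row for f in files for row in f["rows"] if row.get(column) == value]


files = [
    {"partitionValues": {"PartitionColumn": "partitionName"},
     "rows": [{"id": 1, "PartitionColumn": "partitionName"}]},
    {"partitionValues": {"PartitionColumn": "other"},
     "rows": [{"id": 2, "PartitionColumn": "other"}]},
    # No partition value recorded: the file mixes relevant and irrelevant rows.
    {"partitionValues": {},
     "rows": [{"id": 3, "PartitionColumn": "partitionName"},
              {"id": 4, "PartitionColumn": "other"}]},
]

sent = prune_files(files, "PartitionColumn", "partitionName")
rows = filter_rows(sent, "PartitionColumn", "partitionName")
```

With the Python connector, the equivalent recipient-side step would be an ordinary DataFrame filter applied after loading the table.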

 

 

LR

szymon_dybczak
Esteemed Contributor III

Hi @Col1ns ,

Everything @lingareddy_Alva mentioned is correct. A few additional things to consider. First, check whether your Delta Sharing server implementation supports predicate hints.
If it does, your predicate must use the restricted SQL expression syntax for filtering. The following expressions are permitted:

Expression     Example

=              col = 123
>              col > 'foo'
<              'foo' < col
>=             col >= 123
<=             123 <= col
<>             col <> 'foo'
IS NULL        col IS NULL
IS NOT NULL    col IS NOT NULL
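As a rough sketch of how such a request could be assembled from Python, the snippet below builds (but does not send) a table query using only the standard library. The endpoint URL, share, schema, table, and token values are placeholders, and `build_query_request` is a hypothetical helper; the `/shares/{share}/schemas/{schema}/tables/{table}/query` path and the `predicateHints` / `limitHint` fields follow the open Delta Sharing protocol description.

```python
import json
import urllib.request
from typing import List, Optional


def build_query_request(endpoint: str, share: str, schema: str, table: str,
                        token: str, predicate_hints: List[str],
                        limit_hint: Optional[int] = None) -> urllib.request.Request:
    """Build a POST request for the table query endpoint without sending it."""
    url = f"{endpoint.rstrip('/')}/shares/{share}/schemas/{schema}/tables/{table}/query"
    body = {"predicateHints": list(predicate_hints)}
    if limit_hint is not None:
        body["limitHint"] = limit_hint
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


# All values below are placeholders, not real endpoints or credentials.
request = build_query_request(
    endpoint="https://sharing.example.com/delta-sharing",
    share="my_share",
    schema="my_schema",
    table="my_table",
    token="<bearer-token>",
    predicate_hints=["PartitionColumn = 'partitionName'"],
    limit_hint=100,
)
# urllib.request.urlopen(request) would send it. The response may still
# contain files outside the requested partition, since hints are best effort.
```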

Also, predicateHints will be deprecated once all client and server implementations move to jsonPredicateHints, so going forward you should use jsonPredicateHints. You can read more about it in the protocol description:

delta-sharing/PROTOCOL.md at main · delta-io/delta-sharing · GitHub
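For illustration, here is a minimal sketch of the original request rewritten with jsonPredicateHints. It assumes the tree-shaped format from the protocol description (`op`, `children`, `name`, `value`, `valueType`), that the predicate is passed as a JSON-encoded string inside the request body, and that the partition column is a string column. The `equal_predicate` helper is hypothetical, so check the field names against PROTOCOL.md before relying on them.

```python
import json


def equal_predicate(column: str, value: str, value_type: str = "string") -> dict:
    """Build a `column = literal` predicate tree (hypothetical helper)."""
    return {
        "op": "equal",
        "children": [
            {"op": "column", "name": column, "valueType": value_type},
            {"op": "literal", "value": value, "valueType": value_type},
        ],
    }


predicate = equal_predicate("PartitionColumn", "partitionName")

# The predicate tree is serialized to a string and embedded in the body.
request_body = {
    "jsonPredicateHints": json.dumps(predicate),
    "limitHint": 100,
}

print(json.dumps(request_body, indent=2))
```

As with predicateHints, this remains a hint: the server may still return files that contain non-matching rows.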

 

And last but not least, check your Delta Sharing server configuration. If you want to use this feature, the following flag should be set to true:

[Screenshot: szymon_dybczak_0-1753684428781.png]