Get Started Discussions
Start your journey with Databricks by joining discussions on getting started guides, tutorials, and introductory topics. Connect with beginners and experts alike to kickstart your Databricks experience.

Delta Sharing & UC: Understanding the Initial Empty Predicate Query

mooze456
New Contributor

We're testing our Delta Sharing server with Unity Catalog (UC) and noticed a behavior where a simple query like SELECT COUNT(1) FROM table_name WHERE col1 = 'value' triggers two /query requests to our server.

The initial request arrives with empty predicateHints and no limitHint. We suspect this is to fetch the table's schema and basic metadata, allowing the client to understand the data structure before applying filters. The subsequent request includes the predicateHints derived from our WHERE clause and is the actual data retrieval query.
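For illustration, the two request bodies might look roughly like the sketch below. The field names predicateHints and limitHint follow the open Delta Sharing protocol; the helper build_query_body is hypothetical, and whether a given client omits the fields or sends empty values is an assumption here, not something we have confirmed.

```python
import json


def build_query_body(predicate_hints=None, limit_hint=None):
    """Hypothetical helper: build the JSON body of a table /query request.

    Both hints are optional, so they are only added when provided.
    """
    body = {}
    if predicate_hints:
        body["predicateHints"] = list(predicate_hints)
    if limit_hint is not None:
        body["limitHint"] = limit_hint
    return body


# First request as we observe it: no hints at all (assumed shape).
first_body = build_query_body()

# Second request: carries the filter taken from the WHERE clause.
second_body = build_query_body(predicate_hints=["col1 = 'value'"])

print(json.dumps(first_body))
print(json.dumps(second_body))
```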

Our question is regarding the necessity of this first request. While it might seem redundant for data retrieval, it likely plays a role in client-side query planning and optimization by providing upfront schema information. Therefore, ignoring the first /query request is generally not recommended, as it could disrupt the client's ability to correctly process subsequent queries.

Are there specific issues or performance concerns driving the desire to ignore this initial request? Understanding the context might help in exploring alternative solutions.

1 REPLY

BigRoux
Databricks Employee
The initial /query request during a Delta Sharing operation with Unity Catalog serves a critical purpose in the query lifecycle. It is intended to retrieve the schema and basic metadata of the table, which helps in query planning and optimization. This metadata includes information about the table's structure and available columns, allowing the client to correctly interpret and construct subsequent queries, such as those with filters or aggregations. For example, metadata fetching enables the system to determine which columns are relevant for filtering, how to apply predicate pushdowns, and to build query execution plans that are efficient.
 
Ignoring this initial request could result in significant disruptions, as it may prevent the client from acquiring the necessary metadata needed for subsequent queries or could lead to errors where query execution assumes incorrect schema details due to the absence of prior metadata validation. This behavior aligns with standard database management practices where schema resolution is a prerequisite for accurate query planning.
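As a rough sketch of what "not ignoring" the first request could look like on the server side, the handler below treats the hints as optional and always returns a protocol line, a metadata line, and file lines as newline-delimited JSON, which is the response layout described in the open Delta Sharing protocol. The names handle_query and prune, and the sample metadata and file values in the usage lines, are hypothetical and not taken from any particular server implementation.

```python
import json


def handle_query(request_body, table_metadata, files, prune=None):
    """Hypothetical /query handler returning newline-delimited JSON.

    An empty or missing predicateHints is not treated as a special case:
    the response still carries the protocol, metadata, and file lines.
    """
    hints = request_body.get("predicateHints") or []

    # Hints are best-effort: only skip files when hints exist and a
    # pruning function (an assumption of this sketch) is supplied.
    selected = prune(files, hints) if (prune and hints) else files

    lines = [
        {"protocol": {"minReaderVersion": 1}},
        {"metaData": table_metadata},
    ]
    lines.extend({"file": f} for f in selected)
    return "\n".join(json.dumps(line) for line in lines)


# Usage with made-up values: the empty first request gets a full answer.
sample_metadata = {"id": "t1", "schemaString": "{}"}
sample_files = [{"id": "f1", "url": "https://example.invalid/f1", "size": 10}]
empty_hint_response = handle_query({}, sample_metadata, sample_files)
```

Because the hints are advisory in the protocol, a server that answers both requests the same way stays correct; applying the predicate only reduces how many files are returned.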
 
Further, the need for this initial request is accentuated in scenarios involving advanced Delta Sharing features, such as dynamic filtering, lineage tracking, and integration with governance tools like Unity Catalog. These operations rely on a correct understanding of the table properties to enforce access controls and ensure compliance.
 
From a performance perspective, the first request adds one extra roundtrip, but it is not expected to dominate query execution time. A roundtrip for basic metadata is typically small compared to the overall query latency for substantial datasets, especially in remote or shared environments.
 
Hope this helps, Lou.
