cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

query based connector snapshot feature

manish_de
New Contributor

In ingestion pipeline, for query based connector there is option of selecting batch snapshot instead of column name under dropdown - Cursor column. If I choose batch snapshot, will the databricks engine run select * from my source table, say Sql server as per job schedule. Or does it do something intelligent use some where clause and avoid selecting the whole table 



4 REPLIES 4

amirabedhiafi
New Contributor III

Hi @manish_de !

For query based lakeflow connect, DBKS is not doing CDC magic unless you provide a valid cursor column.

If you select batch snapshot instead of a cursor column, you should treat it as a full snapshot or full load each scheduled run.

In practice, DBKS needs to read the source result set again from SQL Server then apply or merge the result into the destination. It may not literally issue SELECT * in every case because it can project configured columns but conceptually it is scanning the source query without an incremental WHERE cursor > last_watermark filter.

When a cursor column is configured on each run DBKS retrieves rows where the cursor value is greater than the previously stored high water mark and if you do not select a monotonically increasing cursor column the connector performs a full load on each run. 

Think of it like batch snapshot = full source snapshot per run.

If this answer resolves your question, could you please mark it as โ€œAccept as Solutionโ€? It will help other users quickly find the correct fix.

Senior BI/Data Engineer | Microsoft MVP Data Platform | Microsoft MVP Power BI | Power BI Super User | C# Corner MVP

manish_de
New Contributor

@amirabedhiafi  I just wanted to double check,  there will be no where clause in the sql sent everytime to my database for batch snapshot? 

Yes ! because with batch snapshot DBKS should be treated as doing a full snapshot read each run and won't automatically add  WHERE to avoid scanning the whole source table. 

If this answer resolves your question, could you please mark it as โ€œAccept as Solutionโ€? It will help other users quickly find the correct fix.

Senior BI/Data Engineer | Microsoft MVP Data Platform | Microsoft MVP Power BI | Power BI Super User | C# Corner MVP

michaelfriendly
New Contributor

From what Iโ€™ve seen, choosing batch snapshot usually means the connector treats each run like a full snapshot load, so yes, it may execute something very close to a `SELECT *` on the source table unless the platform adds its own partitioning or optimization behind the scenes. If no cursor column or incremental filter is defined, there generally isnโ€™t a smart `WHERE` clause to reduce reads, so checking the generated rbtv query logs in Databricks or SQL Server profiler would give the clearest answer.