Options
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
11-26-2024 12:49 PM
Borrowed from LinkedIn, here is a SQL query you can use to compare two tables (or dataframes)
with
hash_src as ( select hash(*) as hash_val from my.source.table ),
hash_tgt as ( select hash(*) as hash_val from my.target.table )
select sum(hash_val) ^ AVG(hash_val)::int ^ MIN(hash_val) ^ MAX(hash_val) as hash_val from hash_src
union
select sum(hash_val) ^ AVG(hash_val)::int ^ MIN(hash_val) ^ MAX(hash_val) as hash_val from hash_tgt
If you get one row back ... the tables are the same.
If you get two rows back ... they're different.