cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

SQL Server OUTPUT clause alternative

Nathant93
New Contributor III

I am looking at after a merge or insert has happened to get the records in that batch that had been inserted via either method, much like the OUTPUT clause in sql server.

Does anyone have any suggestions, the only thing I can think of is to add a timestamp to the records and then select them from the table, however that would require having a timestamp on all tables I have.

Thanks

2 REPLIES 2

Kaniz
Community Manager
Community Manager

Hi @Nathant93, Certainly! Handling record changes after a MERGE or INSERT operation can be crucial for tracking and auditing.

 

Letโ€™s explore a few approaches:

 

Timestamp Approach:

  • As you mentioned, adding a timestamp column to your records is a straightforward way to track changes. However, it does require modifying your existing tables.
  • If you choose this approach, you can select the records based on the timestamp column to identify the newly inserted or updated rows.

Change Data Feed (Delta Table):

  • You can enable the Change Data Feed feature if youโ€™re using a Delta table (available in Databricks or Delta Lake).
  • By setting delta.enableChangeDataFeed = true, youโ€™ll receive information about rows that have changed (inserted, updated, or deleted) in the table.
  • You can then process this feed using another stream or batch job to track the changes.

Row Count Tracking:

  • If you specifically need row counts for different actions (inserts, updates, and new inserts), consider the following approach:
    • After your MERGE statement, insert the appropriate rows into a temporary table (similar to your existing process).
    • Use the $action column (which indicates the type of action: INSERT, UPDATE, or DELETE) to tally the contents of the temporary table by operation type.
    • This way, youโ€™ll have separate counts for each action (inserts, updates, and new inserts).

Remember to adapt this example to your specific schema and requirements. The key is using the $action column and a temporary table to track the actions performed during the MERGE operation.

 

Feel free to adjust the approach based on your environment and constraints. ๐Ÿ˜Š

Nathant93
New Contributor III

I've managed to do it like this 

qry = spark.sql(f"DESCRIBE history <table_name> limit 1").collect()
current_version = int(qry[0][0])
prev_version = current_version - 1
Then do an except statement between the versions.
 
Join 100K+ Data Experts: Register Now & Grow with Us!

Excited to expand your horizons with us? Click here to Register and begin your journey to success!

Already a member? Login and join your local regional user group! If there isn’t one near you, fill out this form and we’ll create one for you to join!