cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

SQL Server OUTPUT clause alternative

Nathant93
New Contributor III

I am looking at after a merge or insert has happened to get the records in that batch that had been inserted via either method, much like the OUTPUT clause in sql server.

Does anyone have any suggestions, the only thing I can think of is to add a timestamp to the records and then select them from the table, however that would require having a timestamp on all tables I have.

Thanks

2 REPLIES 2

Kaniz_Fatma
Community Manager
Community Manager

Hi @Nathant93, Certainly! Handling record changes after a MERGE or INSERT operation can be crucial for tracking and auditing.

 

Let’s explore a few approaches:

 

Timestamp Approach:

  • As you mentioned, adding a timestamp column to your records is a straightforward way to track changes. However, it does require modifying your existing tables.
  • If you choose this approach, you can select the records based on the timestamp column to identify the newly inserted or updated rows.

Change Data Feed (Delta Table):

  • You can enable the Change Data Feed feature if you’re using a Delta table (available in Databricks or Delta Lake).
  • By setting delta.enableChangeDataFeed = true, you’ll receive information about rows that have changed (inserted, updated, or deleted) in the table.
  • You can then process this feed using another stream or batch job to track the changes.

Row Count Tracking:

  • If you specifically need row counts for different actions (inserts, updates, and new inserts), consider the following approach:
    • After your MERGE statement, insert the appropriate rows into a temporary table (similar to your existing process).
    • Use the $action column (which indicates the type of action: INSERT, UPDATE, or DELETE) to tally the contents of the temporary table by operation type.
    • This way, you’ll have separate counts for each action (inserts, updates, and new inserts).

Remember to adapt this example to your specific schema and requirements. The key is using the $action column and a temporary table to track the actions performed during the MERGE operation.

 

Feel free to adjust the approach based on your environment and constraints. 😊

Nathant93
New Contributor III

I've managed to do it like this 

qry = spark.sql(f"DESCRIBE history <table_name> limit 1").collect()
current_version = int(qry[0][0])
prev_version = current_version - 1
Then do an except statement between the versions.
 

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group