Combining the SIGNAL statement with ATOMIC transactions in Databricks removes the need to manage commits and rollbacks by hand while still supporting custom validations - something modern big data ETL frameworks struggle to deliver cleanly. You get the ability to raise meaningful exceptions from pipeline logic while commit and rollback are handled automatically. You can validate data quality via business rule validations and take action via custom SIGNAL statements.
SIGNAL is a SQL statement that lets us deliberately raise user-defined exceptions from within the code logic. It's a guard rail inside the SQL pipeline code that keeps tables healthy. The SQLSTATE code supplied in the statement (for example '45000') is returned to the calling tool, and MESSAGE_TEXT allows embedding dynamic values - balances, states, account IDs - directly into the error message so the failure is easy to act on.
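Because the SQLSTATE travels back to the caller, the raising side and the handling side can also live in the same script. The sketch below assumes Databricks SQL scripting with condition handlers and a hypothetical pipeline_errors logging table - it is an illustration of the pattern, not code from the cases that follow:

```sql
-- Sketch: naming a condition and handling it in the same script.
-- Assumes Databricks SQL scripting (DECLARE ... CONDITION / EXIT HANDLER)
-- and a hypothetical pipeline_errors(ts, msg) table for logging.
BEGIN
  DECLARE data_gap CONDITION FOR SQLSTATE '45000';
  DECLARE EXIT HANDLER FOR data_gap
  BEGIN
    -- React to the failure, e.g. record it for alerting
    INSERT INTO pipeline_errors VALUES (current_timestamp(), 'Data gap in staging load');
  END;
  -- ... pipeline logic would go here ...
  SIGNAL data_gap SET MESSAGE_TEXT = 'No accounts data loaded for yesterday';
END
```

If no handler is declared, the exception simply propagates to the calling tool (notebook, JDBC client, or job) with the SQLSTATE and message intact.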
We shall look at three cases - Financial Transaction Enforcement, ETL Staging Validation, and Data Quality Checks - and show how to wire each one up.
Case 1 - Financial Guard
When a debit is applied to a bank account, a hard rule must be enforced for customer transactions: the balance must not fall below zero. With SIGNAL, you can embed this logic directly where the change happens instead of scattering custom checks across notebooks.
BEGIN ATOMIC
  DECLARE v_remaining_balance INT;
  UPDATE accounts SET balance = balance - 1100 WHERE account_id = 101;
  SET v_remaining_balance = (SELECT balance FROM accounts WHERE account_id = 101);
  -- Cancel if the balance is below zero after deduction
  IF v_remaining_balance < 0 THEN
    SIGNAL SQLSTATE '45042' SET MESSAGE_TEXT = CONCAT(
      'Insufficient balance for account 101. ',
      'Requested: 1100. ',
      'Balance would fall to: ', CAST(v_remaining_balance AS STRING), ' units. ',
      'Add balance before retrying.');
  END IF;
END

Case 2 - ETL Staging Validation
Data pipelines that load data from raw into staging tables face a tricky class of failure - the job runs cleanly but the expected data has not arrived. Without an explicit check, this creates a silent operational blind spot that might not surface until a downstream report shows a dip. With transactions and SIGNAL, the pipeline loads data into staging, validates the expected row count, and merges into the core table only if the expected data is available; if there is a gap, the entire code block rolls back atomically.
BEGIN ATOMIC
  -- Clear old staging data
  DELETE FROM staging_accounts;
  -- Load new data
  INSERT INTO staging_accounts
  SELECT * FROM raw_accounts WHERE load_date = current_date() - INTERVAL 1 DAY;
  -- Validate count & fail the transaction if no data
  IF (SELECT COUNT(*) FROM staging_accounts) = 0 THEN
    SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'No accounts data loaded for yesterday';
  END IF;
  -- Merge into core
  MERGE INTO accounts AS target
  USING staging_accounts AS source
  ON target.account_id = source.account_id
  WHEN MATCHED THEN UPDATE SET *
  WHEN NOT MATCHED THEN INSERT *;
END

Case 3 - ETL Data Quality Validation
Data quality checks are necessary to ensure dashboards display trustworthy data. A pipeline might load 50,000 rows into a table, 5,000 of which carry empty or invalid account IDs, corrupting the table and its dashboards. This case extends the staging approach with a data quality validation: after loading, we check the rows against a business rule (invalid account IDs) and block the merge if invalid data is found. The SIGNAL fires with a clear description of the business quality failure, and the core table stays clean because the ATOMIC transaction rolls everything back.
BEGIN ATOMIC
  -- Clear old staging data
  DELETE FROM staging_accounts;
  -- Load new data
  INSERT INTO staging_accounts
  SELECT * FROM raw_accounts WHERE load_date = current_date() - INTERVAL 1 DAY;
  -- Validate data quality: reject empty or missing account IDs
  IF (SELECT COUNT(*) FROM staging_accounts
      WHERE account_id = '' OR account_id IS NULL) > 0 THEN
    SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Invalid accounts loaded';
  END IF;
  -- Merge into core
  MERGE INTO accounts AS target
  USING staging_accounts AS source
  ON target.account_id = source.account_id
  WHEN MATCHED THEN UPDATE SET *
  WHEN NOT MATCHED THEN INSERT *;
END
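Following the earlier point about embedding dynamic values in MESSAGE_TEXT, a variant of the quality check can capture the invalid-row count first so the error message says how bad the problem is. This is a sketch against the same staging_accounts table, showing only the validation step:

```sql
-- Sketch: embed the invalid-row count in the error message
-- (same staging_accounts table as above, validation step only).
BEGIN ATOMIC
  DECLARE v_invalid_count INT;
  SET v_invalid_count = (SELECT COUNT(*) FROM staging_accounts
                         WHERE account_id = '' OR account_id IS NULL);
  IF v_invalid_count > 0 THEN
    SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = CONCAT(
      'Invalid accounts loaded: ', CAST(v_invalid_count AS STRING),
      ' rows have an empty or missing account_id.');
  END IF;
END
```

Anyone paged for the failure then sees the scale of the problem in the message itself, without having to query staging first.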