SQL Scripting - Procedural Logic in Databricks SQL (Without Leaving SQL)
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
15 hours ago
We've all been there. You need to loop through a list of tables, apply some transformation, handle a few edge cases, and maybe catch an error or two. In most data platforms, that means switching to Python or Scala—even if 90% of your logic is just SQL anyway.
SQL Scripting solves this exact problem, and it's a genuine productivity boost for SQL-first engineers. I've been using it for admin tasks and ELT jobs, and I'm not going back.
Prerequisites
- Databricks workspace (any cloud)
- A SQL Warehouse or cluster with DBR 14.3+
- Basic SQL knowledge (you don't need to be a PL/SQL veteran)
What SQL Scripting Gives You
SQL Scripting follows the ANSI SQL/PSM standard, which means the skills transfer if you've ever worked with T-SQL, PL/pgSQL, or similar. Here's the quick rundown:
- Local variables with DECLARE
- Control flow: IF/THEN/ELSE, CASE statements
- Four loop types: FOR, WHILE, REPEAT, and plain LOOP
- Dynamic SQL via EXECUTE IMMEDIATE
- Exception handling with condition handlers
Everything runs inside a BEGIN...END compound statement.
A Real Example: Bulk-Updating Column Collations
Say you've inherited a schema where every STRING column uses default collation, and now you need case-insensitive sorting across the board. Manually altering 50 tables? No thanks.
Here's how it can be done with SQL Scripting: ( sample code - not tested).
BEGIN
-- Declare variables first, then logic
DECLARE v_table_name STRING DEFAULT 'employees';
DECLARE v_collation STRING DEFAULT 'UTF8_LCASE';
DECLARE v_columns ARRAY<STRING> DEFAULT ARRAY();
-- Set the table's default collation for new columns
EXECUTE IMMEDIATE
'ALTER TABLE ' || v_table_name || ' SET DEFAULT COLLATION ' || v_collation;
-- Loop through existing STRING columns and alter each one
FOR col IN (
SELECT column_name
FROM information_schema.columns
WHERE table_name = v_table_name
AND data_type = 'STRING'
) DO
EXECUTE IMMEDIATE
'ALTER TABLE ' || v_table_name ||
' ALTER COLUMN ' || col.column_name ||
' SET COLLATION ' || v_collation;
-- Track which columns we modified
SET v_columns = array_append(v_columns, col.column_name);
END FOR;
-- Refresh stats if we changed anything
IF size(v_columns) > 0 THEN
EXECUTE IMMEDIATE
'ANALYZE TABLE ' || v_table_name ||
' COMPUTE STATISTICS FOR COLUMNS ' ||
array_join(v_columns, ', ');
END IF;
END;
The FOR loop queries information_schema.columns and iterates row by row. EXECUTE IMMEDIATE handles the dynamic DDL. Simple.
Scaling to an Entire Schema
Want to hit every table in a schema? Nest another loop:
BEGIN
DECLARE v_schema STRING DEFAULT 'my_schema';
DECLARE v_collation STRING DEFAULT 'UTF8_LCASE';
DECLARE v_skipped ARRAY<STRING> DEFAULT ARRAY();
FOR tbl IN (
SELECT table_name
FROM information_schema.tables
WHERE table_schema = v_schema AND table_type = 'BASE TABLE'
) DO
BEGIN
DECLARE v_columns ARRAY<STRING> DEFAULT ARRAY();
-- Handler: if ALTER fails (e.g., on a view), log and continue
DECLARE CONTINUE HANDLER FOR SQLEXCEPTION
SET v_skipped = array_append(v_skipped, tbl.table_name);
EXECUTE IMMEDIATE
'ALTER TABLE ' || v_schema || '.' || tbl.table_name ||
' SET DEFAULT COLLATION ' || v_collation;
FOR col IN (
SELECT column_name
FROM information_schema.columns
WHERE table_schema = v_schema
AND table_name = tbl.table_name
AND data_type = 'STRING'
) DO
EXECUTE IMMEDIATE
'ALTER TABLE ' || v_schema || '.' || tbl.table_name ||
' ALTER COLUMN ' || col.column_name ||
' SET COLLATION ' || v_collation;
SET v_columns = array_append(v_columns, col.column_name);
END FOR;
END;
END FOR;
SELECT v_skipped AS skipped_objects;
END;
The DECLARE CONTINUE HANDLER FOR SQLEXCEPTION catches any errors (like trying to alter a view), logs the object name, and moves on. No more script crashes halfway through.
Loop Types Cheat Sheet
| Loop | When to Use |
|---|---|
| FOR query DO...END FOR | Iterating over query results (most common) |
| WHILE cond DO...END WHILE | Pre-condition check, might run zero times |
| REPEAT...UNTIL cond END REPEAT | Always runs at least once |
| LOOP...END LOOP | Infinite loop—break with LEAVE |
Use LEAVE to exit a loop early and ITERATE to skip to the next iteration (like continue in Python).
Gotchas & Pro Tips
- Variables must be declared first. All DECLARE statements come before any logic in a BEGIN...END block.
- FOR loops don't create a new scope. If you need a fresh variable per iteration, wrap the body in a nested BEGIN...END.
- Combine with AI functions. You can call ai_fix_grammar() or ai_classify() inside loops for bulk data cleansing. Just watch the API costs.
Wrap Up
SQL Scripting isn't trying to replace Python for complex ML pipelines. But for admin automation, schema migrations, and ELT logic? It's exactly what SQL-first engineers have been waiting for.
Give it a shot on your next bulk update job. You might be surprised how much you can do without switching languages.
- Labels:
-
Sql Scripts
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
14 hours ago
Correction: Preview is available on Databricks Runtime 16.3 and above.