cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Community Articles
Dive into a collaborative space where members like YOU can exchange knowledge, tips, and best practices. Join the conversation today and unlock a wealth of collective wisdom to enhance your experience and drive success.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

SQL Scripting - Procedural Logic in Databricks SQL (Without Leaving SQL)

AbhaySingh
Databricks Employee
Databricks Employee

We've all been there. You need to loop through a list of tables, apply some transformation, handle a few edge cases, and maybe catch an error or two. In most data platforms, that means switching to Python or Scalaโ€”even if 90% of your logic is just SQL anyway.

SQL Scripting solves this exact problem, and it's a genuine productivity boost for SQL-first engineers. I've been using it for admin tasks and ELT jobs, and I'm not going back.

Prerequisites

  • Databricks workspace (any cloud)
  • A SQL Warehouse or cluster with DBR 14.3+
  • Basic SQL knowledge (you don't need to be a PL/SQL veteran)

What SQL Scripting Gives You

SQL Scripting follows the ANSI SQL/PSM standard, which means the skills transfer if you've ever worked with T-SQL, PL/pgSQL, or similar. Here's the quick rundown:

  • Local variables with DECLARE
  • Control flow: IF/THEN/ELSE, CASE statements
  • Four loop types: FOR, WHILE, REPEAT, and plain LOOP
  • Dynamic SQL via EXECUTE IMMEDIATE
  • Exception handling with condition handlers

Everything runs inside a BEGIN...END compound statement.

A Real Example: Bulk-Updating Column Collations

Say you've inherited a schema where every STRING column uses default collation, and now you need case-insensitive sorting across the board. Manually altering 50 tables? No thanks.

Here's how it can be done with SQL Scripting: ( sample code - not tested). 

BEGIN
  -- Declare variables first, then logic
  DECLARE v_table_name STRING DEFAULT 'employees';
  DECLARE v_collation STRING DEFAULT 'UTF8_LCASE';
  DECLARE v_columns ARRAY<STRING> DEFAULT ARRAY();
  
  -- Set the table's default collation for new columns
  EXECUTE IMMEDIATE 
    'ALTER TABLE ' || v_table_name || ' SET DEFAULT COLLATION ' || v_collation;
  
  -- Loop through existing STRING columns and alter each one
  FOR col IN (
    SELECT column_name 
    FROM information_schema.columns 
    WHERE table_name = v_table_name 
      AND data_type = 'STRING'
  ) DO
    EXECUTE IMMEDIATE 
      'ALTER TABLE ' || v_table_name || 
      ' ALTER COLUMN ' || col.column_name || 
      ' SET COLLATION ' || v_collation;
    
    -- Track which columns we modified
    SET v_columns = array_append(v_columns, col.column_name);
  END FOR;
  
  -- Refresh stats if we changed anything
  IF size(v_columns) > 0 THEN
    EXECUTE IMMEDIATE 
      'ANALYZE TABLE ' || v_table_name || 
      ' COMPUTE STATISTICS FOR COLUMNS ' || 
      array_join(v_columns, ', ');
  END IF;
END;

The FOR loop queries information_schema.columns and iterates row by row. EXECUTE IMMEDIATE handles the dynamic DDL. Simple.

Scaling to an Entire Schema

Want to hit every table in a schema? Nest another loop:

BEGIN
  DECLARE v_schema STRING DEFAULT 'my_schema';
  DECLARE v_collation STRING DEFAULT 'UTF8_LCASE';
  DECLARE v_skipped ARRAY<STRING> DEFAULT ARRAY();
  
  FOR tbl IN (
    SELECT table_name 
    FROM information_schema.tables 
    WHERE table_schema = v_schema AND table_type = 'BASE TABLE'
  ) DO
    BEGIN
      DECLARE v_columns ARRAY<STRING> DEFAULT ARRAY();
      
      -- Handler: if ALTER fails (e.g., on a view), log and continue
      DECLARE CONTINUE HANDLER FOR SQLEXCEPTION
        SET v_skipped = array_append(v_skipped, tbl.table_name);
      
      EXECUTE IMMEDIATE 
        'ALTER TABLE ' || v_schema || '.' || tbl.table_name || 
        ' SET DEFAULT COLLATION ' || v_collation;
      
      FOR col IN (
        SELECT column_name 
        FROM information_schema.columns 
        WHERE table_schema = v_schema 
          AND table_name = tbl.table_name 
          AND data_type = 'STRING'
      ) DO
        EXECUTE IMMEDIATE 
          'ALTER TABLE ' || v_schema || '.' || tbl.table_name || 
          ' ALTER COLUMN ' || col.column_name || 
          ' SET COLLATION ' || v_collation;
        SET v_columns = array_append(v_columns, col.column_name);
      END FOR;
      
    END;
  END FOR;
  
  SELECT v_skipped AS skipped_objects;
END;

The DECLARE CONTINUE HANDLER FOR SQLEXCEPTION catches any errors (like trying to alter a view), logs the object name, and moves on. No more script crashes halfway through.

Loop Types Cheat Sheet

Loop When to Use
FOR query DO...END FOR Iterating over query results (most common)
WHILE cond DO...END WHILE Pre-condition check, might run zero times
REPEAT...UNTIL cond END REPEAT Always runs at least once
LOOP...END LOOP Infinite loopโ€”break with LEAVE

Use LEAVE to exit a loop early and ITERATE to skip to the next iteration (like continue in Python).

Gotchas & Pro Tips

  1. Variables must be declared first. All DECLARE statements come before any logic in a BEGIN...END block.
  2. FOR loops don't create a new scope. If you need a fresh variable per iteration, wrap the body in a nested BEGIN...END.
  3. Combine with AI functions. You can call ai_fix_grammar() or ai_classify() inside loops for bulk data cleansing. Just watch the API costs.

Wrap Up

SQL Scripting isn't trying to replace Python for complex ML pipelines. But for admin automation, schema migrations, and ELT logic? It's exactly what SQL-first engineers have been waiting for.

Give it a shot on your next bulk update job. You might be surprised how much you can do without switching languages.

1 REPLY 1

AbhaySingh
Databricks Employee
Databricks Employee

Correction: Preview is available on Databricks Runtime 16.3 and above.