<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic SQL Scripting in Apache Spark™ 4.0 in Community Articles</title>
    <link>https://community.databricks.com/t5/community-articles/sql-scripting-in-apache-spark-4-0/m-p/137151#M759</link>
    <description>&lt;P class="pw-post-body-paragraph zq zr uz zs b zt zu zv zw zx zy zz aba abb abc abd abe abf abg abh abi abj abk abl abm abn lh co" data-selectable-paragraph=""&gt;The Apache Spark™ 4.0 introduces a new feature for SQL developers and data engineers: SQL Scripting. As such, this feature enhances the power and extends the flexibility of Spark SQL, enabling users to write procedural code within SQL queries, with the added support of&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A class="bj fk" href="https://en.wikipedia.org/wiki/SQL/PSM" target="_blank" rel="noopener ugc nofollow"&gt;ANSI SQL/PSM&lt;/A&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;scripting language in Spark &lt;span class="lia-unicode-emoji" title=":collision:"&gt;💥&lt;/span&gt;.&lt;/P&gt;
&lt;P class="pw-post-body-paragraph zq zr uz zs b zt zu zv zw zx zy zz aba abb abc abd abe abf abg abh abi abj abk abl abm abn lh co" data-selectable-paragraph=""&gt;Yes, you can now have control flow and conditional statements, allowing you to execute complex logic, loops, and conditionals directly in your SQL scripts. Whether your workloads involve data pipelines, transformations, or explorations of large datasets, Apache Spark™ 4.0 SQL Scripting extends the power of SQL to manipulate data with ease through procedural programming.&lt;/P&gt;
&lt;P class="pw-post-body-paragraph zq zr uz zs b zt zu zv zw zx zy zz aba abb abc abd abe abf abg abh abi abj abk abl abm abn lh co" data-selectable-paragraph=""&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screenshot 2025-10-31 at 2.43.13 PM.png" style="width: 675px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/21253i4FBDC25B15475902/image-size/large?v=v2&amp;amp;px=999" role="button" title="Screenshot 2025-10-31 at 2.43.13 PM.png" alt="Screenshot 2025-10-31 at 2.43.13 PM.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P class="pw-post-body-paragraph zq zr uz zs b zt zu zv zw zx zy zz aba abb abc abd abe abf abg abh abi abj abk abl abm abn lh co" data-selectable-paragraph=""&gt;&lt;SPAN&gt;There are three primary reasons why this feature will help SQL developers who want to harness the power of Spark.&lt;/SPAN&gt;&lt;/P&gt;
&lt;H3 id="6f73" class="abu abv uz as abw abx aby abz mr aca acb acc mw abb acd ace acf abf acg ach aci abj acj ack acl acm co" data-selectable-paragraph=""&gt;1. Familiarity and Portability&lt;/H3&gt;
&lt;P class="pw-post-body-paragraph zq zr uz zs b zt acn zv zw zx aco zz aba abb acp abd abe abf acq abh abi abj acr abl abm abn lh co" data-selectable-paragraph=""&gt;ANSI SQL/PSM is a standard and common way to write step-by-step logic in SQL, and many SQL developers already utilize it in databases such as PostgreSQL, Oracle, or SQL Server. By adding support for this in Spark, it becomes significantly easier to migrate existing SQL scripts into Spark without substantial changes. This means that SQL developers and data analysts can fully leverage Spark’s speed and power when working with data in the Lakehouse.&lt;/P&gt;
&lt;H3 id="9168" class="abu abv uz as abw abx aby abz mr aca acb acc mw abb acd ace acf abf acg ach aci abj acj ack acl acm co" data-selectable-paragraph=""&gt;2. Better Control Flow in SQL Pipelines&lt;/H3&gt;
&lt;P class="pw-post-body-paragraph zq zr uz zs b zt acn zv zw zx aco zz aba abb acp abd abe abf acq abh abi abj acr abl abm abn lh co" data-selectable-paragraph=""&gt;Standard SQL in Spark is great for simple queries, but it struggles with things like variables, loops, conditions, or more complex logic. The ANSI SQL/PSM scripting adds language constructs like DECLARE, SET, IF-THEN-ELSE, WHILE…END, FOR …DO , REPEAT, CASE, and BEGIN…END, so you can write advanced logic directly in SQL without needing to switch to Scala, Python, or other tools.&lt;/P&gt;
&lt;H3 id="7a64" class="abu abv uz as abw abx aby abz mr aca acb acc mw abb acd ace acf abf acg ach aci abj acj ack acl acm co" data-selectable-paragraph=""&gt;3. Easy Integration&lt;/H3&gt;
&lt;P class="pw-post-body-paragraph zq zr uz zs b zt acn zv zw zx aco zz aba abb acp abd abe abf acq abh abi abj acr abl abm abn lh co" data-selectable-paragraph=""&gt;Because ANSI SQL/PSM works seamlessly with Spark SQL, it enables complex logic without switching to another language like Python or Scala. Before, you had to use PySpark to write procedural logic and Spark SQL for queries. With this feature, data analysts can easily use their familiar language to do both, in SQL scripting.&lt;/P&gt;
&lt;P class="pw-post-body-paragraph zq zr uz zs b zt zu zv zw zx zy zz aba abb abc abd abe abf abg abh abi abj abk abl abm abn lh co" data-selectable-paragraph=""&gt;Let’s examine a simple example code in ANSI SQL/PSM script with an equivalent code in PySpark script. Assume we have a newly created target table&lt;STRONG class="zs hq"&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;t.&lt;/STRONG&gt;&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;# Create an empty DataFrame with the same schema as the target table
# t
df = spark.createDataFrame([], schema="c INT")

# Loop to insert values into the DataFrame
c = 10
while c &amp;gt; 0:
    df = df.union(spark.createDataFrame([(c,)], schema="c INT"))
    c -= 1

# Insert the DataFrame into the target table
df.write.insertInto("t")

# Display the contents of the table
spark.sql("SELECT * FROM t").show()

+--+
|c |
+--+
|10|
|9 |
|8 |
|7 |
|6 |
|5 |
|4 |
|3 |
|2 |
|1 |
+--+&lt;/LI-CODE&gt;
&lt;P&gt;&lt;SPAN&gt;And its equivalent ANSI SQL/PSM script:&lt;/SPAN&gt;&lt;/P&gt;
&lt;LI-CODE lang="php"&gt;BEGIN
  DECLARE c INT = 10;
  WHILE c &amp;gt; 0 DO
    INSERT INTO t VALUES (c);
    SET c = c - 1;
  END WHILE;
  SELECT * FROM t;
END

+--+
|c |
+--+
|10|
|9 |
|8 |
|7 |
|6 |
|5 |
|4 |
|3 |
|2 |
|1 |
+--+&lt;/LI-CODE&gt;
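&lt;P&gt;&lt;SPAN&gt;The same countdown can also be written with the REPEAT construct, which tests its condition after each pass through the body (a sketch against the same table t):&lt;/SPAN&gt;&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;-- Sketch: the same countdown using REPEAT, which checks its condition after the body
BEGIN
  DECLARE c INT DEFAULT 10;
  REPEAT
    INSERT INTO t VALUES (c);
    SET c = c - 1;
  UNTIL c = 0 END REPEAT;
END&lt;/LI-CODE&gt;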
&lt;P&gt;&lt;SPAN&gt;Here is another example, using the FOR…DO construct in an ANSI SQL/PSM script.&lt;/SPAN&gt;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;-- Short ANSI SQL/PSM Script with FOR loop calculations
BEGIN
    -- Declare variables
    DECLARE total_salary DECIMAL(12,2) DEFAULT 0;
    DECLARE total_ages INTEGER DEFAULT 0;
    
    -- Create and populate table
    CREATE TABLE jsd_employee (
        name VARCHAR(100),
        age INTEGER,
        salary DECIMAL(10,2),
        department VARCHAR(50)
    );
    
    INSERT INTO jsd_employee VALUES
        ('John Smith', 28, 65000.00, 'Engineering'),
        ('Sarah Johnson', 34, 75000.00, 'Marketing'),
        ('Mike Davis', 42, 85000.00, 'Finance'),
        ('Emily Brown', 29, 58000.00, 'HR');
    
    -- FOR loop to calculate totals
    FOR emp AS (SELECT age, salary FROM jsd_employee) DO
        SET total_salary = total_salary + emp.salary;
        SET total_ages = total_ages + emp.age;
    END FOR;
    
    -- Display results
    SELECT total_salary AS `Total Salary`, total_ages AS `Total Ages`;
    
END;

+------------+----------+
|Total Salary|Total Ages|
+------------+----------+
|   283000.00|       133|
+------------+----------+
&lt;/LI-CODE&gt;
&lt;P&gt;&lt;SPAN&gt;And its PySpark &lt;span class="lia-unicode-emoji" title=":snake:"&gt;🐍&lt;/span&gt; equivalent&lt;/SPAN&gt;&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;from pyspark.sql import SparkSession
from pyspark.sql.functions import sum as spark_sum

# Initialize Spark and create DataFrame
spark = SparkSession.builder.appName("EmployeeAnalysis").getOrCreate()

employee_df = spark.createDataFrame([
    ("John Smith", 28, 65000.00, "Engineering"),
    ("Sarah Johnson", 34, 75000.00, "Marketing"),
    ("Mike Davis", 42, 85000.00, "Finance"),
    ("Emily Brown", 29, 58000.00, "HR")
], ["name", "age", "salary", "department"])

# Calculate and display totals
employee_df.agg(
    spark_sum("salary").alias("Total Salary"),
    spark_sum("age").alias("Total Ages")
).show()

spark.stop()

+------------+----------+
|Total Salary|Total Ages|
+------------+----------+
|    283000.0|       133|
+------------+----------+
&lt;/LI-CODE&gt;
&lt;DIV class="uj uk ul um un y"&gt;
&lt;ARTICLE&gt;
&lt;DIV class="y"&gt;
&lt;DIV class="y"&gt;
&lt;SECTION&gt;
&lt;DIV&gt;
&lt;DIV class="lh uu uv uw ux"&gt;
&lt;DIV class="o q"&gt;
&lt;DIV class="eb n ec ed ee ef"&gt;
&lt;P class="pw-post-body-paragraph zq zr uz zs b zt zu zv zw zx zy zz aba abb abc abd abe abf abg abh abi abj abk abl abm abn lh co" data-selectable-paragraph=""&gt;While PySpark code is shorter and succinct, for a Data Analyst not familiar with PySpark, it can be onerous to read. By contrast, they can use a familiar procedural language in SQL to express the same logic and the outcome.&lt;/P&gt;
&lt;H2 id="9e08" class="adb abv uz as abw mn adc mo mr ms add mt mw mx ade my nb nc adf nd ng nh adg ni nl adh co" data-selectable-paragraph=""&gt;What’s next?&lt;/H2&gt;
&lt;P class="pw-post-body-paragraph zq zr uz zs b zt acn zv zw zx aco zz aba abb acp abd abe abf acq abh abi abj acr abl abm abn lh co" data-selectable-paragraph=""&gt;You can try this feature in Apache Spark™ 4.0.0 release. You can try it for yourself.&lt;/P&gt;
&lt;P class="pw-post-body-paragraph zq zr uz zs b zt zu zv zw zx zy zz aba abb abc abd abe abf abg abh abi abj abk abl abm abn lh co" data-selectable-paragraph=""&gt;I have only touched the surface of its full capabilities as a procedural language. To get an extensive and deep dive exposure, check out these sources &lt;span class="lia-unicode-emoji" title=":backhand_index_pointing_down:"&gt;👇&lt;/span&gt;:&lt;/P&gt;
&lt;OL class=""&gt;
&lt;LI id="2280" class="zq zr uz zs b zt zu zv zw zx zy zz aba abb abc abd abe abf abg abh abi abj abk abl abm abn adi adj adk co" data-selectable-paragraph=""&gt;&lt;span class="lia-unicode-emoji" title=":eyes:"&gt;👀&lt;/span&gt; Watch&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A class="bj fk" href="https://youtu.be/NjGgTdGzF8o?list=PLfx_5c0MeG0-_jhc2h1pUgcaOEva-3IIm&amp;amp;t=2803" target="_blank" rel="noopener ugc nofollow"&gt;Upcoming Apache Spark 4.0.0 Release Meetup Recording&lt;/A&gt;&lt;/LI&gt;
&lt;LI id="a1b1" class="zq zr uz zs b zt adl zv zw zx adm zz aba abb adn abd abe abf ado abh abi abj adp abl abm abn adi adj adk co" data-selectable-paragraph=""&gt;&lt;span class="lia-unicode-emoji" title=":open_book:"&gt;📖&lt;/span&gt; Read&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A class="bj fk" href="https://www.databricks.com/blog/introducing-sql-scripting-support-databricks-part-1" target="_blank" rel="noopener ugc nofollow"&gt;Introducing SQL Scripting Support in Databricks, Part 1&lt;/A&gt;&lt;/LI&gt;
&lt;LI id="c106" class="zq zr uz zs b zt adl zv zw zx adm zz aba abb adn abd abe abf ado abh abi abj adp abl abm abn adi adj adk co" data-selectable-paragraph=""&gt;&lt;span class="lia-unicode-emoji" title=":open_book:"&gt;📖&lt;/span&gt; Read&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A class="bj fk" href="https://www.databricks.com/blog/introducing-sql-scripting-databricks-part-2" target="_blank" rel="noopener ugc nofollow"&gt;Introducing SQL Scripting in Databricks, Part 2&lt;/A&gt;&lt;/LI&gt;
&lt;LI id="da86" class="zq zr uz zs b zt adl zv zw zx adm zz aba abb adn abd abe abf ado abh abi abj adp abl abm abn adi adj adk co" data-selectable-paragraph=""&gt;&lt;span class="lia-unicode-emoji" title=":open_book:"&gt;📖&lt;/span&gt;Read&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A class="bj fk" href="https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-scripting" target="_blank" rel="noopener ugc nofollow"&gt;SQL Scripting documentation for Databricks&lt;/A&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/SECTION&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/ARTICLE&gt;
&lt;/DIV&gt;
&lt;DIV class="o q"&gt;
&lt;DIV class="eb n ec ed ee ef"&gt;
&lt;DIV class="adq hb o gw"&gt;
&lt;DIV class="fb o"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="fb o"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
</description>
    <pubDate>Fri, 31 Oct 2025 22:04:53 GMT</pubDate>
    <dc:creator>jsdmatrix</dc:creator>
    <dc:date>2025-10-31T22:04:53Z</dc:date>
    <item>
      <title>SQL Scripting in Apache Spark™ 4.0</title>
      <link>https://community.databricks.com/t5/community-articles/sql-scripting-in-apache-spark-4-0/m-p/137151#M759</link>
      <description>&lt;P class="pw-post-body-paragraph zq zr uz zs b zt zu zv zw zx zy zz aba abb abc abd abe abf abg abh abi abj abk abl abm abn lh co" data-selectable-paragraph=""&gt;The Apache Spark™ 4.0 introduces a new feature for SQL developers and data engineers: SQL Scripting. As such, this feature enhances the power and extends the flexibility of Spark SQL, enabling users to write procedural code within SQL queries, with the added support of&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A class="bj fk" href="https://en.wikipedia.org/wiki/SQL/PSM" target="_blank" rel="noopener ugc nofollow"&gt;ANSI SQL/PSM&lt;/A&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;scripting language in Spark &lt;span class="lia-unicode-emoji" title=":collision:"&gt;💥&lt;/span&gt;.&lt;/P&gt;
&lt;P class="pw-post-body-paragraph zq zr uz zs b zt zu zv zw zx zy zz aba abb abc abd abe abf abg abh abi abj abk abl abm abn lh co" data-selectable-paragraph=""&gt;Yes, you can now have control flow and conditional statements, allowing you to execute complex logic, loops, and conditionals directly in your SQL scripts. Whether your workloads involve data pipelines, transformations, or explorations of large datasets, Apache Spark™ 4.0 SQL Scripting extends the power of SQL to manipulate data with ease through procedural programming.&lt;/P&gt;
&lt;P class="pw-post-body-paragraph zq zr uz zs b zt zu zv zw zx zy zz aba abb abc abd abe abf abg abh abi abj abk abl abm abn lh co" data-selectable-paragraph=""&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screenshot 2025-10-31 at 2.43.13 PM.png" style="width: 675px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/21253i4FBDC25B15475902/image-size/large?v=v2&amp;amp;px=999" role="button" title="Screenshot 2025-10-31 at 2.43.13 PM.png" alt="Screenshot 2025-10-31 at 2.43.13 PM.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P class="pw-post-body-paragraph zq zr uz zs b zt zu zv zw zx zy zz aba abb abc abd abe abf abg abh abi abj abk abl abm abn lh co" data-selectable-paragraph=""&gt;&lt;SPAN&gt;There are three primary reasons why this feature will help SQL developers who want to harness the power of Spark.&lt;/SPAN&gt;&lt;/P&gt;
&lt;H3 id="6f73" class="abu abv uz as abw abx aby abz mr aca acb acc mw abb acd ace acf abf acg ach aci abj acj ack acl acm co" data-selectable-paragraph=""&gt;1. Familiarity and Portability&lt;/H3&gt;
&lt;P class="pw-post-body-paragraph zq zr uz zs b zt acn zv zw zx aco zz aba abb acp abd abe abf acq abh abi abj acr abl abm abn lh co" data-selectable-paragraph=""&gt;ANSI SQL/PSM is a standard and common way to write step-by-step logic in SQL, and many SQL developers already utilize it in databases such as PostgreSQL, Oracle, or SQL Server. By adding support for this in Spark, it becomes significantly easier to migrate existing SQL scripts into Spark without substantial changes. This means that SQL developers and data analysts can fully leverage Spark’s speed and power when working with data in the Lakehouse.&lt;/P&gt;
&lt;H3 id="9168" class="abu abv uz as abw abx aby abz mr aca acb acc mw abb acd ace acf abf acg ach aci abj acj ack acl acm co" data-selectable-paragraph=""&gt;2. Better Control Flow in SQL Pipelines&lt;/H3&gt;
&lt;P class="pw-post-body-paragraph zq zr uz zs b zt acn zv zw zx aco zz aba abb acp abd abe abf acq abh abi abj acr abl abm abn lh co" data-selectable-paragraph=""&gt;Standard SQL in Spark is great for simple queries, but it struggles with things like variables, loops, conditions, or more complex logic. The ANSI SQL/PSM scripting adds language constructs like DECLARE, SET, IF-THEN-ELSE, WHILE…END, FOR …DO , REPEAT, CASE, and BEGIN…END, so you can write advanced logic directly in SQL without needing to switch to Scala, Python, or other tools.&lt;/P&gt;
&lt;H3 id="7a64" class="abu abv uz as abw abx aby abz mr aca acb acc mw abb acd ace acf abf acg ach aci abj acj ack acl acm co" data-selectable-paragraph=""&gt;3. Easy Integration&lt;/H3&gt;
&lt;P class="pw-post-body-paragraph zq zr uz zs b zt acn zv zw zx aco zz aba abb acp abd abe abf acq abh abi abj acr abl abm abn lh co" data-selectable-paragraph=""&gt;Because ANSI SQL/PSM works seamlessly with Spark SQL, it enables complex logic without switching to another language like Python or Scala. Before, you had to use PySpark to write procedural logic and Spark SQL for queries. With this feature, data analysts can easily use their familiar language to do both, in SQL scripting.&lt;/P&gt;
&lt;P class="pw-post-body-paragraph zq zr uz zs b zt zu zv zw zx zy zz aba abb abc abd abe abf abg abh abi abj abk abl abm abn lh co" data-selectable-paragraph=""&gt;Let’s examine a simple example code in ANSI SQL/PSM script with an equivalent code in PySpark script. Assume we have a newly created target table&lt;STRONG class="zs hq"&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;t.&lt;/STRONG&gt;&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;# Create an empty DataFrame with the same schema as the target table
# t
df = spark.createDataFrame([], schema="c INT")

# Loop to insert values into the DataFrame
c = 10
while c &amp;gt; 0:
    df = df.union(spark.createDataFrame([(c,)], schema="c INT"))
    c -= 1

# Insert the DataFrame into the target table
df.write.insertInto("t")

# Display the contents of the table
spark.sql("SELECT * FROM t").show()

+--+
|c |
+--+
|10|
|9 |
|8 |
|7 |
|6 |
|5 |
|4 |
|3 |
|2 |
|1 |
+--+&lt;/LI-CODE&gt;
&lt;P&gt;&lt;SPAN&gt;And its equivalent ANSI SQL/PSM script:&lt;/SPAN&gt;&lt;/P&gt;
&lt;LI-CODE lang="php"&gt;BEGIN
  DECLARE c INT = 10;
  WHILE c &amp;gt; 0 DO
    INSERT INTO t VALUES (c);
    SET c = c - 1;
  END WHILE;
  SELECT * FROM t;
END

+--+
|c |
+--+
|10|
|9 |
|8 |
|7 |
|6 |
|5 |
|4 |
|3 |
|2 |
|1 |
+--+&lt;/LI-CODE&gt;
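&lt;P&gt;&lt;SPAN&gt;The same countdown can also be written with the REPEAT construct, which tests its condition after each pass through the body (a sketch against the same table t):&lt;/SPAN&gt;&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;-- Sketch: the same countdown using REPEAT, which checks its condition after the body
BEGIN
  DECLARE c INT DEFAULT 10;
  REPEAT
    INSERT INTO t VALUES (c);
    SET c = c - 1;
  UNTIL c = 0 END REPEAT;
END&lt;/LI-CODE&gt;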
&lt;P&gt;&lt;SPAN&gt;Here is another example, using the FOR…DO construct in an ANSI SQL/PSM script.&lt;/SPAN&gt;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;-- Short ANSI SQL/PSM Script with FOR loop calculations
BEGIN
    -- Declare variables
    DECLARE total_salary DECIMAL(12,2) DEFAULT 0;
    DECLARE total_ages INTEGER DEFAULT 0;
    
    -- Create and populate table
    CREATE TABLE jsd_employee (
        name VARCHAR(100),
        age INTEGER,
        salary DECIMAL(10,2),
        department VARCHAR(50)
    );
    
    INSERT INTO jsd_employee VALUES
        ('John Smith', 28, 65000.00, 'Engineering'),
        ('Sarah Johnson', 34, 75000.00, 'Marketing'),
        ('Mike Davis', 42, 85000.00, 'Finance'),
        ('Emily Brown', 29, 58000.00, 'HR');
    
    -- FOR loop to calculate totals
    FOR emp AS (SELECT age, salary FROM jsd_employee) DO
        SET total_salary = total_salary + emp.salary;
        SET total_ages = total_ages + emp.age;
    END FOR;
    
    -- Display results
    SELECT total_salary AS `Total Salary`, total_ages AS `Total Ages`;
    
END;

+------------+----------+
|Total Salary|Total Ages|
+------------+----------+
|   283000.00|       133|
+------------+----------+
&lt;/LI-CODE&gt;
&lt;P&gt;&lt;SPAN&gt;And its PySpark &lt;span class="lia-unicode-emoji" title=":snake:"&gt;🐍&lt;/span&gt; equivalent&lt;/SPAN&gt;&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;from pyspark.sql import SparkSession
from pyspark.sql.functions import sum as spark_sum

# Initialize Spark and create DataFrame
spark = SparkSession.builder.appName("EmployeeAnalysis").getOrCreate()

employee_df = spark.createDataFrame([
    ("John Smith", 28, 65000.00, "Engineering"),
    ("Sarah Johnson", 34, 75000.00, "Marketing"),
    ("Mike Davis", 42, 85000.00, "Finance"),
    ("Emily Brown", 29, 58000.00, "HR")
], ["name", "age", "salary", "department"])

# Calculate and display totals
employee_df.agg(
    spark_sum("salary").alias("Total Salary"),
    spark_sum("age").alias("Total Ages")
).show()

spark.stop()

+------------+----------+
|Total Salary|Total Ages|
+------------+----------+
|    283000.0|       133|
+------------+----------+
&lt;/LI-CODE&gt;
&lt;DIV class="uj uk ul um un y"&gt;
&lt;ARTICLE&gt;
&lt;DIV class="y"&gt;
&lt;DIV class="y"&gt;
&lt;SECTION&gt;
&lt;DIV&gt;
&lt;DIV class="lh uu uv uw ux"&gt;
&lt;DIV class="o q"&gt;
&lt;DIV class="eb n ec ed ee ef"&gt;
&lt;P class="pw-post-body-paragraph zq zr uz zs b zt zu zv zw zx zy zz aba abb abc abd abe abf abg abh abi abj abk abl abm abn lh co" data-selectable-paragraph=""&gt;While PySpark code is shorter and succinct, for a Data Analyst not familiar with PySpark, it can be onerous to read. By contrast, they can use a familiar procedural language in SQL to express the same logic and the outcome.&lt;/P&gt;
&lt;H2 id="9e08" class="adb abv uz as abw mn adc mo mr ms add mt mw mx ade my nb nc adf nd ng nh adg ni nl adh co" data-selectable-paragraph=""&gt;What’s next?&lt;/H2&gt;
&lt;P class="pw-post-body-paragraph zq zr uz zs b zt acn zv zw zx aco zz aba abb acp abd abe abf acq abh abi abj acr abl abm abn lh co" data-selectable-paragraph=""&gt;You can try this feature in Apache Spark™ 4.0.0 release. You can try it for yourself.&lt;/P&gt;
&lt;P class="pw-post-body-paragraph zq zr uz zs b zt zu zv zw zx zy zz aba abb abc abd abe abf abg abh abi abj abk abl abm abn lh co" data-selectable-paragraph=""&gt;I have only touched the surface of its full capabilities as a procedural language. To get an extensive and deep dive exposure, check out these sources &lt;span class="lia-unicode-emoji" title=":backhand_index_pointing_down:"&gt;👇&lt;/span&gt;:&lt;/P&gt;
&lt;OL class=""&gt;
&lt;LI id="2280" class="zq zr uz zs b zt zu zv zw zx zy zz aba abb abc abd abe abf abg abh abi abj abk abl abm abn adi adj adk co" data-selectable-paragraph=""&gt;&lt;span class="lia-unicode-emoji" title=":eyes:"&gt;👀&lt;/span&gt; Watch&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A class="bj fk" href="https://youtu.be/NjGgTdGzF8o?list=PLfx_5c0MeG0-_jhc2h1pUgcaOEva-3IIm&amp;amp;t=2803" target="_blank" rel="noopener ugc nofollow"&gt;Upcoming Apache Spark 4.0.0 Release Meetup Recording&lt;/A&gt;&lt;/LI&gt;
&lt;LI id="a1b1" class="zq zr uz zs b zt adl zv zw zx adm zz aba abb adn abd abe abf ado abh abi abj adp abl abm abn adi adj adk co" data-selectable-paragraph=""&gt;&lt;span class="lia-unicode-emoji" title=":open_book:"&gt;📖&lt;/span&gt; Read&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A class="bj fk" href="https://www.databricks.com/blog/introducing-sql-scripting-support-databricks-part-1" target="_blank" rel="noopener ugc nofollow"&gt;Introducing SQL Scripting Support in Databricks, Part 1&lt;/A&gt;&lt;/LI&gt;
&lt;LI id="c106" class="zq zr uz zs b zt adl zv zw zx adm zz aba abb adn abd abe abf ado abh abi abj adp abl abm abn adi adj adk co" data-selectable-paragraph=""&gt;&lt;span class="lia-unicode-emoji" title=":open_book:"&gt;📖&lt;/span&gt; Read&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A class="bj fk" href="https://www.databricks.com/blog/introducing-sql-scripting-databricks-part-2" target="_blank" rel="noopener ugc nofollow"&gt;Introducing SQL Scripting in Databricks, Part 2&lt;/A&gt;&lt;/LI&gt;
&lt;LI id="da86" class="zq zr uz zs b zt adl zv zw zx adm zz aba abb adn abd abe abf ado abh abi abj adp abl abm abn adi adj adk co" data-selectable-paragraph=""&gt;&lt;span class="lia-unicode-emoji" title=":open_book:"&gt;📖&lt;/span&gt;Read&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A class="bj fk" href="https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-scripting" target="_blank" rel="noopener ugc nofollow"&gt;SQL Scripting documentation for Databricks&lt;/A&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/SECTION&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/ARTICLE&gt;
&lt;/DIV&gt;
&lt;DIV class="o q"&gt;
&lt;DIV class="eb n ec ed ee ef"&gt;
&lt;DIV class="adq hb o gw"&gt;
&lt;DIV class="fb o"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="fb o"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
</description>
      <pubDate>Fri, 31 Oct 2025 22:04:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/community-articles/sql-scripting-in-apache-spark-4-0/m-p/137151#M759</guid>
      <dc:creator>jsdmatrix</dc:creator>
      <dc:date>2025-10-31T22:04:53Z</dc:date>
    </item>
  </channel>
</rss>

