cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

How execute SET spark.sql.sources.partitionOverwriteMode = dynamic; in SQL Stored procedures

RGSLCA
New Contributor

Hi,

I am able to execute the INSERT OVERWRITE TABLE <tables> PARTITION command , in a notebook cell

 

 
SET spark.sql.sources.partitionOverwriteMode=dynamic;

DECLARE OR REPLACE VARIABLE v_load_date DATE;

SET VAR v_load_date = DATE '2026-05-03';

INSERT OVERWRITE TABLE sil_test_rg
PARTITION (bed)
SELECT *
FROM brz_test_rg
WHERE bed IN (
    SELECT DISTINCT bed
    FROM brz_test_rg
    WHERE load_date = v_load_date
)
and load_date = v_load_date;

Once I converted the code to a stored procedure, I dont have the REPLACE functionality, 

I tried wrapping the command "SET spark.sql.sources.partitionOverwriteMode = dynamic" in a execute immediate statement, but it doesn't help.

 

 
BEGIN
    DECLARE slv_ingest_dt DATE;

    SET slv_ingest_dt = cast('2026-05-03' as date);

    execute immediate 'SET spark.sql.sources.partitionOverwriteMode=dynamic';

    CALL ingest_silver_table('sil_test_rg','brz_test_rg',slv_ingest_dt);
END;
 
How do I use the SELECTIVE OVERWRITE option in stored procedures ?
 
1 ACCEPTED SOLUTION

Accepted Solutions

Louis_Frolio
Databricks Employee
Databricks Employee

Hello @RGSLCA , 

The short answer is that you can't make SET spark.sql.sources.partitionOverwriteMode=dynamic work from a stored procedure running on a SQL warehouse or serverless. That dynamic partition overwrite path is legacy, and it's SQL-supported on classic compute only, so the engine just ignores it everywhere else. EXECUTE IMMEDIATE won't rescue you either, because this is a compute support issue, not a string execution one. The reason it worked in your notebook is that the notebook was attached to a classic all-purpose cluster.

The supported, compute-independent replacement is INSERT ... REPLACE USING. Rewrite your overwrite this way, with no SET and no EXECUTE IMMEDIATE:

 
sql
INSERT INTO sil_test_rg
REPLACE USING (bed)
SELECT *
FROM brz_test_rg
WHERE load_date = slv_ingest_dt;

That replaces every existing row in sil_test_rg whose bed value shows up in the day's data and leaves all other bed partitions untouched, which is exactly what your dynamic partition overwrite was doing.

The change belongs inside ingest_silver_table, since that's where the actual INSERT OVERWRITE lives. Because that procedure receives the table names as parameters, wrap them in IDENTIFIER():

 
 
sql
INSERT INTO IDENTIFIER(p_target_table)
REPLACE USING (bed)
SELECT *
FROM IDENTIFIER(p_source_table)
WHERE load_date = p_ingest_dt;

Your outer procedure then drops the SET entirely:

 
sql
BEGIN
  DECLARE slv_ingest_dt DATE;
  SET slv_ingest_dt = DATE '2026-05-03';
  CALL ingest_silver_table('sil_test_rg', 'brz_test_rg', slv_ingest_dt);
END;

A few version and behavior notes, because they matter for REPLACE USING:

  1. REPLACE USING for SQL needs Databricks Runtime 16.3 or later (Python and Scala need 18.2 or later).
  2. On DBR 16.3 through 17.1 you get the legacy behavior: only dynamic partition overwrites, and you have to list the full set of the table's partition columns in USING. Since bed is your partition column, REPLACE USING (bed) satisfies that and works on 16.3 and up.
  3. On DBR 17.2 and later you get the broader dynamic data overwrite. USING can name any columns, and partitioned, unpartitioned, and liquid-clustered tables are all supported.
  4. Empty source is safe. REPLACE USING doesn't delete any rows when the SELECT returns nothing, so it won't blank out a partition on a zero-row day. That's the same net effect as your dynamic partition overwrite.
  5. One safety note: the docs ask you to validate that the written data only touches the expected bed values. A stray row in the wrong partition will overwrite that whole partition.

If REPLACE USING doesn't fit your case, you have a few alternatives.

  1. INSERT INTO ... REPLACE WHERE <predicate> works on SQL from DBR 12.2 LTS and up, on all compute. It's a good fit when you have a fixed predicate like load_date = '2026-05-03'. One caveat: with an empty source, REPLACE WHERE can delete rows.
  2. REPLACE ON (<NULL-safe condition>) works on SQL from DBR 17.1 and up, but only reach for it if you need NULL-safe or more complex match logic.
  3. If you truly have to keep INSERT OVERWRITE ... PARTITION with partitionOverwriteMode=dynamic, run the procedure on classic compute rather than a SQL warehouse or serverless. Databricks labels this legacy and not recommended for new workloads, so I'd treat it as a last resort.

Regards, Louis.

Sources used in my research:

  1. Selectively overwrite data with Delta Lake, Databricks on AWS: https://docs.databricks.com/aws/en/delta/selective-overwrite
  2. Selectively overwrite data with Delta Lake, Azure Databricks (Microsoft Learn): https://learn.microsoft.com/en-us/azure/databricks/delta/selective-overwrite
  3. INSERT, Databricks SQL language reference: https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-dml-insert-into

View solution in original post

1 REPLY 1

Louis_Frolio
Databricks Employee
Databricks Employee

Hello @RGSLCA , 

The short answer is that you can't make SET spark.sql.sources.partitionOverwriteMode=dynamic work from a stored procedure running on a SQL warehouse or serverless. That dynamic partition overwrite path is legacy, and it's SQL-supported on classic compute only, so the engine just ignores it everywhere else. EXECUTE IMMEDIATE won't rescue you either, because this is a compute support issue, not a string execution one. The reason it worked in your notebook is that the notebook was attached to a classic all-purpose cluster.

The supported, compute-independent replacement is INSERT ... REPLACE USING. Rewrite your overwrite this way, with no SET and no EXECUTE IMMEDIATE:

 
sql
INSERT INTO sil_test_rg
REPLACE USING (bed)
SELECT *
FROM brz_test_rg
WHERE load_date = slv_ingest_dt;

That replaces every existing row in sil_test_rg whose bed value shows up in the day's data and leaves all other bed partitions untouched, which is exactly what your dynamic partition overwrite was doing.

The change belongs inside ingest_silver_table, since that's where the actual INSERT OVERWRITE lives. Because that procedure receives the table names as parameters, wrap them in IDENTIFIER():

 
 
sql
INSERT INTO IDENTIFIER(p_target_table)
REPLACE USING (bed)
SELECT *
FROM IDENTIFIER(p_source_table)
WHERE load_date = p_ingest_dt;

Your outer procedure then drops the SET entirely:

 
sql
BEGIN
  DECLARE slv_ingest_dt DATE;
  SET slv_ingest_dt = DATE '2026-05-03';
  CALL ingest_silver_table('sil_test_rg', 'brz_test_rg', slv_ingest_dt);
END;

A few version and behavior notes, because they matter for REPLACE USING:

  1. REPLACE USING for SQL needs Databricks Runtime 16.3 or later (Python and Scala need 18.2 or later).
  2. On DBR 16.3 through 17.1 you get the legacy behavior: only dynamic partition overwrites, and you have to list the full set of the table's partition columns in USING. Since bed is your partition column, REPLACE USING (bed) satisfies that and works on 16.3 and up.
  3. On DBR 17.2 and later you get the broader dynamic data overwrite. USING can name any columns, and partitioned, unpartitioned, and liquid-clustered tables are all supported.
  4. Empty source is safe. REPLACE USING doesn't delete any rows when the SELECT returns nothing, so it won't blank out a partition on a zero-row day. That's the same net effect as your dynamic partition overwrite.
  5. One safety note: the docs ask you to validate that the written data only touches the expected bed values. A stray row in the wrong partition will overwrite that whole partition.

If REPLACE USING doesn't fit your case, you have a few alternatives.

  1. INSERT INTO ... REPLACE WHERE <predicate> works on SQL from DBR 12.2 LTS and up, on all compute. It's a good fit when you have a fixed predicate like load_date = '2026-05-03'. One caveat: with an empty source, REPLACE WHERE can delete rows.
  2. REPLACE ON (<NULL-safe condition>) works on SQL from DBR 17.1 and up, but only reach for it if you need NULL-safe or more complex match logic.
  3. If you truly have to keep INSERT OVERWRITE ... PARTITION with partitionOverwriteMode=dynamic, run the procedure on classic compute rather than a SQL warehouse or serverless. Databricks labels this legacy and not recommended for new workloads, so I'd treat it as a last resort.

Regards, Louis.

Sources used in my research:

  1. Selectively overwrite data with Delta Lake, Databricks on AWS: https://docs.databricks.com/aws/en/delta/selective-overwrite
  2. Selectively overwrite data with Delta Lake, Azure Databricks (Microsoft Learn): https://learn.microsoft.com/en-us/azure/databricks/delta/selective-overwrite
  3. INSERT, Databricks SQL language reference: https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-dml-insert-into