Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Adding if statements or try/catch blocks in SQL-based DLT pipelines

ashraf1395
Valued Contributor II

We have fully SQL-based DLT pipelines in which the bronze tables are read from UC volumes. There can be situations where no new data arrives for some of the endpoints in the UC volume. In that case, the corresponding SQL code blocks fail, which fails the entire pipeline. Ideally, the SQL blocks for endpoints that do have new data should run correctly, while the blocks for endpoints where no new file was added should be skipped, via some kind of if/else or try/catch handling. How can this be achieved?

 

For example, this is how two code blocks of my SQL-based DLT pipeline currently look:

code block 1

CREATE OR REFRESH MATERIALIZED VIEW jamf_inventory_information
COMMENT "Incremental upload of data incoming from source endpoint 1"
AS
SELECT *
FROM read_files(
  '/Volumes/catalog/${schema_path}/catalog_name-${schema_path}/source=source/*/table_name=endpoint_1/*/*.json'
);

code block 2

CREATE OR REFRESH MATERIALIZED VIEW jamf_mobile_devices
COMMENT "Incremental upload of data incoming from Jamf for endpoint 2"

AS
SELECT *
FROM read_files(
  '/Volumes/catalog/${schema_path}/catalog_name-${schema_path}/source=source/*/table_name=endpoint_2/*/*.json'
);

Code block 1 runs correctly because endpoint_1 received a new file (or already has files).

Code block 2 fails because endpoint_2 has no files (and possibly no folder). Ideally that code block should be skipped, but when it runs it fails and raises an error.

ACCEPTED SOLUTION

VZLA
Databricks Employee

Hello, thank you for your question.

Since SQL-based Delta Live Tables (DLT) pipelines do not natively support IF-ELSE or TRY-CATCH constructs, you'll need an approach to gracefully handle missing files. Here are two recommended solutions:

Solution 1: Use read_files with ignoreMissingFiles => true

The read_files function accepts the generic file source option ignoreMissingFiles, which prevents the query from failing when expected files under the path are missing. Modify your queries as follows:

CREATE OR REFRESH MATERIALIZED VIEW jamf_mobile_devices
COMMENT "Incremental upload of data incoming from endpoint_2"
AS
SELECT *
FROM read_files(
  '/Volumes/catalog/${schema_path}/catalog_name-${schema_path}/source=source/*/table_name=endpoint_2/*/*.json',
  ignoreMissingFiles => true  -- Prevents failure when no files exist
);

This ensures that if no new files exist, the query does not fail and instead returns an empty dataset.

Solution 2: Use WHERE with a Metadata Table (Advanced)

  • Maintain a metadata table tracking available file paths.
  • Before querying, filter out missing sources dynamically.
CREATE OR REFRESH MATERIALIZED VIEW jamf_mobile_devices AS
-- The WITH clause must sit inside the CREATE statement, after AS
WITH available_sources AS (
  SELECT path FROM file_metadata_table WHERE table_name = 'endpoint_2'
)
SELECT * FROM read_files(
  (SELECT path FROM available_sources)
);

This prevents querying non-existent sources.
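For concreteness, a minimal sketch of what such a metadata table could look like. The sample path below is a placeholder, and the table would need to be kept current by whatever process lands the files in the volume:

-- Hypothetical metadata table: one row per endpoint that currently has files to load
CREATE TABLE IF NOT EXISTS file_metadata_table (
  table_name STRING COMMENT 'Logical endpoint name, e.g. endpoint_2',
  path       STRING COMMENT 'Glob pattern of the files available for that endpoint'
);

-- Maintained by the ingestion process that lands the files, for example:
INSERT INTO file_metadata_table VALUES
  ('endpoint_1',
   '/Volumes/catalog/my_schema/catalog_name-my_schema/source=source/*/table_name=endpoint_1/*/*.json');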

If ignoreMissingFiles => true works in your case, it is the easiest solution. Otherwise, dynamically maintaining a metadata table of available file paths is a more robust alternative. Let me know if you need more details!


REPLIES


Rjdudley
Valued Contributor II

Is there a reason why you can't use Auto Loader for this? That would only process files when new ones arrive, as in the sketch below.
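For illustration, a minimal sketch of that approach in DLT SQL, assuming the same volume path as in the original post: a streaming table over STREAM read_files uses Auto Loader under the hood, so each update processes only files it has not seen before, and an endpoint with no new files simply adds no rows instead of failing (the base directory does still need to exist for the first run):

CREATE OR REFRESH STREAMING TABLE jamf_mobile_devices
COMMENT "Incremental upload of data incoming from Jamf for endpoint 2"
AS
SELECT *
FROM STREAM read_files(
  '/Volumes/catalog/${schema_path}/catalog_name-${schema_path}/source=source/*/table_name=endpoint_2/*/*.json',
  format => 'json'  -- explicit format for the JSON files targeted by the glob
);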

Sidhant07
Databricks Employee

Yes, using Auto Loader with file notification mode can be useful here.

You can also check whether the files exist before attempting to create or refresh the materialized view, which prevents the SQL block from running when there are no new files.
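As a sketch of the file notification idea mentioned above, assuming the cloud_files syntax for Auto Loader in DLT SQL (cloudFiles.useNotifications is the Auto Loader option that switches from directory listing to file notifications; it requires permission to create notification services on the underlying cloud storage):

CREATE OR REFRESH STREAMING TABLE jamf_mobile_devices
COMMENT "Incremental upload of data incoming from Jamf for endpoint 2"
AS
SELECT *
FROM cloud_files(
  '/Volumes/catalog/${schema_path}/catalog_name-${schema_path}/source=source/*/table_name=endpoint_2/*/*.json',
  'json',
  map('cloudFiles.useNotifications', 'true')  -- file notification mode instead of directory listing
);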
