Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Adding if statements or try/catch blocks in SQL-based DLT pipelines

ashraf1395
Valued Contributor II

We have fully SQL-based DLT pipelines in which the bronze tables are read from UC volumes. There can be situations where no new data arrives for some of the endpoints in the UC volume. In that case, the corresponding SQL code blocks fail, which fails the entire pipeline. Ideally, the SQL blocks for endpoints that do have new data should run correctly, while the blocks for endpoints where no new file was added should be skipped, via some kind of if/else or try/catch handling. How can this be achieved?

 

For example, this is how two code blocks of my SQL-based DLT pipeline currently look:

code block 1

CREATE OR REFRESH MATERIALIZED VIEW jamf_inventory_information
COMMENT "Incremental upload of data incoming from source endpoint 1"
AS
SELECT *
FROM read_files(
  '/Volumes/catalog/${schema_path}/catalog_name-${schema_path}/source=source/*/table_name=endpoint_1/*/*.json'
);

code block 2

CREATE OR REFRESH MATERIALIZED VIEW jamf_mobile_devices
COMMENT "Incremental upload of data incoming from Jamf for endpoint 2"

AS
SELECT *
FROM read_files(
  '/Volumes/catalog/${schema_path}/catalog_name-${schema_path}/source=source/*/table_name=endpoint_2/*/*.json'
);

Code block 1 runs correctly because endpoint_1 received a new file (or already has files).

Code block 2 fails because endpoint_2 has no files (and possibly no folder). Ideally that code block should be skipped, but when it runs it fails and raises an error.

ACCEPTED SOLUTION

VZLA
Databricks Employee

Hello, thank you for your question.

Since SQL-based Delta Live Tables (DLT) pipelines do not natively support IF-ELSE or TRY-CATCH constructs, you'll need an approach to gracefully handle missing files. Here are two recommended solutions:

Solution 1: Use read_files with ignoreMissingFiles => true

The read_files function accepts the generic file source option ignoreMissingFiles, which prevents the query from failing when expected files under the path are missing. Modify your queries as follows:

CREATE OR REFRESH MATERIALIZED VIEW jamf_mobile_devices
COMMENT "Incremental upload of data incoming from endpoint_2"
AS
SELECT *
FROM read_files(
  '/Volumes/catalog/${schema_path}/catalog_name-${schema_path}/source=source/*/table_name=endpoint_2/*/*.json',
  ignoreMissingFiles => true  -- Prevents failure when no files exist
);

This ensures that if no new files exist, the query does not fail and instead returns an empty dataset.

Solution 2: Use WHERE with a Metadata Table (Advanced)

  • Maintain a metadata table tracking available file paths.
  • Before querying, filter out missing sources dynamically.
CREATE OR REFRESH MATERIALIZED VIEW jamf_mobile_devices AS
-- The WITH clause must sit inside the CREATE statement, after AS
WITH available_sources AS (
  SELECT path FROM file_metadata_table WHERE table_name = 'endpoint_2'
)
SELECT * FROM read_files(
  (SELECT path FROM available_sources)
);

This prevents querying non-existent sources.
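For concreteness, a minimal sketch of what such a metadata table could look like. The sample path below is a placeholder, and the table would need to be kept current by whatever process lands the files in the volume:

-- Hypothetical metadata table: one row per endpoint that currently has files to load
CREATE TABLE IF NOT EXISTS file_metadata_table (
  table_name STRING COMMENT 'Logical endpoint name, e.g. endpoint_2',
  path       STRING COMMENT 'Glob pattern of the files available for that endpoint'
);

-- Maintained by the ingestion process that lands the files, for example:
INSERT INTO file_metadata_table VALUES
  ('endpoint_1',
   '/Volumes/catalog/my_schema/catalog_name-my_schema/source=source/*/table_name=endpoint_1/*/*.json');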

If ignoreMissingFiles => true works in your case, it is the easiest solution. Otherwise, dynamically maintaining a metadata table of available file paths is a more robust alternative. Let me know if you need more details!


REPLIES


Rjdudley
Valued Contributor II

Is there a reason why you can't use Auto Loader for this? That would only process files when new ones arrive, as in the sketch below.
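For illustration, a minimal sketch of that approach in DLT SQL, assuming the same volume path as in the original post: a streaming table over STREAM read_files uses Auto Loader under the hood, so each update processes only files it has not seen before, and an endpoint with no new files simply adds no rows instead of failing (the base directory does still need to exist for the first run):

CREATE OR REFRESH STREAMING TABLE jamf_mobile_devices
COMMENT "Incremental upload of data incoming from Jamf for endpoint 2"
AS
SELECT *
FROM STREAM read_files(
  '/Volumes/catalog/${schema_path}/catalog_name-${schema_path}/source=source/*/table_name=endpoint_2/*/*.json',
  format => 'json'  -- explicit format for the JSON files targeted by the glob
);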

Sidhant07
Databricks Employee

Yes, using Auto Loader with file notification mode can be useful here.

You can also check whether the files exist before attempting to create or refresh the materialized view, which prevents the SQL block from running when there are no new files.
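As a sketch of the file notification idea mentioned above, assuming the cloud_files syntax for Auto Loader in DLT SQL (cloudFiles.useNotifications is the Auto Loader option that switches from directory listing to file notifications; it requires permission to create notification services on the underlying cloud storage):

CREATE OR REFRESH STREAMING TABLE jamf_mobile_devices
COMMENT "Incremental upload of data incoming from Jamf for endpoint 2"
AS
SELECT *
FROM cloud_files(
  '/Volumes/catalog/${schema_path}/catalog_name-${schema_path}/source=source/*/table_name=endpoint_2/*/*.json',
  'json',
  map('cloudFiles.useNotifications', 'true')  -- file notification mode instead of directory listing
);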
