3 weeks ago
We have fully SQL-based DLT pipelines where the bronze tables are read from UC volumes. There can be situations where no new data arrives for some of the endpoints in the UC volume. In that case the corresponding SQL code blocks fail, which fails the entire pipeline. Ideally, the SQL blocks for endpoints that do have new data in the UC volume should run correctly, while the blocks for endpoints where no new file was added should be skipped. There should be some if/else or try/catch logic to handle this scenario. How can I achieve this?
For example, this is how two code blocks of my SQL-based DLT pipeline currently look.
code block 1
CREATE OR REFRESH MATERIALIZED VIEW jamf_inventory_information
COMMENT "Incremental upload of data incoming from source endpoint 1"
AS
SELECT *
FROM read_files(
'/Volumes/catalog/${schema_path}/catalog_name-${schema_path}/source=source/*/table_name=endpoint/*/*.json'
);
code block 2
CREATE OR REFRESH MATERIALIZED VIEW jamf_mobile_devices
COMMENT "Incremental upload of data incoming from Jamf for endpoint 2"
AS
SELECT *
FROM read_files(
'/Volumes/catalog/${schema_path}/catalog_name-${schema_path}/source=source/*/table_name=endpoint_2/*/*.json'
);
Code block 1 for endpoint_1 has new files (or already has files), so it runs correctly, while code block 2 for endpoint_2 has no files or no folder at all. Ideally that code block should be skipped, but if it runs it fails and raises an error.
Accepted Solutions
3 weeks ago
Hello, thank you for your question.
Since SQL-based Delta Live Tables (DLT) pipelines do not natively support IF-ELSE or TRY-CATCH constructs, you'll need an approach to gracefully handle missing files. Here are two recommended solutions:
Solution 1: Use read_files with allow_missing_files => true
The read_files function supports an allow_missing_files option that prevents failure if the path is empty or missing. Modify your queries as follows:
CREATE OR REFRESH MATERIALIZED VIEW jamf_mobile_devices
COMMENT "Incremental upload of data incoming from endpoint_2"
AS
SELECT *
FROM read_files(
'/Volumes/catalog/${schema_path}/catalog_name-${schema_path}/source=source/*/table_name=endpoint_2/*/*.json',
allow_missing_files => true -- Prevents failure when no files exist
);
This ensures that if no new files exist, the query does not fail and instead returns an empty dataset.
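If you want both endpoints to behave consistently, the same option can be applied to the first view as well. A minimal sketch, assuming the allow_missing_files option behaves as described above and reusing the path from the question:
CREATE OR REFRESH MATERIALIZED VIEW jamf_inventory_information
COMMENT "Incremental upload of data incoming from source endpoint 1"
AS
SELECT *
FROM read_files(
'/Volumes/catalog/${schema_path}/catalog_name-${schema_path}/source=source/*/table_name=endpoint/*/*.json',
allow_missing_files => true -- Same guard as above, so endpoint 1 also tolerates an empty or missing path
);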
Solution 2: Use WHERE with a Metadata Table (Advanced)
- Maintain a metadata table tracking available file paths.
- Before querying, filter out missing sources dynamically.
-- The CTE must sit inside the view definition (after AS), not before CREATE
CREATE OR REFRESH MATERIALIZED VIEW jamf_mobile_devices AS
WITH available_sources AS (
SELECT path FROM file_metadata_table WHERE table_name = 'endpoint_2'
)
SELECT * FROM read_files(
(SELECT path FROM available_sources)
);
This prevents querying non-existent sources.
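For reference, here is a minimal sketch of what such a metadata table could look like. The name file_metadata_table comes from the query above; the column layout and the idea of refreshing it from a separate job are assumptions:
-- Hypothetical metadata table tracking which endpoint paths currently have files
CREATE TABLE IF NOT EXISTS file_metadata_table (
table_name STRING,   -- logical endpoint name, e.g. 'endpoint_2'
path STRING,         -- volume path or glob that currently contains files
updated_at TIMESTAMP -- when the path was last confirmed to have data
);

-- A separate job (outside the DLT pipeline) would check the volume and upsert rows, e.g.:
INSERT INTO file_metadata_table VALUES (
'endpoint_2',
'/Volumes/catalog/${schema_path}/catalog_name-${schema_path}/source=source/*/table_name=endpoint_2/*/*.json',
current_timestamp()
);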
If allow_missing_files => true works in your case, it is the easiest solution. Otherwise, dynamically maintaining a metadata table of available file paths is a more robust alternative. Let me know if you need more details!
3 weeks ago
Is there a reason why you can't use Autoloader for this? That would only trigger the pipeline when new files arrive.
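For reference, a minimal sketch of what that could look like in SQL, assuming a streaming table over the same volume path (STREAM read_files uses Auto Loader under the hood, so a run with no new files simply processes nothing; whether a completely missing folder still errors is worth verifying):
CREATE OR REFRESH STREAMING TABLE jamf_mobile_devices
COMMENT "Incremental load of data incoming from Jamf for endpoint 2"
AS
SELECT *
FROM STREAM read_files(
'/Volumes/catalog/${schema_path}/catalog_name-${schema_path}/source=source/*/table_name=endpoint_2/*/*.json',
format => 'json' -- Explicit format; Auto Loader tracks already-ingested files and skips them on later runs
);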
3 weeks ago
Yes, using Auto Loader with file notification mode can be useful here.
Also, you can use the IF EXISTS clause to check whether the files exist before attempting to create or refresh the materialized view. This prevents the SQL block from running if there are no new files.