There are a few options you can try. Based on the information shared, I am assuming a skeleton for your complicated query as follows:
WITH cte_one AS (
    SELECT *
    FROM view_one
),
-- Other CTEs
...
-- Your main query logic
SELECT cte_one.*
FROM cte_one
-- Joins and other clauses
In general, if a particular column has missing values and you want to fall back to a default, you can try the COALESCE function, which returns its first non-NULL argument. If you are already using that and want a Databricks-specific approach, you can read about the materialized views, streaming tables, regular views, and expectations on tables that the Delta Live Tables (DLT) framework offers in Databricks.
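As a minimal sketch of the COALESCE idea in PySpark (the view sales_view and the columns order_id and discount are hypothetical placeholders, not from your query):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical view where 'discount' may contain NULLs.
# COALESCE returns its first non-NULL argument, so NULL discounts become 0.
df = spark.sql("""
    SELECT order_id,
           COALESCE(discount, 0) AS discount
    FROM sales_view
""")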
A change of approach can be considered here as well, if you are flexible enough to use the Spark DataFrame API instead of pure SQL. That way you can check for the columns' existence first and remediate with an alternate query before running the large query.
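For example, here is a minimal PySpark sketch of that idea; view_one comes from the skeleton above, while the column name amount and its type are hypothetical placeholders:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.table("view_one")

# If the expected column is absent, add it as a typed NULL so the
# downstream query can rely on its presence.
if "amount" not in df.columns:
    df = df.withColumn("amount", F.lit(None).cast("double"))

# Expose the remediated data under a new name for the large SQL query.
df.createOrReplaceTempView("view_one_fixed")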
My final piece of advice is to try a more controlled approach (a sketch follows the steps):
- STEP 1: Identify all the columns your query requires.
- STEP 2: Create views that supply a default value (for example, a typed NULL) for any missing column.
- STEP 3: Use those views in your query, either keeping the same SQL or refactoring the code to use a managed framework like DLT within Databricks.
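Here is a rough PySpark sketch of those three steps, under the assumption that the required column names and types (order_id, amount, region) are placeholders you would replace with your own:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# STEP 1: columns the large query requires (hypothetical names and types).
required = {"order_id": "bigint", "amount": "double", "region": "string"}
existing = set(spark.table("view_one").columns)

# STEP 2: create a view that guarantees every required column,
# defaulting any missing one to a typed NULL.
select_list = ", ".join(
    col if col in existing else f"CAST(NULL AS {dtype}) AS {col}"
    for col, dtype in required.items()
)
spark.sql(
    f"CREATE OR REPLACE TEMP VIEW view_one_stable AS "
    f"SELECT {select_list} FROM view_one"
)

# STEP 3: point the large query at view_one_stable instead of view_one.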
If you have any additional information that might help us help you, please feel free to comment.
LK