Declarative pipeline full table refresh generates empty MV.

cdn_yyz_yul

Hi everyone,

- the situation:
I have a Declarative pipeline. The pipeline consists of several .py files:
file1.py: creates a Materialized View named description.
file2.py: creates a second Materialized View by reading the table "transactions" and the MV "description", then joining them. Let's call it trans_with_description.
file2.py: reads trans_with_description and does further processing. Because trans_with_description has array columns (col_a, col_b), I call a function to flatten them. The output is final_result.
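As a minimal sketch, the dependency structure described above can be modeled in plain Python with the standard-library `graphlib` module. This is a model of the graph, not DLT code; the dictionary layout is an assumption for illustration. A valid run order must place each dataset after everything it reads:

```python
from graphlib import TopologicalSorter

# Each dataset maps to the datasets it reads (its upstream dependencies).
# "transactions" is an existing source table, so it has no entry of its own.
PIPELINE = {
    "description": set(),                                       # file1.py
    "trans_with_description": {"transactions", "description"},  # join step
    "final_result": {"trans_with_description"},                 # flatten step
}

order = list(TopologicalSorter(PIPELINE).static_order())
print(order)  # every dataset appears after all of its upstream datasets
```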
---------

- the problem:
When I run the DLT pipeline with "full table refresh", final_result has 0 rows. If I then run the pipeline again (without full refresh), all the expected rows are there, flattened.

If I remove the array-flattening step and run the pipeline, final_result has the expected rows, with the array columns left as arrays.

What could be the cause of the problem? 

I also noticed that without the flatten step, the pipeline graph shows every element wired correctly.
Whenever I add the flatten step, final_result is detached from the rest of the pipeline.
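One hypothesis consistent with both symptoms (0 rows after a full refresh, all rows on the next normal run) and with the detached node in the graph: if the flatten step reads trans_with_description in a way the pipeline's dependency analysis does not pick up, final_result has no declared upstream, so nothing forces it to run after trans_with_description. The plain-Python model below is not DLT code; the source rows, the `flatten` stand-in, and the two run orders are hypothetical and only illustrate that ordering effect.

```python
# Hypothetical model (not DLT itself): a full refresh empties every dataset,
# then recomputes each one in the order the scheduler picks.
SOURCE = [{"id": 1, "col_a": [1, 2], "col_b": ["x", "y"]}]  # made-up rows


def flatten(rows):
    # Hypothetical stand-in for the real flatten function:
    # one output row per (col_a, col_b) element pair.
    return [
        {"id": r["id"], "a": a, "b": b}
        for r in rows
        for a, b in zip(r["col_a"], r["col_b"])
    ]


COMPUTE = {
    # The join with "description" is simplified to a copy of the source rows.
    "trans_with_description": lambda t: list(SOURCE),
    # Reads whatever trans_with_description holds at the moment it runs.
    "final_result": lambda t: flatten(t.get("trans_with_description", [])),
}


def refresh(tables, order, full_refresh):
    if full_refresh:
        for name in order:
            tables[name] = []  # full refresh: truncate before recomputing
    for name in order:
        tables[name] = COMPUTE[name](tables)
    return tables


# Edge declared: the upstream dataset always runs first.
WIRED = ["trans_with_description", "final_result"]
# Edge missing: final_result has no known upstream; this is one possible order.
DETACHED = ["final_result", "trans_with_description"]

tables = refresh({}, DETACHED, full_refresh=True)
print(len(tables["final_result"]))  # 0 rows after the full refresh
tables = refresh(tables, DETACHED, full_refresh=False)
print(len(tables["final_result"]))  # 2 rows on the next normal run
print(len(refresh({}, WIRED, full_refresh=True)["final_result"]))  # 2
```

In the model, restoring the edge (WIRED) is what makes the full refresh produce rows; in the real pipeline that would presumably mean making the read of trans_with_description visible to the pipeline graph again.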

--
Compute: serverless 
4 - Python 3.12, Scala 2.13, Java 17
