Hi everyone,
- the situation:
I have a Declarative pipeline that consists of several .py files.
file1.py: creates a Materialized View: description.
file2.py: creates a second Materialized View by reading the table "transactions" and the MV "description", then joining them. Let's call it trans_with_description.
file2.py: reads trans_with_description and does more processing. Because trans_with_description has array columns (col_a, col_b), I call a function to flatten them. The output is final_result.
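For readers unsure what the flatten step does, here is a plain-Python sketch of one plausible reading of it. It assumes that "flatten" means turning each position of the parallel arrays col_a and col_b into its own output row. The helper name flatten_arrays, the id column, and the sample rows are hypothetical, and this is not the pipeline's actual Spark code.

```python
from itertools import zip_longest


def flatten_arrays(rows, array_cols=("col_a", "col_b")):
    """Emit one output row per array position (hypothetical helper).

    Assumption: the array columns are parallel, so position i of col_a
    belongs with position i of col_b. Shorter arrays are padded with None.
    """
    out = []
    for row in rows:
        # Treat a missing or None array as empty.
        arrays = [row.get(col) or [] for col in array_cols]
        for values in zip_longest(*arrays):
            # Copy the scalar columns, then add one element per array column.
            flat = {k: v for k, v in row.items() if k not in array_cols}
            flat.update(zip(array_cols, values))
            out.append(flat)
    return out


# Made-up sample rows standing in for trans_with_description.
rows = [
    {"id": 1, "col_a": [10, 20], "col_b": ["x", "y"]},
    {"id": 2, "col_a": [], "col_b": []},
]
flat_rows = flatten_arrays(rows)
```

In this sketch a row whose arrays are all empty produces no output rows, so the shape of the input directly affects the row count of the result.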
---------
- the problem:
When I run the DLT pipeline with "full table refresh", final_result has 0 rows. If I run the pipeline again (not a full refresh), all expected rows are there, and flattened.
If I remove the flatten array operation and run the pipeline, final_result has the expected rows, with the array columns still as arrays.
What could be the cause of the problem?
I also noticed that without the flatten array step, the "pipeline graph" shows every element wired correctly.
Whenever I add the flatten array step, final_result is detached from the rest of the pipeline.
--
Compute: serverless
4 - Python 3.12, Scala 2.13, Java 17