stucas
New Contributor II

Thank you for the reply. I have tried this (it was suggested in earlier solutions), but that may well be a side effect of the above function.

query = f"""
            SELECT pivot_key,
                {select_clause}
            FROM
                data_to_pivot
            GROUP BY
                pivot_key
            """

However, on pipeline initialisation it failed with an invalid SQL error because {select_clause} was empty. I believe the root cause is that no schema is defined at this point in the process, so DLT just assumes an empty string.
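For anyone following along, here is a minimal sketch of one way to avoid the empty {select_clause}, assuming the pivot values are known up front so nothing has to be read from the data at initialisation. The function name build_pivot_query, the columns category and amount, and the example values are all hypothetical; only pivot_key and data_to_pivot come from the query above. It uses plain conditional aggregation rather than pivot():

```python
from typing import Sequence


def build_pivot_query(
    pivot_values: Sequence[str],
    pivot_col: str = "category",   # hypothetical column holding the pivot labels
    value_col: str = "amount",     # hypothetical column holding the values
) -> str:
    """Build a pivot-style query from a fixed list of pivot values."""
    # An empty list would leave "SELECT pivot_key, FROM ...", which is invalid SQL,
    # so fail early with a readable message instead.
    if not pivot_values:
        raise ValueError("pivot_values is empty; the SELECT list would be invalid")

    # One conditional aggregate per expected pivot value.
    select_clause = ",\n    ".join(
        f"MAX(CASE WHEN {pivot_col} = '{v}' THEN {value_col} END) AS `{v}`"
        for v in pivot_values
    )

    return (
        "SELECT pivot_key,\n"
        f"    {select_clause}\n"
        "FROM data_to_pivot\n"
        "GROUP BY pivot_key"
    )


# Example with made-up pivot values.
query = build_pivot_query(["north", "south"])
print(query)
```

The resulting string could then be passed to spark.sql(query) inside the table definition. Because the list is fixed in code, the output columns are known before any data is read, which may be what the pipeline needs at initialisation.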

When autoMerge was added, the job worked, but no columns from the select statement were added.

For a beginner this is all very strange, but I assume it is linked to the way DLT relies on Spark's lazy evaluation (hence certain functions that require full data loading are prohibited, e.g. collect(), pivot())?