APPLY_CHANGES with JSON data
08-30-2023 11:50 PM - edited 08-30-2023 11:53 PM
Hi Team,
I am building a DLT pipeline and planning to use APPLY_CHANGES from Bronze to Silver. In the bronze table, one column holds a JSON value. This value contains questions and answers as key-value pairs, and it can change depending on which questions were asked.
Question 1: Can APPLY_CHANGES detect what has changed in this column?
Question 2: I understand that doing a pivot with APPLY_CHANGES is not supported (correct me if I am wrong). Is there a way in DLT to transpose/pivot the columns using the column that holds the JSON key-value pairs? (See the sketch after this post.)
Example:
Cheers,
G
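On Question 2, one pattern worth noting: if the set of question keys is known up front, the pivot can be written as plain conditional aggregation, which avoids pivot entirely and stays streaming-friendly. A minimal sketch, assuming hypothetical flattened columns key, value, and eventCreated, and a hardcoded key list:

from pyspark.sql import functions as F

QUESTION_KEYS = ["q1", "q2", "q3"]  # hypothetical: the known set of question keys

def transpose_answers(df):
    # One output column per question key; max() picks the matching value and ignores nulls
    return df.groupBy("eventCreated").agg(
        *[
            F.max(F.when(F.col("key") == k, F.col("value"))).alias(k)
            for k in QUESTION_KEYS
        ]
    )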
08-31-2023 12:14 PM
Thanks for the answers!
As an additional question, can APPLY_CHANGES handle schema evolution from its source bronze table?
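For reference, the apply_changes call under discussion would look roughly like this. A minimal sketch, assuming a hypothetical key column id and sequencing on eventCreated:

import dlt
from pyspark.sql.functions import col

dlt.create_streaming_table("silver_events")  # target table; name is hypothetical

dlt.apply_changes(
    target="silver_events",
    source="bronze_events_py",
    keys=["id"],                      # hypothetical primary-key column
    sequence_by=col("eventCreated"),
    stored_as_scd_type=1,             # SCD type 1 updates in place; 2 keeps history
)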
09-07-2023 06:45 PM
So I tried to implement a pivot inside @dlt.view.
import dlt
from pyspark.sql.functions import explode, max

@dlt.view(
    name="silver_dynquestion_vw"
)
def silver_dynquestion_vw():
    df = dlt.read_stream("bronze_events_py")
    # Flatten the array of question/answer structs into one row per answer
    df = (
        df
        .select(
            explode("dynamicQuestions.answers").alias("answers"),
            "answers.question.key",
            "answers.answer.value",
            "eventCreated",
        )
    )
    # Transpose: one column per question key, keeping the max answer value
    df = (
        df
        .groupBy("eventCreated")
        .pivot("key")
        .agg(max("value"))
    )
    return df

But it is giving me this error:
pyspark.errors.exceptions.AnalysisException: Queries with streaming sources must be executed with writeStream.start();
What is a better way to do this?
Thanks again!
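In case it helps others: pivot is not supported on streaming DataFrames, which is what dlt.read_stream returns. Pivot has to run an eager query to discover the distinct key values, hence the writeStream.start() error. One workaround is to read bronze as a batch source with dlt.read, so the whole view is recomputed on each update. A minimal sketch under that assumption (note that apply_changes itself still requires a streaming source, so this pivoted view would feed an ordinary live table rather than apply_changes):

import dlt
from pyspark.sql.functions import explode, max

@dlt.view(name="silver_dynquestion_vw")
def silver_dynquestion_vw():
    # dlt.read returns a batch DataFrame, so pivot's eager distinct scan is allowed
    df = dlt.read("bronze_events_py")
    # Flatten first, then pull the nested fields out of the exploded struct
    df = df.select(explode("dynamicQuestions.answers").alias("answers"), "eventCreated")
    df = df.select("answers.question.key", "answers.answer.value", "eventCreated")
    return df.groupBy("eventCreated").pivot("key").agg(max("value"))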