Hi,
I'm implementing a DLT pipeline that uses Auto Loader to ingest JSON files. Each JSON file contains an array called Items that holds records, and two of the fields in those records weren't part of the original schema but were added later. Auto Loader has updated the schema, and when I query the bronze table in Databricks SQL I can select the two added fields: they are NULL in the data that was loaded before the fields existed in the JSON files, and they contain values in more recent data.
However, when the DLT pipeline runs the notebook that creates the silver table, it fails with the error "Unable to process statement for View 'VehicleInspectionProtocol_Flattened_Temp'. [FIELD_NOT_FOUND] No such struct field `RegistrationDate` in ...". RegistrationDate is one of the fields that wasn't part of the records initially. Is there a better way to implement the notebook?
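For reference, the bronze table is created along these lines (the path, option values, and audit-column expressions below are simplified placeholders, not my exact code):

CREATE OR REFRESH STREAMING LIVE TABLE vehicleInspectionProtocol_Raw
AS
SELECT
  *,
  _metadata.file_path AS AuditSourceFile,
  current_timestamp() AS AuditLoadTime
FROM cloud_files(
  '/mnt/landing/vehicleinspections/',
  'json',
  map('cloudFiles.inferColumnTypes', 'true')
);

The silver notebook then explodes Items, flattens the struct, and applies changes: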
CREATE TEMPORARY STREAMING LIVE VIEW VehicleInspectionProtocol_Explode_Temp
AS
SELECT
  explode(Items) AS Item,
  AuditSourceFile,
  AuditLoadTime
FROM STREAM(LIVE.vehicleInspectionProtocol_Raw);

CREATE TEMPORARY STREAMING LIVE VIEW VehicleInspectionProtocol_Flattened_Temp
AS
SELECT
  Item.Id,
  Item.RegistrationDate,
  Item.InspectionDateTime,
  Item.VehicleInspectionType,
  AuditSourceFile,
  AuditLoadTime
FROM STREAM(LIVE.VehicleInspectionProtocol_Explode_Temp);

CREATE OR REFRESH STREAMING LIVE TABLE VehicleInspectionProtocol;

APPLY CHANGES INTO
  LIVE.VehicleInspectionProtocol
FROM
  STREAM(LIVE.VehicleInspectionProtocol_Flattened_Temp)
KEYS
  (Id)
SEQUENCE BY
  AuditSourceFile;
Thanks,
Magnus