11-01-2023 07:06 AM
Hi,
I'm implementing a DLT pipeline using Auto Loader to ingest JSON files. The JSON files contain an array called Items that holds records, and two of the fields in those records weren't part of the original schema but were added later. Auto Loader has updated the schema, and when I query the bronze table in Databricks SQL, I can select the two added fields: they are NULL in the data loaded before the fields existed in the JSON files and contain values in more recent data.
However, in the notebook that creates the silver table, the DLT pipeline fails with the message "Unable to process statement for View 'VehicleInspectionProtocol_Flattened_Temp'. [FIELD_NOT_FOUND] No such struct field `RegistrationDate` in ...". RegistrationDate is one of the fields that was not part of the records initially. Can I implement the notebook in a better way?
CREATE TEMPORARY STREAMING LIVE VIEW VehicleInspectionProtocol_Explode_Temp
AS
SELECT
  explode(Items) AS Item,
  AuditSourceFile,
  AuditLoadTime
FROM STREAM(LIVE.vehicleInspectionProtocol_Raw);

CREATE TEMPORARY STREAMING LIVE VIEW VehicleInspectionProtocol_Flattened_Temp
AS
SELECT
  Item.Id,
  Item.RegistrationDate,
  Item.InspectionDateTime,
  Item.VehicleInspectionType,
  AuditSourceFile,
  AuditLoadTime
FROM STREAM(LIVE.VehicleInspectionProtocol_Explode_Temp);

CREATE OR REFRESH STREAMING LIVE TABLE VehicleInspectionProtocol;

APPLY CHANGES INTO LIVE.VehicleInspectionProtocol
FROM STREAM(LIVE.VehicleInspectionProtocol_Flattened_Temp)
KEYS (Id)
SEQUENCE BY AuditSourceFile;
Thanks,
Magnus
11-27-2023 04:11 AM
Hi @Magnus, it seems you're encountering an issue with schema evolution in your DLT pipeline using Auto Loader.
Let's explore how you can improve your notebook implementation.
Schema Inference and Evolution:
Auto Loader can automatically detect the schema of loaded data, allowing you to initialize tables without explicitly declaring the data schema and evolve the table schema as new columns are introduced.
When first reading data, Auto Loader samples the first 50 GB or 1000 files (whichever limit is crossed first) to infer the schema. It stores the schema information in a directory called _schemas at the configured cloudFiles.schemaLocation.
For formats that don't encode data types (such as JSON and CSV), Auto Loader infers all columns as strings. For formats with a typed schema (like Parquet and Avro), it samples a subset of files and merges the schemas of the individual files.
To configure schema inference and evolution, specify a target directory for the option cloudFiles.schemaLocation. You can use the same directory you specify for the checkpointLocation. If you're using Delta Live Tables, Databricks manages schema location and other checkpoint information automatically.
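For reference, a bronze Auto Loader table in DLT SQL might look roughly like the sketch below. The landing path is a placeholder and the audit column definitions are guesses based on the column names in your views, not your actual notebook. Because this runs inside a DLT pipeline, cloudFiles.schemaLocation is not set; DLT manages it for you.

CREATE OR REFRESH STREAMING LIVE TABLE vehicleInspectionProtocol_Raw
AS SELECT
  *,
  _metadata.file_path AS AuditSourceFile,    -- source file for each record (assumed definition)
  current_timestamp() AS AuditLoadTime       -- ingestion timestamp (assumed definition)
FROM cloud_files(
  "/mnt/landing/vehicle-inspection/",        -- placeholder landing path
  "json",
  map("cloudFiles.inferColumnTypes", "true") -- infer typed columns instead of reading everything as strings
);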
Error Message: The [FIELD_NOT_FOUND] error means that RegistrationDate is not present in the schema of the Item struct as seen by VehicleInspectionProtocol_Flattened_Temp, so Item.RegistrationDate cannot be resolved.
Suggestions:
Explicit Schema Definition:
Instead of relying solely on schema inference, explicitly define the schema for your flattened view (VehicleInspectionProtocol_Flattened_Temp). That way you can ensure that all expected fields are always present, even in data that predates them (a sketch follows below this list).
Define the schema using StructType and specify the data types for each field.
Check Data Consistency:
Review and Debug:
Remember that schema evolution can be complex, especially when dealing with semi-structured data. Explicitly defining the schema and verifying data consistency will help you avoid unexpected errors during processing.
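To make the explicit-schema suggestion concrete in SQL (a DLT SQL notebook has no StructType), one option is to re-serialize each Item with to_json and re-parse it with from_json and an explicit DDL schema. Fields that are missing from older files then come back as NULL instead of failing resolution. This is only a sketch; the data types are assumptions you would adjust to your data:

CREATE TEMPORARY STREAMING LIVE VIEW VehicleInspectionProtocol_Flattened_Temp
AS
SELECT
  Parsed.Id,
  Parsed.RegistrationDate,
  Parsed.InspectionDateTime,
  Parsed.VehicleInspectionType,
  AuditSourceFile,
  AuditLoadTime
FROM (
  SELECT
    -- round-trip through JSON and parse against an explicit schema so that
    -- fields absent from the struct become NULL instead of raising FIELD_NOT_FOUND
    from_json(
      to_json(Item),
      'Id STRING, RegistrationDate TIMESTAMP, InspectionDateTime TIMESTAMP, VehicleInspectionType STRING'
    ) AS Parsed,
    AuditSourceFile,
    AuditLoadTime
  FROM STREAM(LIVE.VehicleInspectionProtocol_Explode_Temp)
);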
Good luck with your DLT pipeline!
11-28-2023 12:19 AM
I used a schema hint in the Auto Loader notebook to explicitly define the two fields that caused the error. Then it worked. Thanks @Kaniz_Fatma!
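For anyone finding this thread later, such a schema hint on the bronze Auto Loader table might look roughly like the sketch below. The path, data types, and field list are illustrative placeholders, not the actual notebook; the key part is the cloudFiles.schemaHints option declaring the Items element type so the late-arriving fields exist in the schema before any file contains them.

CREATE OR REFRESH STREAMING LIVE TABLE vehicleInspectionProtocol_Raw
AS SELECT *
FROM cloud_files(
  "/mnt/landing/vehicle-inspection/",  -- placeholder landing path
  "json",
  map(
    "cloudFiles.inferColumnTypes", "true",
    -- declare every Items field referenced downstream (including RegistrationDate),
    -- since a hint on a column can take precedence over the inferred type
    "cloudFiles.schemaHints",
      "Items ARRAY<STRUCT<Id: STRING, RegistrationDate: TIMESTAMP, InspectionDateTime: TIMESTAMP, VehicleInspectionType: STRING>>"
  )
);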