Data Engineering

FIELD_NOT_FOUND when selecting field not part of original schema

Magnus
Contributor

Hi,

I'm implementing a DLT pipeline that uses Auto Loader to ingest JSON files. The JSON files contain an array called Items that holds records, and two of the fields in those records were not part of the original schema; they were added later. Auto Loader has updated the schema, and when I query the bronze table in Databricks SQL I can select the two added fields: they are NULL in the data that was loaded before the fields existed in the JSON files, and they contain values in more recent data.

However, in the notebook that creates the silver table, the DLT pipeline fails with an error saying "Unable to process statement for View 'VehicleInspectionProtocol_Flattened_Temp'. [FIELD_NOT_FOUND] No such struct field `RegistrationDate` in ...". RegistrationDate is one of the fields that was not part of the records initially. Can I implement the notebook in a better way?

 

-- Explode the Items array so each array element becomes its own row
CREATE TEMPORARY STREAMING LIVE VIEW VehicleInspectionProtocol_Explode_Temp
AS
SELECT 
  explode(Items) as Item,
  AuditSourceFile,
  AuditLoadTime
FROM STREAM(LIVE.vehicleInspectionProtocol_Raw);

-- Flatten the exploded struct into top-level columns
CREATE TEMPORARY STREAMING LIVE VIEW VehicleInspectionProtocol_Flattened_Temp
AS
SELECT
  Item.Id,
  Item.RegistrationDate,
  Item.InspectionDateTime,
  Item.VehicleInspectionType,
  AuditSourceFile,
  AuditLoadTime
FROM STREAM(LIVE.VehicleInspectionProtocol_Explode_Temp);

-- Target silver table, populated by the APPLY CHANGES statement below
CREATE OR REFRESH STREAMING LIVE TABLE VehicleInspectionProtocol;

APPLY CHANGES INTO
  LIVE.VehicleInspectionProtocol
FROM
  STREAM(LIVE.VehicleInspectionProtocol_Flattened_Temp)
KEYS
  (Id)
SEQUENCE BY
  AuditSourceFile;

 

Thanks,
Magnus

1 REPLY

I used a schema hint in the Auto Loader notebook to explicitly define the two fields that caused the error. Then it worked. Thanks @Retired_mod !
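For anyone hitting the same error, here is a minimal sketch of what such a schema hint could look like if the bronze table is defined in DLT SQL with cloud_files(). The landing path, the audit columns, and every hinted field except RegistrationDate are placeholders, since the thread only names RegistrationDate; a complex-type hint like this defines the full type of the Items column, so all expected element fields need to be listed.

-- Hypothetical bronze ingestion with a schema hint for the Items element struct
CREATE OR REFRESH STREAMING LIVE TABLE vehicleInspectionProtocol_Raw
AS SELECT
  *,
  -- How AuditSourceFile/AuditLoadTime are actually populated is an assumption
  _metadata.file_path AS AuditSourceFile,
  current_timestamp() AS AuditLoadTime
FROM cloud_files(
  "/mnt/landing/vehicle-inspection-protocol/",  -- placeholder landing path
  "json",
  map(
    "cloudFiles.inferColumnTypes", "true",
    -- Declare the fields that were added after the original schema was inferred;
    -- names and types other than RegistrationDate are placeholders
    "cloudFiles.schemaHints",
      "Items ARRAY<STRUCT<Id: STRING, RegistrationDate: DATE, InspectionDateTime: TIMESTAMP, VehicleInspectionType: STRING>>"
  )
);

With the hint in place, RegistrationDate is always part of the Items element struct, so the flattened view can select Item.RegistrationDate even for files that don't contain the field yet; those rows simply come through as NULL.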
