Hi @mikeagicman, the error message 'terminated with exception: [UNKNOWN_FIELD_EXCEPTION.NEW_FIELDS_IN_RECORD_WITH_FILE_PATH] Encountered unknown fields during parsing.' means that the data file contains fields that are not defined in your schema hint. These additional fields cause parsing to fail. The accompanying note, 'which can be fixed by an automatic retry: false', indicates that the system will not automatically retry the file after this error. In other words, it won’t attempt to parse the data again with the same schema hint; it expects you to address the issue manually.
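You’ve already set inferColumnTypes to true and schemaEvolutionMode to addNewColumns. Keep in mind that with addNewColumns, Auto Loader is designed to stop the stream with this exception the first time it sees new columns, after recording them in the schema location, so a restart normally picks up the evolved schema. If the error persists after restarting, the complexity of the data file is the more likely cause.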
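For reference, here is a minimal sketch of how these settings are usually expressed, assuming the pipeline uses Databricks Auto Loader (where the options carry the cloudFiles. prefix). The schema location path and the hint string are placeholders I made up for illustration, not values from your pipeline.

```python
# Sketch of Auto Loader reader options as a plain dictionary.
# The path and the schema hint below are hypothetical placeholders.
autoloader_options = {
    "cloudFiles.format": "json",
    "cloudFiles.inferColumnTypes": "true",
    # Fails the stream on new columns, then adds them to the stored schema.
    # "rescue" is an alternative mode that keeps the schema fixed and routes
    # unknown fields to the rescued data column instead of failing.
    "cloudFiles.schemaEvolutionMode": "addNewColumns",
    "cloudFiles.schemaLocation": "/path/to/schema/location",
    "cloudFiles.schemaHints": "id BIGINT, event_time TIMESTAMP",
}

# On a cluster, the dictionary would typically be passed to the reader like this
# (source_path is also a placeholder):
# spark.readStream.format("cloudFiles").options(**autoloader_options).load(source_path)
```

Keeping the options in one dictionary makes it easy to compare this pipeline against the one that works and spot any setting that differs.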
Let’s explore some potential solutions:
- Review the Schema Hint: Double-check your schema hint. Ensure that it accurately reflects the fields present in the data file. Sometimes, a missing or incorrect field name in the hint can lead to this error.
- Inspect the Additional Fields: Look at the list of additional fields that were identified. Are they truly new fields, or are they variations of existing fields? Sometimes, small differences (e.g., case sensitivity, underscores, or spaces) can cause issues.
- Explicitly Define New Fields: If the schema hint doesn’t cover all the fields in your data, consider explicitly defining the new fields. You can add them to the schema hint or handle them separately during processing.
- Custom Handling for Unknown Fields: Implement custom logic to handle unknown fields. For example, you could log them, ignore them, or dynamically adjust the schema based on the encountered fields.
- Retry with a Simplified File: Since your other pipeline worked well with a less complex file, try simplifying the problematic file. Remove some fields or reduce its complexity to see if it resolves the issue.
- Check the logs for more detailed error messages.
- Verify that the data file is correctly formatted as JSON.
- Inspect the actual data in the file to identify any unexpected fields.
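Several of the checks above (verifying the JSON, spotting unexpected fields, and catching near-duplicates that differ only in case, underscores, or spaces) can be done offline on a sample of the file. The following is a rough sketch in plain Python, assuming the file is newline-delimited JSON. The function names, the sample records, and the set of known fields are all made up for illustration.

```python
import json
import re


def normalize(name):
    """Lowercase a field path and drop spaces, underscores, and hyphens."""
    return re.sub(r"[\s_\-]+", "", name).lower()


def leaf_paths(value, prefix=""):
    """Yield a dotted path for every leaf value in a nested JSON value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from leaf_paths(child, f"{prefix}.{key}" if prefix else key)
    elif isinstance(value, list):
        for item in value:
            yield from leaf_paths(item, prefix)
    else:
        yield prefix


def inspect_json_lines(lines, known_fields):
    """Report malformed lines, unknown fields, and likely near-duplicates."""
    known_by_normalized = {normalize(field): field for field in known_fields}
    report = {"malformed_lines": [], "unknown": set(), "near_matches": {}}
    for line_number, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            report["malformed_lines"].append(line_number)
            continue
        if not isinstance(record, dict):
            # A bare scalar or array is not a record we can compare to a schema.
            report["malformed_lines"].append(line_number)
            continue
        for path in leaf_paths(record):
            if path in known_fields:
                continue
            match = known_by_normalized.get(normalize(path))
            if match is not None:
                report["near_matches"][path] = match
            else:
                report["unknown"].add(path)
    return report


# Hypothetical sample: the second line is missing a closing brace.
sample = [
    '{"id": 1, "user": {"First_Name": "Ada"}, "extra": true}',
    '{"id": 2, "user": {"first_name": "Bob"}',
]
report = inspect_json_lines(sample, {"id", "user.first_name"})
print(report)
```

Fields listed under near_matches are probably naming variations of columns you already have, while fields under unknown are candidates to add to the schema hint.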
Good luck, and I hope this helps you resolve the issue!