Databricks SQL error outputting sensitive data to logs
02-11-2025 03:24 PM
Hi - I am using `from_json` with FAILFAST to correctly format some data using Databricks SQL. However, this function can return the error "[MALFORMED_RECORD_IN_PARSING.WITHOUT_SUGGESTION] Malformed records are detected in record parsing", and the rest of the error line contains the data that caused the error.
Is there any way to prevent this from happening and is there anywhere else this can happen? The data I am working with is sensitive and I don't want it appearing in our logs.
05-01-2025 12:00 AM
Checking.
05-01-2025 05:11 AM - edited 05-01-2025 05:12 AM
You could use the following `from_json` options:
- `mode` (default `PERMISSIVE`): allows a mode for dealing with corrupt records during parsing.
- `PERMISSIVE`: when it meets a corrupted record, puts the malformed string into a field configured by `columnNameOfCorruptRecord`, and sets malformed fields to null. To keep corrupt records, you can set a string type field named `columnNameOfCorruptRecord` in a user-defined schema. If a schema does not have the field, it drops corrupt records during parsing. When inferring a schema, it implicitly adds a `columnNameOfCorruptRecord` field in an output schema.
- `columnNameOfCorruptRecord` (default is the value specified in `spark.sql.columnNameOfCorruptRecord`): allows renaming the new field having malformed string created by `PERMISSIVE` mode. This overrides `spark.sql.columnNameOfCorruptRecord`.
Doc - https://docs.databricks.com/aws/en/sql/language-manual/functions/from_json
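As a rough sketch of how that could look in Databricks SQL: the query below passes `mode` as `PERMISSIVE` through the options map of `from_json`, so a malformed record should yield nulls rather than raising the FAILFAST error. The table alias `raw_events`, the column `payload`, and the schema `id INT, name STRING` are made up for illustration. The schema deliberately leaves out a corrupt-record column, on the assumption that you do not want the raw malformed string stored anywhere either.

```sql
-- Hypothetical inline data: one well-formed JSON string and one malformed string.
WITH raw_events AS (
  SELECT * FROM VALUES
    ('{"id": 1, "name": "alice"}'),
    ('{"id": 2, "name": ')
  AS t(payload)
),
parsed AS (
  SELECT
    -- PERMISSIVE mode: malformed input produces nulls instead of an error.
    -- No corrupt-record field is declared in the schema, so the malformed
    -- string is not copied into the output.
    from_json(payload, 'id INT, name STRING', map('mode', 'PERMISSIVE')) AS rec
  FROM raw_events
)
SELECT
  rec.id,
  rec.name,
  -- Flag rows that did not parse without exposing the raw payload.
  -- Assumes `id` is always present in well-formed records.
  rec.id IS NULL AS parse_failed
FROM parsed;
```

Whether a malformed row comes back as a null struct or as a struct with null fields may depend on the runtime version, which is why the sketch checks a field expected to be present rather than the struct itself. This only covers the `from_json` error path from the question, so other functions should be checked separately.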