topic Re: Databricks SQL Error outputting sensitive data to logs in Data Engineering

Databricks SQL Error outputting sensitive data to logs

seanstachff — Tue, 11 Feb 2025 23:24:13 GMT

Hi - I am using `from_json` with FAILFAST to correctly format some data using Databricks SQL. However, this function can return the error "[MALFORMED_RECORD_IN_PARSING.WITHOUT_SUGGESTION] Malformed records are detected in record parsing", and the rest of that line contains the data that caused the error.

Is there any way to prevent this from happening and is there anywhere else this can happen? The data I am working with is sensitive and I don't want it appearing in our logs.

Re: Databricks SQL Error outputting sensitive data to logs

NandiniN — Thu, 01 May 2025 07:00:55 GMT

Checking.

Re: Databricks SQL Error outputting sensitive data to logs

NandiniN — Thu, 01 May 2025 12:12:21 GMT

You could use the following `from_json` options:

mode (default PERMISSIVE): sets how corrupt records are handled during parsing.
- PERMISSIVE: when it meets a corrupted record, it puts the malformed string into a field configured by columnNameOfCorruptRecord and sets malformed fields to null. To keep corrupt records, you can set a string type field named columnNameOfCorruptRecord in a user-defined schema. If a schema does not have the field, it drops corrupt records during parsing. When inferring a schema, it implicitly adds a columnNameOfCorruptRecord field to the output schema.

columnNameOfCorruptRecord (default is the value specified in spark.sql.columnNameOfCorruptRecord): allows renaming the new field that holds the malformed string created by PERMISSIVE mode. This overrides spark.sql.columnNameOfCorruptRecord.
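
As a rough sketch of how these options might be passed in Databricks SQL: the table name `raw_events`, the column `payload`, and the schema fields `id` and `name` are hypothetical placeholders, and `_corrupt_record` is just an illustrative name for the corrupt-record field. The idea is to parse in PERMISSIVE mode so a malformed record is captured in a column instead of failing the query, and to select only a boolean flag so the raw text is not part of the result.

```sql
-- Sketch only: raw_events, payload, id and name are assumed names, not from the thread.
SELECT
  t.parsed.id,
  t.parsed.name,
  -- Expose only whether parsing failed, not the malformed string itself.
  t.parsed._corrupt_record IS NOT NULL AS is_malformed
FROM (
  SELECT from_json(
           payload,
           -- The extra STRING field receives the malformed input in PERMISSIVE mode.
           'id INT, name STRING, _corrupt_record STRING',
           map('mode', 'PERMISSIVE',
               'columnNameOfCorruptRecord', '_corrupt_record')
         ) AS parsed
  FROM raw_events
) AS t;
```

Note that the corrupt-record field still holds the sensitive string inside the struct, so it should probably be dropped or access-controlled before the result is written anywhere. If the field is left out of the schema, the description above says corrupt records are dropped during parsing.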

Doc - https://docs.databricks.com/aws/en/sql/language-manual/functions/from_json