Log4J Custom Filter Not Working

laurencewells
New Contributor III

Hi All,

Hoping you can help. I am looking to set up a custom logging process that captures application ETL logs and streaming logs.

I have set up multiple custom logging appenders using the guide here:

https://kb.databricks.com/clusters/overwrite-log4j-logs.html

This is working and the logs are being collected as expected; however, the filter on the custom logging is not working for the customStream appender. It aims to use the same syntax as the Databricks-provided ADLS HttpTransport filter that appears earlier in the properties file.

Does anyone have any thoughts on what's going on, or a way to test this?

log4j.appender.customStream.filter.str=com.databricks.logging.DatabricksLogFilter

log4j.appender.customStream.filter.str.LoggerName=org.apache.spark.sql.execution.streaming.MicroBatchExecution

log4j.appender.customStream.filter.str.StringToMatch=progress:

log4j.appender.customStream.filter.str.AcceptOnMatch=true

log4j.appender.customStream.filter.def=com.databricks.logging.DatabricksLogFilter.DenyAllFilter
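
To check what the filter is actually doing, a minimal sketch from a Scala notebook on the same cluster (assuming the log4j 1.x API that these runtimes ship) is to write test messages through the same logger and see which ones reach the customStream file:

// Minimal sketch: emit test events against the logger that the customStream filter targets.
import org.apache.log4j.{Level, LogManager}

val streamLogger = LogManager.getLogger("org.apache.spark.sql.execution.streaming.MicroBatchExecution")
streamLogger.setLevel(Level.INFO)

// Contains the StringToMatch ("progress:"), so the filter should ACCEPT it and it
// should land in /tmp/custom/logs/log4j-customStream-file-active.log.
streamLogger.info("Streaming query made progress: {\"id\":\"test\"}")

// Does not contain "progress:", so the trailing DenyAllFilter should drop it.
streamLogger.info("this line should not appear in the customStream log")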

Full log4j.properties file:

# The driver logs will be divided into three different logs: stdout, stderr, and log4j. The stdout
# and stderr are rolled using StdoutStderrRoller. The log4j logs are again split into two: public
# and private. Stdout, stderr, and only the public log4j logs are shown to the customers.
log4j.rootCategory=INFO, publicFile
 
# Use the private logger method from the ConsoleLogging trait to log to the private file.
# All other logs will go to the public file.
log4j.logger.privateLog=INFO, privateFile
log4j.additivity.privateLog=false
 
# privateFile
log4j.appender.privateFile=com.databricks.logging.RedactionRollingFileAppender
log4j.appender.privateFile.layout=org.apache.log4j.PatternLayout
log4j.appender.privateFile.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
log4j.appender.privateFile.rollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
log4j.appender.privateFile.rollingPolicy.FileNamePattern=logs/%d{yyyy-MM-dd-HH}.log.gz
log4j.appender.privateFile.rollingPolicy.ActiveFileName=logs/active.log
 
# publicFile
log4j.appender.publicFile=com.databricks.logging.RedactionRollingFileAppender
log4j.appender.publicFile.layout=org.apache.log4j.PatternLayout
log4j.appender.publicFile.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
log4j.appender.publicFile.rollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
log4j.appender.publicFile.rollingPolicy.FileNamePattern=logs/log4j-%d{yyyy-MM-dd-HH}.log.gz
log4j.appender.publicFile.rollingPolicy.ActiveFileName=logs/log4j-active.log
 
# Increase log level of NewHadoopRDD so it doesn't print every split.
# (This is really because Parquet prints the whole schema for every part.)
log4j.logger.org.apache.spark.rdd.NewHadoopRDD=WARN
 
# Enable logging for Azure Data Lake (SC-14894)
log4j.logger.com.microsoft.azure.datalake.store=DEBUG
log4j.logger.com.microsoft.azure.datalake.store.HttpTransport=DEBUG
log4j.logger.com.microsoft.azure.datalake.store.HttpTransport.tokens=DEBUG
# We also add custom filter to remove excessive logging of successful http requests
log4j.appender.publicFile.filter.adl=com.databricks.logging.DatabricksLogFilter
log4j.appender.publicFile.filter.adl.LoggerName=com.microsoft.azure.datalake.store.HttpTransport
log4j.appender.publicFile.filter.adl.StringToMatch=HTTPRequest,Succeeded
log4j.appender.publicFile.filter.adl.AcceptOnMatch=false
 
# RecordUsage Category
log4j.logger.com.databricks.UsageLogging=INFO, usage
log4j.additivity.com.databricks.UsageLogging=false
log4j.appender.usage=org.apache.log4j.rolling.DatabricksRollingFileAppender
log4j.appender.usage.layout=org.apache.log4j.PatternLayout
log4j.appender.usage.layout.ConversionPattern=%m%n
log4j.appender.usage.rollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
log4j.appender.usage.rollingPolicy.FileNamePattern=logs/%d{yyyy-MM-dd-HH}.usage.json.gz
log4j.appender.usage.rollingPolicy.ActiveFileName=logs/usage.json
 
# Product Logs
log4j.logger.com.databricks.ProductLogging=INFO, product
log4j.additivity.com.databricks.ProductLogging=false
log4j.appender.product=org.apache.log4j.rolling.DatabricksRollingFileAppender
log4j.appender.product.layout=org.apache.log4j.PatternLayout
log4j.appender.product.layout.ConversionPattern=%m%n
log4j.appender.product.rollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
log4j.appender.product.rollingPolicy.FileNamePattern=logs/%d{yyyy-MM-dd-HH}.product.json.gz
log4j.appender.product.rollingPolicy.ActiveFileName=logs/product.json
 
# Lineage Logs
log4j.logger.com.databricks.LineageLogging=INFO, lineage
log4j.additivity.com.databricks.LineageLogging=false
log4j.appender.lineage=org.apache.log4j.rolling.DatabricksRollingFileAppender
log4j.appender.lineage.layout=org.apache.log4j.PatternLayout
log4j.appender.lineage.layout.ConversionPattern=%m%n
log4j.appender.lineage.rollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
log4j.appender.lineage.rollingPolicy.FileNamePattern=logs/%d{yyyy-MM-dd-HH}.lineage.json.gz
log4j.appender.lineage.rollingPolicy.ActiveFileName=logs/lineage.json
log4j.appender.lineage.encoding=UTF-8
 
# Metrics Logs
log4j.logger.com.databricks.MetricsLogging=INFO, metrics
log4j.additivity.com.databricks.MetricsLogging=false
log4j.appender.metrics=org.apache.log4j.rolling.DatabricksRollingFileAppender
log4j.appender.metrics.layout=org.apache.log4j.PatternLayout
log4j.appender.metrics.layout.ConversionPattern=%m%n
log4j.appender.metrics.rollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
log4j.appender.metrics.rollingPolicy.FileNamePattern=logs/%d{yyyy-MM-dd-HH}.metrics.json.gz
log4j.appender.metrics.rollingPolicy.ActiveFileName=logs/metrics.json
log4j.appender.metrics.encoding=UTF-8
 
# Ignore messages below warning level from Jetty, because it's a bit verbose
#log4j.logger.org.eclipse.jetty=WARN
 
log4j.appender.custom=com.databricks.logging.RedactionRollingFileAppender
log4j.appender.custom.layout=org.apache.log4j.PatternLayout
log4j.appender.custom.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} || %p || %c{1}: %m%n
log4j.appender.custom.rollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
log4j.appender.custom.rollingPolicy.FileNamePattern=/tmp/custom/logs/log4j-custom-logs-%d{yyyy-MM-dd-HH}.log.gz
log4j.appender.custom.rollingPolicy.ActiveFileName=/tmp/custom/logs/log4j-custom-file-active.log
log4j.logger.appLogger=INFO, custom
log4j.additivity.appLogger=false
 
log4j.logger.org.apache.spark.sql.execution.streaming.MicroBatchExecution=INFO, customStream
log4j.additivity.org.apache.spark.sql.execution.streaming.MicroBatchExecution=false
log4j.appender.customStream=com.databricks.logging.RedactionRollingFileAppender
log4j.appender.customStream.layout=org.apache.log4j.PatternLayout
log4j.appender.customStream.layout.ConversionPattern=%m%n
log4j.appender.customStream.rollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
log4j.appender.customStream.rollingPolicy.FileNamePattern=/tmp/custom/logs/log4j-customStream-logs-%d{yyyy-MM-dd-HH}.log.gz
log4j.appender.customStream.rollingPolicy.ActiveFileName=/tmp/custom/logs/log4j-customStream-file-active.log
log4j.appender.customStream.filter.str=com.databricks.logging.DatabricksLogFilter
log4j.appender.customStream.filter.str.LoggerName=org.apache.spark.sql.execution.streaming.MicroBatchExecution
log4j.appender.customStream.filter.str.StringToMatch=progress:
log4j.appender.customStream.filter.str.AcceptOnMatch=true
log4j.appender.customStream.filter.def=com.databricks.logging.DatabricksLogFilter.DenyAllFilter
 
log4j.appender.customJson=com.databricks.logging.RedactionRollingFileAppender
log4j.appender.customJson.layout=org.apache.log4j.PatternLayout
log4j.appender.customJson.layout.ConversionPattern=%m%n
log4j.appender.customJson.rollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
log4j.appender.customJson.rollingPolicy.FileNamePattern=/tmp/custom/logs/log4j-customJSON-logs-%d{yyyy-MM-dd-HH}.json.gz
log4j.appender.customJson.rollingPolicy.ActiveFileName=/tmp/custom/logs/log4j-customJSON-file-active.json
log4j.logger.appLoggerJson=INFO, customJson
log4j.additivity.appLoggerJson=false
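
For the application ETL side, the appLogger and appLoggerJson loggers above only receive events when code looks them up by name; a rough Scala sketch of that (logger names taken from the config, message contents purely illustrative):

// Sketch: write to the custom appenders defined above from application code.
import org.apache.log4j.LogManager

val appLogger = LogManager.getLogger("appLogger")           // routed to the "custom" appender
appLogger.info("ETL step finished")                         // -> /tmp/custom/logs/log4j-custom-file-active.log

val appLoggerJson = LogManager.getLogger("appLoggerJson")   // routed to the "customJson" appender
appLoggerJson.info("""{"step":"example","status":"ok"}""")  // -> /tmp/custom/logs/log4j-customJSON-file-active.json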



Anonymous
Not applicable

Hello @Laurence Wells. It's great to meet you and thank you for your question! We'll give the members a chance to answer before we come back to this. Thanks for your patience!

Kaniz
Community Manager

Hi @Laurence Wells, please go through the blog.

Anonymous
Not applicable

Hey there @Laurence Wells,

Hope you are doing great.

Does @Kaniz Fatma's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly?

Thanks!

Hey, unfortunately not. That is a blog about the Log4j vulnerability. Fortunately, Databricks is upgrading to Log4j 2 in Runtime 11, so it's a moot point now. Version 2 has much better filters, etc.
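
For reference, the rough Log4j 2 equivalent of the customStream filter (properties syntax; an untested sketch with illustrative names, so check it against the Log4j 2 docs) would look something like:

# Untested sketch of the same idea in Log4j 2 properties syntax.
# StringMatchFilter accepts events containing "progress:"; everything else is denied.
appender.customStream.type = RollingFile
appender.customStream.name = customStream
appender.customStream.fileName = /tmp/custom/logs/log4j-customStream-file-active.log
appender.customStream.filePattern = /tmp/custom/logs/log4j-customStream-logs-%d{yyyy-MM-dd-HH}.log.gz
appender.customStream.layout.type = PatternLayout
appender.customStream.layout.pattern = %m%n
appender.customStream.policies.type = Policies
appender.customStream.policies.time.type = TimeBasedTriggeringPolicy
appender.customStream.filter.progress.type = StringMatchFilter
appender.customStream.filter.progress.text = progress:
appender.customStream.filter.progress.onMatch = ACCEPT
appender.customStream.filter.progress.onMismatch = DENY

logger.microbatch.name = org.apache.spark.sql.execution.streaming.MicroBatchExecution
logger.microbatch.level = info
logger.microbatch.additivity = false
logger.microbatch.appenderRef.customStream.ref = customStream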
