Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Log4J Custom Filter Not Working

laurencewells
New Contributor III

Hi All,

Hoping you can help. I am looking to set up a custom logging process that captures application ETL logs and streaming logs.

I have set up multiple custom logging appenders using the guide here:

https://kb.databricks.com/clusters/overwrite-log4j-logs.html

This is working and the logs are collecting as expected; however, the filtering on the custom logging is not working for the customStream appender. It aims to use the same syntax as the Databricks-provided ADLS HttpTransport filter further up in the file.

Does anyone have any thoughts on what's going on, or a way to test this?

log4j.appender.customStream.filter.str=com.databricks.logging.DatabricksLogFilter

log4j.appender.customStream.filter.str.LoggerName=org.apache.spark.sql.execution.streaming.MicroBatchExecution

log4j.appender.customStream.filter.str.StringToMatch=progress:

log4j.appender.customStream.filter.str.AcceptOnMatch=true

log4j.appender.customStream.filter.def=com.databricks.logging.DatabricksLogFilter.DenyAllFilter
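
For reference, one way to sanity-check this at runtime is sketched below, assuming the properties above have been applied on the driver, that a Scala notebook cell is available, and that DatabricksLogFilter (a Databricks-internal class) behaves as its LoggerName/StringToMatch/AcceptOnMatch keys suggest. The sketch fetches the customStream appender, prints its filter chain in order, and then logs one matching and one non-matching message. Log4j 1.x evaluates an appender's filters in chain order and stops at the first ACCEPT or DENY, so if the DenyAllFilter ends up ahead of the string-match filter, every event will be dropped.

import org.apache.log4j.LogManager

// Logger that the customStream appender is attached to (per the config above).
val streamLogger = LogManager.getLogger(
  "org.apache.spark.sql.execution.streaming.MicroBatchExecution")

// Print the filter chain in the order it will be evaluated.
val appender = streamLogger.getAppender("customStream")
var filter = if (appender != null) appender.getFilter else null
while (filter != null) {
  println(filter.getClass.getName)
  filter = filter.getNext
}

// A message containing "progress:" should be accepted by the string-match filter...
streamLogger.info("progress: test event")
// ...while one without it should fall through to the DenyAllFilter and be dropped.
streamLogger.info("this line should not reach the custom stream log")

// Then inspect /tmp/custom/logs/log4j-customStream-file-active.log on the driver.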

Full Log4j Properties file

# The driver logs will be divided into three different logs: stdout, stderr, and log4j. The stdout
# and stderr are rolled using StdoutStderrRoller. The log4j logs are again split into two: public
# and private. Stdout, stderr, and only the public log4j logs are shown to the customers.
log4j.rootCategory=INFO, publicFile
 
# Use the private logger method from the ConsoleLogging trait to log to the private file.
# All other logs will go to the public file.
log4j.logger.privateLog=INFO, privateFile
log4j.additivity.privateLog=false
 
# privateFile
log4j.appender.privateFile=com.databricks.logging.RedactionRollingFileAppender
log4j.appender.privateFile.layout=org.apache.log4j.PatternLayout
log4j.appender.privateFile.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
log4j.appender.privateFile.rollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
log4j.appender.privateFile.rollingPolicy.FileNamePattern=logs/%d{yyyy-MM-dd-HH}.log.gz
log4j.appender.privateFile.rollingPolicy.ActiveFileName=logs/active.log
 
# publicFile
log4j.appender.publicFile=com.databricks.logging.RedactionRollingFileAppender
log4j.appender.publicFile.layout=org.apache.log4j.PatternLayout
log4j.appender.publicFile.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
log4j.appender.publicFile.rollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
log4j.appender.publicFile.rollingPolicy.FileNamePattern=logs/log4j-%d{yyyy-MM-dd-HH}.log.gz
log4j.appender.publicFile.rollingPolicy.ActiveFileName=logs/log4j-active.log
 
# Increase log level of NewHadoopRDD so it doesn't print every split.
# (This is really because Parquet prints the whole schema for every part.)
log4j.logger.org.apache.spark.rdd.NewHadoopRDD=WARN
 
# Enable logging for Azure Data Lake (SC-14894)
log4j.logger.com.microsoft.azure.datalake.store=DEBUG
log4j.logger.com.microsoft.azure.datalake.store.HttpTransport=DEBUG
log4j.logger.com.microsoft.azure.datalake.store.HttpTransport.tokens=DEBUG
# We also add custom filter to remove excessive logging of successful http requests
log4j.appender.publicFile.filter.adl=com.databricks.logging.DatabricksLogFilter
log4j.appender.publicFile.filter.adl.LoggerName=com.microsoft.azure.datalake.store.HttpTransport
log4j.appender.publicFile.filter.adl.StringToMatch=HTTPRequest,Succeeded
log4j.appender.publicFile.filter.adl.AcceptOnMatch=false
 
# RecordUsage Category
log4j.logger.com.databricks.UsageLogging=INFO, usage
log4j.additivity.com.databricks.UsageLogging=false
log4j.appender.usage=org.apache.log4j.rolling.DatabricksRollingFileAppender
log4j.appender.usage.layout=org.apache.log4j.PatternLayout
log4j.appender.usage.layout.ConversionPattern=%m%n
log4j.appender.usage.rollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
log4j.appender.usage.rollingPolicy.FileNamePattern=logs/%d{yyyy-MM-dd-HH}.usage.json.gz
log4j.appender.usage.rollingPolicy.ActiveFileName=logs/usage.json
 
# Product Logs
log4j.logger.com.databricks.ProductLogging=INFO, product
log4j.additivity.com.databricks.ProductLogging=false
log4j.appender.product=org.apache.log4j.rolling.DatabricksRollingFileAppender
log4j.appender.product.layout=org.apache.log4j.PatternLayout
log4j.appender.product.layout.ConversionPattern=%m%n
log4j.appender.product.rollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
log4j.appender.product.rollingPolicy.FileNamePattern=logs/%d{yyyy-MM-dd-HH}.product.json.gz
log4j.appender.product.rollingPolicy.ActiveFileName=logs/product.json
 
# Lineage Logs
log4j.logger.com.databricks.LineageLogging=INFO, lineage
log4j.additivity.com.databricks.LineageLogging=false
log4j.appender.lineage=org.apache.log4j.rolling.DatabricksRollingFileAppender
log4j.appender.lineage.layout=org.apache.log4j.PatternLayout
log4j.appender.lineage.layout.ConversionPattern=%m%n
log4j.appender.lineage.rollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
log4j.appender.lineage.rollingPolicy.FileNamePattern=logs/%d{yyyy-MM-dd-HH}.lineage.json.gz
log4j.appender.lineage.rollingPolicy.ActiveFileName=logs/lineage.json
log4j.appender.lineage.encoding=UTF-8
 
# Metrics Logs
log4j.logger.com.databricks.MetricsLogging=INFO, metrics
log4j.additivity.com.databricks.MetricsLogging=false
log4j.appender.metrics=org.apache.log4j.rolling.DatabricksRollingFileAppender
log4j.appender.metrics.layout=org.apache.log4j.PatternLayout
log4j.appender.metrics.layout.ConversionPattern=%m%n
log4j.appender.metrics.rollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
log4j.appender.metrics.rollingPolicy.FileNamePattern=logs/%d{yyyy-MM-dd-HH}.metrics.json.gz
log4j.appender.metrics.rollingPolicy.ActiveFileName=logs/metrics.json
log4j.appender.metrics.encoding=UTF-8
 
# Ignore messages below warning level from Jetty, because it's a bit verbose
#log4j.logger.org.eclipse.jetty=WARN
 
log4j.appender.custom=com.databricks.logging.RedactionRollingFileAppender
log4j.appender.custom.layout=org.apache.log4j.PatternLayout
log4j.appender.custom.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} || %p || %c{1}: %m%n
log4j.appender.custom.rollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
log4j.appender.custom.rollingPolicy.FileNamePattern=/tmp/custom/logs/log4j-custom-logs-%d{yyyy-MM-dd-HH}.log.gz
log4j.appender.custom.rollingPolicy.ActiveFileName=/tmp/custom/logs/log4j-custom-file-active.log
log4j.logger.appLogger= INFO, custom
log4j.additivity.appLogger=false
 
log4j.logger.org.apache.spark.sql.execution.streaming.MicroBatchExecution=INFO, customStream
log4j.additivity.org.apache.spark.sql.execution.streaming.MicroBatchExecution=false
log4j.appender.customStream=com.databricks.logging.RedactionRollingFileAppender
log4j.appender.customStream.layout=org.apache.log4j.PatternLayout
log4j.appender.customStream.layout.ConversionPattern=%m%n
log4j.appender.customStream.rollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
log4j.appender.customStream.rollingPolicy.FileNamePattern=/tmp/custom/logs/log4j-customStream-logs-%d{yyyy-MM-dd-HH}.log.gz
log4j.appender.customStream.rollingPolicy.ActiveFileName=/tmp/custom/logs/log4j-customStream-file-active.log
log4j.appender.customStream.filter.str=com.databricks.logging.DatabricksLogFilter
log4j.appender.customStream.filter.str.LoggerName=org.apache.spark.sql.execution.streaming.MicroBatchExecution
log4j.appender.customStream.filter.str.StringToMatch=progress:
log4j.appender.customStream.filter.str.AcceptOnMatch=true
log4j.appender.customStream.filter.def=com.databricks.logging.DatabricksLogFilter.DenyAllFilter
 
log4j.appender.customJson=com.databricks.logging.RedactionRollingFileAppender
log4j.appender.customJson.layout=org.apache.log4j.PatternLayout
log4j.appender.customJson.layout.ConversionPattern=%m%n
log4j.appender.customJson.rollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
log4j.appender.customJson.rollingPolicy.FileNamePattern=/tmp/custom/logs/log4j-customJSON-logs-%d{yyyy-MM-dd-HH}.json.gz
log4j.appender.customJson.rollingPolicy.ActiveFileName=/tmp/custom/logs/log4j-customJSON-file-active.json
log4j.logger.appLoggerJson = INFO, customJson
log4j.additivity.appLoggerJson=false
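
For completeness, a minimal sketch of how the appLogger and appLoggerJson loggers defined above could be exercised from a Scala notebook cell on the driver (assuming the properties file has already been applied by the cluster init script):

import org.apache.log4j.LogManager

// Logger names defined in the config above, each routed to its own appender.
val appLog  = LogManager.getLogger("appLogger")      // -> custom appender
val jsonLog = LogManager.getLogger("appLoggerJson")  // -> customJson appender

appLog.info("ETL step finished")
jsonLog.info("""{"event": "etl_step_finished"}""")

// The messages should land in the active files configured above, e.g.
// /tmp/custom/logs/log4j-custom-file-active.log on the driver.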



3 REPLIES

Anonymous
Not applicable

Hello @Laurence Wells. It's great to meet you and thank you for your question! We'll give the members a chance to answer before we come back to this. Thanks for your patience!

Anonymous
Not applicable

Hey there @Laurence Wells

Hope you are doing great.

Does @Kaniz Fatma's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly?

Thanks!

Hey, unfortunately not. That is a blog about the Log4j vulnerability. Fortunately, Databricks is upgrading to Log4j 2 in Runtime 11, so it's a moot point now. Version 2 has much better filters, etc.
