Use the following steps to push cluster logs to Elasticsearch:
1. Clone the log4j-elasticsearch-java-api repository and build the jar file:
git clone https://github.com/Downfy/log4j-elasticsearch-java-api.git
cd log4j-elasticsearch-java-api/
mvn clean install -Dmaven.test.skip=true
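After the build completes, the appender jar and its dependencies should be under target/. A quick sanity check (paths as produced by the build above; run from the repository root):
ls target/log4j-elasticsearch-1.0.0-RELEASE.jar
ls target/lib | wc -l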
2. Go to the Libraries tab of the cluster and upload the jar file (located at target/log4j-elasticsearch-1.0.0-RELEASE.jar). The jar is then saved to a DBFS location similar to:
dbfs:/FileStore/jars/9294d79f_8d33_4270_9a52_cc36c2651220-log4j_elasticsearch_1_0_0_RELEASE-970d7.jar
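To confirm the upload, you can list the jars directory with the Databricks CLI (the exact file name in your workspace will differ):
dbfs ls dbfs:/FileStore/jars/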
3. Zip the 30 dependent jar files under target/lib into a single file, dependency.zip, and copy it to DBFS (one way to create the zip is shown after these commands). For example, you can use the Databricks CLI to upload the file to DBFS:
dbfs mkdirs dbfs:/dilip/elkzip/
dbfs mkdirs dbfs:/dilip/elkjar/
dbfs cp Desktop/log4j-elasticsearch-java-api/target/dependency.zip dbfs:/dilip/elkzip/
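If you have not already created dependency.zip, one way to build it locally before the upload (assuming the build placed the dependent jars in target/lib, as described above):
cd log4j-elasticsearch-java-api/target/lib
zip ../dependency.zip *.jar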
4. Unzip the jar files to another DBFS location using the following notebook command:
%sh unzip /dbfs/dilip/elkzip/dependency.zip -d /dbfs/dilip/elkjar/
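You can verify the extraction with a quick listing; if everything worked, the directory should contain the dependent jars (roughly 30 files):
%sh ls /dbfs/dilip/elkjar/*.jar | wc -l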
5. Run the following Python notebook command to create the init script (please change the file name and path as appropriate):
%python
dbutils.fs.put("/dilip/init-scripts/setLog4jProperties.sh","""
#!/bin/bash
set -e
cp /dbfs/FileStore/jars/9294d79f_8d33_4270_9a52_cc36c2651220-log4j_elasticsearch_1_0_0_RELEASE-970d7.jar /databricks/jars/
cp /dbfs/dilip/elkjar/*.jar /databricks/jars/
cat << EOF >> /databricks/spark/dbconf/log4j/driver/log4j.properties
# RootLogger
log4j.rootLogger=INFO,stdout,elastic
# Logging Threshold
log4j.threshold=ALL
#
# stdout
# Add *stdout* to root logger above if you want to use this
#
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n
# ElasticSearch log4j appender for application
log4j.appender.elastic=com.letfy.log4j.appenders.ElasticSearchClientAppender
log4j.appender.elastic.elasticHost=internal-vip-elasticsearh-int-dev-7645241416321.us-west-2.elb.amazonaws.com
log4j.appender.elastic.hostName=my_laptop
log4j.appender.elastic.applicationName=elkdemo
log4j.appender.elastic.elasticIndex=logging-elk
log4j.appender.elastic.elasticType=logging
EOF
""", True)
6. Go to the cluster's "Advanced Options" -> "Init Scripts" tab, and then follow the steps outlined in the section "Configure a cluster-scoped init script using the UI" of the following documentation (in our case, the path to the init script is dbfs:/dilip/init-scripts/setLog4jProperties.sh):
https://docs.databricks.com/clusters/init-scripts.html#cluster-scoped-init-scripts
7. Restart the cluster and check the driver logs. Logs should now be flowing to your Elasticsearch cluster.
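A quick way to confirm that documents are arriving is to query the index configured in the appender (logging-elk above). This assumes Elasticsearch is reachable on the standard REST port 9200; replace the placeholder host with the host you set for elasticHost:
curl "http://<your-elastic-host>:9200/logging-elk/_search?pretty&size=5"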