Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to push Cluster Logs to Elasticsearch?

User15813097110
New Contributor III
 
1 REPLY

User15813097110
New Contributor III

You can use the following steps to push cluster logs to Elasticsearch:

1. Download the log4j-elasticsearch-java-api repo and build the jar file:

git clone https://github.com/Downfy/log4j-elasticsearch-java-api.git
cd log4j-elasticsearch-java-api/
mvn clean install -Dmaven.test.skip=true

2. Go to the Libraries tab of the cluster and upload the jar file (located at target/log4j-elasticsearch-1.0.0-RELEASE.jar). The jar file will be saved to a DBFS location similar to:

dbfs:/FileStore/jars/9294d79f_8d33_4270_9a52_cc36c2651220-log4j_elasticsearch_1_0_0_RELEASE-970d7.jar

3. Zip the 30 dependent jar files under target/lib into a single file, dependency.zip, and copy it to DBFS. For example, you can use the Databricks CLI to upload the files to DBFS:

dbfs mkdirs dbfs:/dilip/elkzip/
dbfs mkdirs dbfs:/dilip/elkjar/
dbfs cp Desktop/log4j-elasticsearch-java-api/target/dependency.zip dbfs:/dilip/elkzip/

4. Unzip the jar files to another DBFS location using the following notebook command:

%sh unzip /dbfs/dilip/elkzip/dependency.zip -d /dbfs/dilip/elkjar/

5. Run the following Python notebook command to create the init script (please change the file name and path as appropriate):

%python
dbutils.fs.put("/dilip/init-scripts/setLog4jProperties.sh","""
#!/bin/bash
set -e
cp /dbfs/FileStore/jars/9294d79f_8d33_4270_9a52_cc36c2651220-log4j_elasticsearch_1_0_0_RELEASE-970d7.jar /databricks/jars/
cp /dbfs/dilip/elkjar/*.jar /databricks/jars/
cat << EOF >> /databricks/spark/dbconf/log4j/driver/log4j.properties
# RootLogger
log4j.rootLogger=INFO,stdout,elastic
# Logging Threshold
log4j.threshold=ALL
#
# stdout
# Add *stdout* to root logger above if you want to use this
#
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n
# ElasticSearch log4j appender for application
log4j.appender.elastic=com.letfy.log4j.appenders.ElasticSearchClientAppender
log4j.appender.elastic.elasticHost=internal-vip-elasticsearh-int-dev-7645241416321.us-west-2.elb.amazonaws.com
log4j.appender.elastic.hostName=my_laptop
log4j.appender.elastic.applicationName=elkdemo
log4j.appender.elastic.elasticIndex=logging-elk
log4j.appender.elastic.elasticType=logging
EOF
""", True)

6. Go to the cluster's "Advanced Options" -> "Init Scripts" tab, and then follow the steps outlined in the section "Configure a cluster-scoped init script using the UI" in the following documentation (in our case, the path to the init script is dbfs:/dilip/init-scripts/setLog4jProperties.sh):

https://docs.databricks.com/clusters/init-scripts.html#cluster-scoped-init-scripts
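After attaching the script, the cluster's JSON spec should include an init_scripts entry referencing it. An illustrative fragment (based on the linked cluster-scoped init script docs, not on the original post):

```json
{
  "init_scripts": [
    {
      "dbfs": { "destination": "dbfs:/dilip/init-scripts/setLog4jProperties.sh" }
    }
  ]
}
```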

7. Restart the cluster and check the driver logs. Logs should now be available in your Elasticsearch.
