<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: CloudWatch Agent Init Script Fails in Administration &amp; Architecture</title>
    <link>https://community.databricks.com/t5/administration-architecture/cloudwatch-agent-init-script-fails/m-p/58942#M811</link>
    <description>&lt;P&gt;That's correct &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/99070"&gt;@Carsten03&lt;/a&gt;! Glad to learn that the issue is now resolved and I was able to contribute.&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Thu, 01 Feb 2024 06:36:58 GMT</pubDate>
    <dc:creator>Yeshwanth</dc:creator>
    <dc:date>2024-02-01T06:36:58Z</dc:date>
    <item>
      <title>CloudWatch Agent Init Script Fails</title>
      <link>https://community.databricks.com/t5/administration-architecture/cloudwatch-agent-init-script-fails/m-p/58783#M793</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I am trying to install the CloudWatch log agent on my cluster, using this tutorial from AWS: &lt;A href="https://aws.amazon.com/blogs/mt/how-to-monitor-databricks-with-amazon-cloudwatch/" target="_blank" rel="noopener"&gt;https://aws.amazon.com/blogs/mt/how-to-monitor-databricks-with-amazon-cloudwatch/&lt;/A&gt;&lt;/P&gt;&lt;P data-unlink="true"&gt;They provide an &lt;A href="https://pages.databricks.com/rs/094-YMS-629/images/cloudWatchInit.sh" target="_blank" rel="noopener"&gt;init script&lt;/A&gt; there, but when I try to start my cluster I get:&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Cluster scoped init script s3://xxx/cloudWatchInit.sh: Script exit status is non-zero.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;It looks like the script uses Log4j 1, while Databricks Runtime supports only Log4j 2. I am not very familiar with Log4j and Java. I have tried using the newest jar file, but it results in the same error. I don't know how to get a more detailed error message, though, and it is quite hard to debug because I can't run any sudo commands in the notebook.&lt;/P&gt;&lt;P&gt;Has anyone had the same problem and found a solution?&lt;/P&gt;</description>
      <pubDate>Wed, 31 Jan 2024 05:57:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/cloudwatch-agent-init-script-fails/m-p/58783#M793</guid>
      <dc:creator>Carsten03</dc:creator>
      <dc:date>2024-01-31T05:57:12Z</dc:date>
    </item>
    <item>
      <title>Re: CloudWatch Agent Init Script Fails</title>
      <link>https://community.databricks.com/t5/administration-architecture/cloudwatch-agent-init-script-fails/m-p/58866#M797</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/99070"&gt;@Carsten03&lt;/a&gt;, could you please confirm which Databricks Runtime version you are using?&lt;/P&gt;</description>
      <pubDate>Wed, 31 Jan 2024 15:57:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/cloudwatch-agent-init-script-fails/m-p/58866#M797</guid>
      <dc:creator>Yeshwanth</dc:creator>
      <dc:date>2024-01-31T15:57:24Z</dc:date>
    </item>
    <item>
      <title>Re: CloudWatch Agent Init Script Fails</title>
      <link>https://community.databricks.com/t5/administration-architecture/cloudwatch-agent-init-script-fails/m-p/58927#M806</link>
      <description>&lt;P&gt;Hey &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/32523"&gt;@Yeshwanth&lt;/a&gt;, I have tried it with the 13.3 and 14.2 runtimes.&lt;/P&gt;</description>
      <pubDate>Thu, 01 Feb 2024 05:08:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/cloudwatch-agent-init-script-fails/m-p/58927#M806</guid>
      <dc:creator>Carsten03</dc:creator>
      <dc:date>2024-02-01T05:08:18Z</dc:date>
    </item>
    <item>
      <title>Re: CloudWatch Agent Init Script Fails</title>
      <link>https://community.databricks.com/t5/administration-architecture/cloudwatch-agent-init-script-fails/m-p/58933#M807</link>
      <description>&lt;P&gt;Perhaps you can manually upload log4j.xml and config.json.&lt;/P&gt;&lt;P&gt;Sample files can be found in this article:&lt;/P&gt;&lt;P&gt;&lt;A href="https://reflectoring.io/struct-log-with-cloudwatch-tutorial/" target="_blank"&gt;Structured Logging with Spring Boot and Amazon CloudWatch (reflectoring.io)&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 01 Feb 2024 05:46:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/cloudwatch-agent-init-script-fails/m-p/58933#M807</guid>
      <dc:creator>feiyun0112</dc:creator>
      <dc:date>2024-02-01T05:46:43Z</dc:date>
    </item>
    <item>
      <title>Re: CloudWatch Agent Init Script Fails</title>
      <link>https://community.databricks.com/t5/administration-architecture/cloudwatch-agent-init-script-fails/m-p/58935#M808</link>
      <description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/99070"&gt;@Carsten03&lt;/a&gt;&amp;nbsp;thank you for sharing the details.&lt;/P&gt;
&lt;P&gt;I am attaching an init script. Please try using it as a &lt;A href="https://docs.databricks.com/en/init-scripts/global.html#use-global-init-scripts" target="_self"&gt;Global Init Script&lt;/A&gt; and keep me posted on the results.&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;#!/bin/bash

set -ex

# jar for custom json logging
wget -q -O /mnt/driver-daemon/jars/log4j12-json-layout-1.0.0.jar https://sa-iot.s3.ca-central-1.amazonaws.com/collateral/log4j12-json-layout-1.0.0.jar

cd /tmp

# download cloudwatch agent
wget -q https://s3.amazonaws.com/amazoncloudwatch-agent/debian/amd64/latest/amazon-cloudwatch-agent.deb
wget -q https://s3.amazonaws.com/amazoncloudwatch-agent/debian/amd64/latest/amazon-cloudwatch-agent.deb.sig
KEY=$(curl https://s3.amazonaws.com/amazoncloudwatch-agent/assets/amazon-cloudwatch-agent.gpg 2&amp;gt;/dev/null| gpg --import 2&amp;gt;&amp;amp;1 |  cut -d: -f2 | grep 'key' | sed -r 's/\s*|key//g')
FINGERPRINT=$(echo "9376 16F3 450B 7D80 6CBD 9725 D581 6730 3B78 9C72" | sed 's/\s//g')
# verify signature
if ! gpg --fingerprint $KEY| sed -r 's/\s//g' | grep -q "${FINGERPRINT}"; then
  echo "cloudwatch agent deb gpg key fingerprint is invalid"
  exit 1
fi
if ! gpg --verify ./amazon-cloudwatch-agent.deb.sig ./amazon-cloudwatch-agent.deb; then
  echo "cloudwatch agent signature does not match deb"
  exit 1
fi
# -y keeps apt-get from waiting on a confirmation prompt in the non-interactive init run
sudo apt-get install -y ./amazon-cloudwatch-agent.deb

# Get the cluster name
pip install awscli
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
ZONE=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)
REGION=${ZONE%?}
CLUSTER_NAME=$(aws ec2 describe-tags --filters "Name=resource-id,Values=$INSTANCE_ID" "Name=key,Values=ClusterName" --region=$REGION --output=text | cut -f5)
CLUSTER_NAME=$CLUSTER_NAME-$DB_CLUSTER_ID

# configure cloudwatch agent for driver &amp;amp; executor
if [ -n "$DB_IS_DRIVER" ] &amp;amp;&amp;amp; [ "$DB_IS_DRIVER" = "TRUE" ]; then
    cat &amp;gt; /tmp/amazon-cloudwatch-agent.json &amp;lt;&amp;lt; EOF
{"agent":{"metrics_collection_interval":10,"logfile":"/var/log/amazon-cloudwatch-agent.log","debug":false},"logs":{"logs_collected":{"files":{"collect_list":[{"file_path":"/databricks/driver/logs/log4j-active.log","log_group_name":"/databricks/$CLUSTER_NAME/driver/spark-log","log_stream_name":"databricks-cloudwatch"},{"file_path":"/databricks/driver/logs/stderr","log_group_name":"/databricks/$CLUSTER_NAME/driver/stderr","log_stream_name":"databricks-cloudwatch"},{"file_path":"/databricks/driver/logs/stdout","log_group_name":"/databricks/$CLUSTER_NAME/driver/stdout","log_stream_name":"databricks-cloudwatch"}]}}},"metrics":{"namespace":"$CLUSTER_NAME","metrics_collected":{"statsd":{"service_address":":8125"},"cpu":{"resources":["*"],"measurement":[{"name":"cpu_usage_idle","rename":"DRIVER_CPU_USAGE_IDLE","unit":"Percent"},{"name":"cpu_usage_iowait","rename":"DRIVER_CPU_USAGE_IOWAIT","unit":"Percent"},{"name":"cpu_time_idle","rename":"DRIVER_CPU_TIME_IDLE","unit":"Percent"},{"name":"cpu_time_iowait","rename":"DRIVER_CPU_TIME_IOWAIT","unit":"Percent"}],"totalcpu":true},"disk":{"resources":["/"],"measurement":[{"name":"disk_free","rename":"DRIVER_DISK_FREE","unit":"Gigabytes"},{"name":"disk_inodes_free","rename":"DRIVER_DISK_INODES_FREE","unit":"Count"},{"name":"disk_inodes_total","rename":"DRIVER_DISK_INODES_TOTAL","unit":"Count"},{"name":"disk_inodes_used","rename":"DRIVER_DISK_INODES_USED","unit":"Count"}]},"diskio":{"resources":["*"],"measurement":[{"name":"diskio_iops_in_progress","rename":"DRIVER_DISKIO_IOPS_IN_PROGRESS","unit":"Megabytes"},{"name":"diskio_read_time","rename":"DRIVER_DISKIO_READ_TIME","unit":"Megabytes"},{"name":"diskio_write_time","rename":"DRIVER_DISKIO_WRITE_TIME","unit":"Megabytes"}]},"mem":{"measurement":[{"name":"mem_available","rename":"DRIVER_MEM_AVAILABLE","unit":"Megabytes"},{"name":"mem_total","rename":"DRIVER_MEM_TOTAL","unit":"Megabytes"},{"name":"mem_used","rename":"DRIVER_MEM_USED","unit":"Megabytes"},{"name":"mem_used_percent","rename":"DRIVER_MEM_USED_PERCENT","unit":"Megabytes"},{"name":"mem_available_percent","rename":"DRIVER_MEM_AVAILABLE_PERCENT","unit":"Megabytes"}]},"net":{"resources":["eth0"],"measurement":[{"name":"net_bytes_recv","rename":"DRIVER_NET_BYTES_RECV","unit":"Bytes"},{"name":"net_bytes_sent","rename":"DRIVER_NET_BYTES_SENT","unit":"Bytes"}]}},"append_dimensions":{"InstanceId":"\${aws:InstanceId}"}}}
EOF

  sed -i '/^log4j.appender.publicFile.layout/ s/^/#/g' /home/ubuntu/databricks/spark/dbconf/log4j/driver/log4j2.xml
  sed -i '/log4j.appender.publicFile=com.databricks.logging.RedactionRollingFileAppender/a log4j.appender.publicFile.layout=com.databricks.labs.log.appenders.JsonLayout' /home/ubuntu/databricks/spark/dbconf/log4j/driver/log4j2.xml
else
  cat &amp;gt; /tmp/amazon-cloudwatch-agent.json &amp;lt;&amp;lt; EOF
{"agent":{"metrics_collection_interval":10,"logfile":"/var/log/amazon-cloudwatch-agent.log","debug":true},"logs":{"logs_collected":{"files":{"collect_list":[{"file_path":"/databricks/spark/work/*/*/stderr","log_group_name":"/databricks/$CLUSTER_NAME/executor/stderr","log_stream_name":"databricks-cloudwatch"},{"file_path":"/databricks/spark/work/*/*/stdout","log_group_name":"/databricks/$CLUSTER_NAME/executor/stdout","log_stream_name":"databricks-cloudwatch"}]}}},"metrics":{"namespace":"$CLUSTER_NAME","metrics_collected":{"statsd":{"service_address":":8125"},"cpu":{"resources":["*"],"measurement":[{"name":"cpu_usage_idle","rename":"EXEC_CPU_USAGE_IDLE","unit":"Percent"},{"name":"cpu_usage_iowait","rename":"EXEC_CPU_USAGE_IOWAIT","unit":"Percent"},{"name":"cpu_time_idle","rename":"EXEC_CPU_TIME_IDLE","unit":"Percent"},{"name":"cpu_time_iowait","rename":"EXEC_CPU_TIME_IOWAIT","unit":"Percent"}],"totalcpu":true},"disk":{"resources":["/"],"measurement":[{"name":"disk_free","rename":"EXEC_DISK_FREE","unit":"Gigabytes"},{"name":"disk_inodes_free","rename":"EXEC_DISK_INODES_FREE","unit":"Count"},{"name":"disk_inodes_total","rename":"EXEC_DISK_INODES_TOTAL","unit":"Count"},{"name":"disk_inodes_used","rename":"EXEC_DISK_INODES_USED","unit":"Count"}]},"diskio":{"resources":["*"],"measurement":[{"name":"diskio_iops_in_progress","rename":"EXEC_DISKIO_IOPS_IN_PROGRESS","unit":"Megabytes"},{"name":"diskio_read_time","rename":"EXEC_DISKIO_READ_TIME","unit":"Megabytes"},{"name":"diskio_write_time","rename":"EXEC_DISKIO_WRITE_TIME","unit":"Megabytes"}]},"mem":{"measurement":[{"name":"mem_available","rename":"EXEC_MEM_AVAILABLE","unit":"Megabytes"},{"name":"mem_total","rename":"EXEC_MEM_TOTAL","unit":"Megabytes"},{"name":"mem_used","rename":"EXEC_MEM_USED","unit":"Megabytes"},{"name":"mem_used_percent","rename":"EXEC_MEM_USED_PERCENT","unit":"Megabytes"},{"name":"mem_available_percent","rename":"EXEC_MEM_AVAILABLE_PERCENT","unit":"Megabytes"}]},"net":{"resources":["eth0"],"measurement":[{"name":"net_bytes_recv","rename":"EXEC_NET_BYTES_RECV","unit":"Bytes"},{"name":"net_bytes_sent","rename":"EXEC_NET_BYTES_SENT","unit":"Bytes"}]}},"append_dimensions":{"InstanceId":"\${aws:InstanceId}"}}}
EOF

  sed -i '/^log4j.appender.console.layout/ s/^/#/g' /home/ubuntu/databricks/spark/dbconf/log4j/driver/log4j2.xml
  sed -i '/log4j.appender.console.layout=org.apache.log4j.PatternLayout/a log4j.appender.console.layout=com.databricks.labs.log.appenders.JsonLayout' /home/ubuntu/databricks/spark/dbconf/log4j/driver/log4j2.xml
fi


#modify metrics config
sudo sed -i '/^driver.sink.ganglia.class/,+4 s/^/#/g' /databricks/spark/conf/metrics.properties
sudo bash -c "cat &amp;lt;&amp;lt;EOF &amp;gt;&amp;gt; /databricks/spark/conf/metrics.properties
*.sink.statsd.class=org.apache.spark.metrics.sink.StatsdSink
*.sink.statsd.host=localhost
*.sink.statsd.port=8125
*.sink.statsd.prefix=spark
master.source.jvm.class=org.apache.spark.metrics.source.JvmSource
worker.source.jvm.class=org.apache.spark.metrics.source.JvmSource
driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource
EOF"
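
# Optional sanity check (an addition, not part of the originally posted script):
# confirm that the generated agent config parses as JSON before the agent is
# started. Assumes python3 is on PATH; validate_agent_config is a hypothetical
# helper name. A parse failure only prints a warning and does not stop the script.
validate_agent_config() {
  python3 -c 'import json, sys; json.load(open(sys.argv[1]))' "$1"
}
if ! validate_agent_config /tmp/amazon-cloudwatch-agent.json; then
  echo "WARNING: /tmp/amazon-cloudwatch-agent.json is missing or not valid JSON"
fi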

#start cloudwatch-agent
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:/tmp/amazon-cloudwatch-agent.json -s
sudo systemctl enable amazon-cloudwatch-agent

/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a status -m ec2&lt;/LI-CODE&gt;
</description>
      <pubDate>Thu, 01 Feb 2024 05:49:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/cloudwatch-agent-init-script-fails/m-p/58935#M808</guid>
      <dc:creator>Yeshwanth</dc:creator>
      <dc:date>2024-02-01T05:49:12Z</dc:date>
    </item>
    <item>
      <title>Re: CloudWatch Agent Init Script Fails</title>
      <link>https://community.databricks.com/t5/administration-architecture/cloudwatch-agent-init-script-fails/m-p/58939#M809</link>
      <description>&lt;P&gt;This works now! If I see it correctly, you have renamed&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;log4j.properties&lt;/LI-CODE&gt;&lt;P&gt;to&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;log4j2.xml&lt;/LI-CODE&gt;&lt;P&gt;Is that correct?&lt;/P&gt;&lt;P&gt;Thank you very much!&lt;/P&gt;</description>
      <pubDate>Thu, 01 Feb 2024 06:29:28 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/cloudwatch-agent-init-script-fails/m-p/58939#M809</guid>
      <dc:creator>Carsten03</dc:creator>
      <dc:date>2024-02-01T06:29:28Z</dc:date>
    </item>
    <item>
      <title>Re: CloudWatch Agent Init Script Fails</title>
      <link>https://community.databricks.com/t5/administration-architecture/cloudwatch-agent-init-script-fails/m-p/58942#M811</link>
      <description>&lt;P&gt;That's correct &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/99070"&gt;@Carsten03&lt;/a&gt;! Glad to learn that the issue is now resolved and I was able to contribute.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 01 Feb 2024 06:36:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/cloudwatch-agent-init-script-fails/m-p/58942#M811</guid>
      <dc:creator>Yeshwanth</dc:creator>
      <dc:date>2024-02-01T06:36:58Z</dc:date>
    </item>
    <item>
      <title>Re: CloudWatch Agent Init Script Fails</title>
      <link>https://community.databricks.com/t5/administration-architecture/cloudwatch-agent-init-script-fails/m-p/59935#M858</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/32523"&gt;@Yeshwanth&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have the exact same issue here. I tried updating my cluster with the init script that you kindly shared in this thread. However, I still get this error:&lt;BR /&gt;&lt;BR /&gt;&lt;EM&gt;&lt;STRONG&gt;Init script failure&lt;/STRONG&gt;:&lt;BR /&gt;Cluster scoped init script s3://databricks-init-scripts-cadent/new-CloudWatch-init.sh failed: Script exit status is non-zero.&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;I am not familiar with the runtimes, and I do not know what Log4j is. Is there a particular library or something else I need to install on the cluster to get this to work?&lt;/P&gt;&lt;P&gt;Please let me know, as this is an urgent requirement for the business.&lt;/P&gt;</description>
      <pubDate>Mon, 12 Feb 2024 15:47:05 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/cloudwatch-agent-init-script-fails/m-p/59935#M858</guid>
      <dc:creator>alexbishop</dc:creator>
      <dc:date>2024-02-12T15:47:05Z</dc:date>
    </item>
  </channel>
</rss>

