To access CloudWatch logs from within Databricks, you can set up an integration that enables Databricks to fetch, query, and analyze AWS CloudWatch log data directly, without configuring CloudWatch to monitor Databricks clusters. This approach is increasingly popular for audit, operational, and troubleshooting purposes, especially when logs are stored in AWS and need to be analyzed in Databricks for downstream data engineering or analytics workflows.
Recommended Integration Methods
The most common methods to achieve CloudWatch logs access from within Databricks are:
- Using AWS SDKs (boto3 for Python):
  - Install the boto3 Python library in your Databricks notebooks.
  - Set up appropriate AWS credentials via environment variables or Databricks secrets (see the credential sketch after this method's snippet).
  - Use boto3 to programmatically query CloudWatch logs through the CloudWatch Logs APIs (filter_log_events, etc.), then load the results into a DataFrame for analysis.
  - Example snippet:
import boto3

# Credentials are resolved from the environment, an instance profile, or explicit keys
client = boto3.client('logs', region_name='your-region')

# start_timestamp / end_timestamp are epoch timestamps in milliseconds
response = client.filter_log_events(
    logGroupName='your-log-group',
    filterPattern='[pattern]',
    startTime=int(start_timestamp),
    endTime=int(end_timestamp)
)

# Convert response['events'] to a Pandas or Spark DataFrame for further processing
  - This method allows you to run queries, extract logs, and perform transformations using Databricks-native tools.
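As a sketch of the credential setup, assuming the AWS keys are stored in a Databricks secret scope (the scope and key names below are hypothetical placeholders), you could construct the boto3 client explicitly and flatten the response from the snippet above into a Spark DataFrame:

import boto3

# Hypothetical secret scope and key names; replace with your own Databricks secrets
aws_access_key = dbutils.secrets.get(scope="aws", key="cloudwatch-access-key")
aws_secret_key = dbutils.secrets.get(scope="aws", key="cloudwatch-secret-key")

client = boto3.client(
    "logs",
    region_name="your-region",
    aws_access_key_id=aws_access_key,
    aws_secret_access_key=aws_secret_key,
)

# Reuse the filter_log_events response from the snippet above and
# flatten the events into a Spark DataFrame for analysis
events = [
    (e["timestamp"], e["logStreamName"], e["message"])  # timestamp is epoch milliseconds
    for e in response["events"]
]
df = spark.createDataFrame(events, ["timestamp", "log_stream", "message"])
display(df)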
- Centralized Log Storage in S3:
  - Export CloudWatch logs to Amazon S3 using AWS-native export features or by creating log subscriptions.
  - Databricks can read the exported logs directly from S3 using Spark's built-in capabilities, as sketched after this list.
  - This decouples log storage from CloudWatch and leverages Databricks' efficiency in reading large files from S3.
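As a rough sketch, assuming logs were exported to a bucket and prefix such as s3://my-log-bucket/cloudwatch-exports/ (a hypothetical path), Spark can read the gzipped export files directly and split each line into a timestamp and message:

from pyspark.sql.functions import split, col

# Hypothetical bucket/prefix; CloudWatch S3 export tasks write gzipped text files
raw_logs = spark.read.text("s3://my-log-bucket/cloudwatch-exports/*/*.gz")

# Exported lines typically start with a timestamp followed by the log message
parsed = (
    raw_logs
    .withColumn("timestamp", split(col("value"), " ", 2)[0])
    .withColumn("message", split(col("value"), " ", 2)[1])
    .drop("value")
)
parsed.show(truncate=False)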
- Federation with Databricks Unity Catalog:
  - In advanced setups, CloudWatch or S3-exported logs can be ingested into Databricks Unity Catalog tables via custom Spark pipelines, or through partner tools such as Telegraf (which can push logs into Databricks using REST APIs); a minimal Spark sketch follows.
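As an illustrative sketch only (the DataFrame and the three-level table name are assumptions), a custom pipeline could land the parsed logs from the S3 example above in a Unity Catalog Delta table for downstream consumers:

# `parsed` is the DataFrame of parsed log lines from the S3 sketch above;
# the catalog.schema.table name is a hypothetical Unity Catalog location
(
    parsed.write
    .format("delta")
    .mode("append")
    .saveAsTable("main.observability.cloudwatch_logs")
)

# Downstream jobs can then query the logs with SQL
spark.sql("SELECT * FROM main.observability.cloudwatch_logs LIMIT 10").show()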
Best Practices and Configuration
- Assign appropriate IAM roles so that Databricks clusters can read logs from CloudWatch (via its APIs) or from S3.
- Secure credentials and follow least-privilege principles when configuring cross-service access.
- Consider batching and filtering queries, as CloudWatch API limits might affect performance (see the paginated sketch after this list).
- For real-time analysis, schedule regular jobs or use Delta Lake to store and refresh ingested logs within Databricks.
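To illustrate the batching point, boto3 exposes a paginator for filter_log_events that walks through result pages instead of issuing one unbounded call; the log group, filter pattern, and Delta table name below are placeholders, and the result is appended to a Delta table so a scheduled job can refresh it:

import boto3

client = boto3.client("logs", region_name="your-region")
paginator = client.get_paginator("filter_log_events")

rows = []
# Iterate over pages to stay within CloudWatch Logs API limits
for page in paginator.paginate(
    logGroupName="your-log-group",
    filterPattern="ERROR",  # example filter; adjust or drop as needed
    startTime=int(start_timestamp),
    endTime=int(end_timestamp),
    PaginationConfig={"PageSize": 1000},
):
    for e in page["events"]:
        rows.append((e["timestamp"], e["logStreamName"], e["message"]))

# Persist the batch into a Delta table (hypothetical name) for scheduled refreshes
df = spark.createDataFrame(rows, ["timestamp", "log_stream", "message"])
df.write.format("delta").mode("append").saveAsTable("main.observability.cloudwatch_logs")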
Useful Resources
- AWS documentation: "How to Monitor Databricks with Amazon CloudWatch" contains guidance on credentials and agent setup (even if you're not monitoring Databricks itself).
- Databricks Community: a thread on sending logs from Databricks to CloudWatch, with comments on how native integrations are limited and why boto3 and S3 exports are preferred for custom use cases.
- InfluxData/Telegraf: describes a workflow to ingest logs from CloudWatch directly into Databricks for analytics.
By using one of these supported approaches, Databricks can efficiently access, process, and analyze CloudWatch logs in a secure and scalable way, even without a native direct integration.