02-05-2026 05:02 AM
Hi everyone!
I’m running a config-driven ingestion stack that uses the Databricks SDK (Python notebooks + GitHub Actions). All logging currently uses the standard Python logging module inside notebooks/jobs (example: ingest.py, logger.py).
I’d like to move beyond storing log files in ADLS and instead push every run’s logs into Azure Monitor / Log Analytics so ops teams can query KQL, build alerts, etc. I know I can enable Databricks “Diagnostic settings” to pipe cluster/job/system logs into a Log Analytics workspace, but I’m trying to understand the cleanest way to integrate our custom Python logger output as well. Ideally I’d keep the current logging API but add the necessary handlers/exports so everything lands in a single Log Analytics workspace alongside Databricks diagnostics.
Does anyone have a reference architecture or step-by-step instructions for:
Configuring Databricks workspace diagnostics to Log Analytics (which categories matter for notebooks/jobs)?
Wiring Python logging inside notebooks/jobs so custom logs appear in Log Analytics (e.g., via OpenCensus/opencensus-ext-azure, Data Collection API, or a recommended Databricks pattern)?
Managing this through Infrastructure-as-Code or CI/CD (we already store configs under config/*.yaml and run automation via workflows)?
If there’s an official Databricks/Microsoft doc or sample repo that shows a full pipeline (notebook logs + diagnostics + Log Analytics queries), I’d really appreciate pointers. Thanks! #Python #logging #AzureMonitor #LogAnalytics
2 weeks ago
Hi @Ham,
This is a common scenario, and there are good solutions. There are several layers to getting "Databricks SDK (Python) ingestion logs" into Azure Monitor, depending on exactly which logs you need. I will walk through each approach from simplest to most flexible and then cover the IaC/CI/CD angle you asked about.
UNDERSTANDING WHAT LOGS YOU ARE DEALING WITH
The Databricks Python SDK uses standard Python logging under the logger name "databricks.sdk". When you enable debug logging, it emits HTTP request/response details for every API call -- endpoints hit, status codes, timing, and (redacted) headers. These are client-side logs generated wherever your Python code runs (a notebook cell, a job task, an external VM, etc.).
This is distinct from the platform-level audit logs that Databricks generates server-side (e.g., "user X created cluster Y"). I will cover both.
OPTION 1: AZURE DIAGNOSTIC SETTINGS (PLATFORM AUDIT LOGS -- NO CODE REQUIRED)
If your goal is to get Databricks platform activity (cluster operations, job runs, SQL warehouse queries, notebook events, workspace changes, etc.) into Log Analytics, the built-in diagnostic settings path is the simplest and requires zero code.
Steps:
1. Open the Azure portal and navigate to your Azure Databricks workspace resource.
2. Under Monitoring in the sidebar, click Diagnostic settings.
3. Click Add diagnostic setting.
4. Name the setting and check "Send to Log Analytics workspace."
5. Select your target Log Analytics workspace.
6. Choose the log categories you need. For notebook and job monitoring, enable at least: jobs, clusters, accounts, notebook, databrickssql. You can also add dbfs, repos, deltaPipelines, workspace, etc.
7. Save.
Logs typically appear within 15 minutes. They land in tables like DatabricksJobs, DatabricksClusters, DatabricksAccounts, etc. You can query them with KQL.
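Once the tables are populated, a quick KQL sanity check might look like this (a sketch: exact columns vary slightly by category, but TimeGenerated, OperationName, and Identity are part of the standard Databricks diagnostic schema):

```kusto
// Recent job operations in the last day, newest first
DatabricksJobs
| where TimeGenerated > ago(24h)
| project TimeGenerated, OperationName, Identity, Response
| order by TimeGenerated desc
| take 50
```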
Limitation: Diagnostic settings only capture workspace-level audit events. Account-level events and some newer services (vectorSearch, clusterPolicies) are not available through this path. For comprehensive coverage, you can also query the audit log system table (system.access.audit) directly from Databricks SQL.
Requires: Premium plan.
Docs:
- Configure diagnostic log delivery: https://learn.microsoft.com/en-us/azure/databricks/admin/account-settings/audit-log-delivery
- Diagnostic log reference: https://learn.microsoft.com/en-us/azure/databricks/administration-guide/account-settings/azure-diagn...
OPTION 2: AZURE MONITOR OPENTELEMETRY DISTRO (APPLICATION-LEVEL LOGS FROM YOUR CODE)
If you want your own Python application logs (including SDK debug output) to flow into Application Insights and then into your Log Analytics workspace, the Azure Monitor OpenTelemetry Distro for Python is the recommended modern approach. It supersedes the older OpenCensus/opencensus-ext-azure integration, which is deprecated.
Install:
pip install azure-monitor-opentelemetry
In your notebook or script:
import logging
from azure.monitor.opentelemetry import configure_azure_monitor

configure_azure_monitor(
    connection_string="InstrumentationKey=xxx;IngestionEndpoint=https://xxx.monitor.azure.com/;...",
    logger_name="my_ingestion_app",
)

logger = logging.getLogger("my_ingestion_app")

# Route Databricks SDK logs to Azure Monitor by attaching
# the same handlers to the SDK logger:
sdk_logger = logging.getLogger("databricks.sdk")
sdk_logger.setLevel(logging.DEBUG)
for handler in logger.handlers:
    sdk_logger.addHandler(handler)

logger.info("Starting ingestion pipeline")
# ... your Databricks SDK calls here ...
# SDK debug logs will now flow to Application Insights
Key points:
- Set logger_name carefully. This controls which logger namespace is collected. You do NOT want to collect the SDK's own internal telemetry library logs (that causes recursion).
- The connection string comes from your Application Insights resource (Overview pane in the Azure portal).
- Application Insights data automatically lands in the associated Log Analytics workspace, so you can query it with KQL right alongside your diagnostic logs.
- You can set the APPLICATIONINSIGHTS_CONNECTION_STRING environment variable instead of passing it in code, which is cleaner for production. Store it in a Databricks secret scope and retrieve it at runtime.
- If running inside a Databricks notebook, be aware that the OpenTelemetry distro starts background threads for export. Make sure your job does not terminate before the exporter flushes. Call the provider's force_flush() or shutdown() if needed.
Docs:
- Enable OpenTelemetry for Python: https://learn.microsoft.com/en-us/azure/azure-monitor/app/opentelemetry-enable?tabs=python
- OpenCensus to OpenTelemetry migration: https://learn.microsoft.com/en-us/azure/azure-monitor/app/opentelemetry-python-opencensus-migrate
OPTION 3: LOGS INGESTION API (CUSTOM STRUCTURED LOGS TO A CUSTOM TABLE)
If you need full control over the schema -- for example, you want a custom table in Log Analytics with columns like IngestionJobId, SourceSystem, RecordCount, DurationSeconds, ErrorMessage -- use the Azure Monitor Logs Ingestion API with the Python client library.
Install:
pip install azure-monitor-ingestion azure-identity
Prerequisites in Azure:
1. Create a custom table in your Log Analytics workspace (e.g., DatabricksIngestionLogs_CL).
2. Create a Data Collection Rule (DCR) that maps your incoming JSON to that table.
3. Register an App (service principal) and grant it the "Monitoring Metrics Publisher" role on the DCR.
Python code:
import os
from azure.identity import DefaultAzureCredential
from azure.monitor.ingestion import LogsIngestionClient
endpoint = os.environ["DATA_COLLECTION_ENDPOINT"]
rule_id = os.environ["LOGS_DCR_RULE_ID"]
stream_name = os.environ["LOGS_DCR_STREAM_NAME"]
credential = DefaultAzureCredential()
client = LogsIngestionClient(endpoint=endpoint, credential=credential)
logs = [
    {
        "TimeGenerated": "2026-03-07T10:00:00Z",
        "IngestionJobId": "job-12345",
        "SourceSystem": "SalesforceConnector",
        "RecordCount": 50000,
        "DurationSeconds": 120,
        "Status": "Success",
    }
]
client.upload(rule_id=rule_id, stream_name=stream_name, logs=logs)
This gives you complete flexibility on what you log and how it is structured. You can wrap your Databricks SDK ingestion calls in try/except blocks and ship structured success/failure records to your custom table.
For authentication inside Databricks, use a service principal whose credentials are stored in a secret scope, or use managed identity if your workspace supports it.
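The try/except pattern described above can be sketched like this. Note that run_and_record and its field names are illustrative, matching the example schema rather than any official one:

```python
import time
import traceback
from datetime import datetime, timezone

def run_and_record(job_id: str, source: str, fn) -> dict:
    """Run an ingestion callable and build a structured record
    for the custom table (field names are illustrative)."""
    start = time.perf_counter()
    record = {
        "TimeGenerated": datetime.now(timezone.utc).isoformat(),
        "IngestionJobId": job_id,
        "SourceSystem": source,
    }
    try:
        result = fn()
        # Record a row count when the result supports len()
        record["RecordCount"] = len(result) if hasattr(result, "__len__") else None
        record["Status"] = "Success"
    except Exception:
        record["Status"] = "Failed"
        record["ErrorMessage"] = traceback.format_exc(limit=1)
    record["DurationSeconds"] = round(time.perf_counter() - start, 3)
    return record

ok = run_and_record("job-12345", "SalesforceConnector", lambda: range(50000))
failed = run_and_record("job-12345", "SalesforceConnector", lambda: 1 / 0)
```

In production you would pass the returned records to client.upload(...) exactly as in the example above, so every run leaves a success or failure row in the custom table.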
Docs:
- Logs Ingestion API overview: https://learn.microsoft.com/en-us/azure/azure-monitor/logs/logs-ingestion-api-overview
- Python client library: https://learn.microsoft.com/en-us/python/api/overview/azure/monitor-ingestion-readme
OPTION 4: CUSTOM PYTHON LOGGING HANDLER (LIGHTWEIGHT BRIDGE)
If you do not need OpenTelemetry's full tracing/metrics capabilities and just want Python log records forwarded to Log Analytics, you can write a lightweight custom logging handler that batches and ships logs via the Logs Ingestion API:
import logging
from datetime import datetime, timezone

from azure.identity import DefaultAzureCredential
from azure.monitor.ingestion import LogsIngestionClient

class AzureMonitorHandler(logging.Handler):
    def __init__(self, endpoint, rule_id, stream_name, batch_size=50):
        super().__init__()
        self.client = LogsIngestionClient(endpoint, DefaultAzureCredential())
        self.rule_id = rule_id
        self.stream_name = stream_name
        self.batch_size = batch_size
        self.buffer = []

    def emit(self, record):
        self.buffer.append({
            # Log Analytics expects an ISO 8601 timestamp, not the raw
            # epoch float in record.created:
            "TimeGenerated": datetime.fromtimestamp(
                record.created, tz=timezone.utc).isoformat(),
            "Level": record.levelname,
            "Message": self.format(record),
            "LoggerName": record.name,
        })
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            try:
                self.client.upload(
                    rule_id=self.rule_id,
                    stream_name=self.stream_name,
                    logs=self.buffer,
                )
            except Exception:
                self.handleError(None)  # report to stderr; drop the batch
            self.buffer = []

    def close(self):
        # Flush remaining records before teardown; logging.shutdown()
        # calls this at interpreter exit.
        self.flush()
        super().close()

# Attach to the Databricks SDK logger
handler = AzureMonitorHandler(endpoint, rule_id, stream_name)
logging.getLogger("databricks.sdk").addHandler(handler)
logging.getLogger("databricks.sdk").setLevel(logging.INFO)
This approach avoids the overhead of a full OpenTelemetry setup while still getting SDK logs into Log Analytics in a structured way.
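If you would rather not manage the buffer yourself, the standard library's logging.handlers.MemoryHandler provides the same batch-then-flush behavior out of the box. In this sketch the upload target is a stub: UploadHandler is illustrative, and a real implementation would call client.upload() in emit():

```python
import logging
from logging.handlers import MemoryHandler

class UploadHandler(logging.Handler):
    """Illustrative stand-in for a handler that would call
    LogsIngestionClient.upload() with the formatted records."""
    def __init__(self):
        super().__init__()
        self.uploaded = []

    def emit(self, record):
        self.uploaded.append(self.format(record))

target = UploadHandler()
# Buffer up to 50 records, then hand them to the target in one batch;
# records at ERROR or above flush the buffer immediately.
buffered = MemoryHandler(capacity=50, flushLevel=logging.ERROR, target=target)

log = logging.getLogger("databricks.sdk")
log.addHandler(buffered)
log.setLevel(logging.INFO)
log.propagate = False  # keep demo records out of the root logger

for i in range(49):
    log.info("api call %d", i)
assert target.uploaded == []  # below capacity: still buffered
log.info("api call 49")       # 50th record triggers the batch flush
```

MemoryHandler also flushes on close, so records buffered at job end are not silently dropped.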
MANAGING THIS THROUGH INFRASTRUCTURE-AS-CODE AND CI/CD
Since you mentioned you already store configs under config/*.yaml and run automation via GitHub Actions workflows, here is how each piece fits into an IaC pipeline.
Diagnostic Settings via Terraform or ARM:
You can manage the diagnostic settings declaratively. With Azure CLI in a GitHub Actions step:
az monitor diagnostic-settings create \
  --name "databricks-to-loganalytics" \
  --resource "/subscriptions/$SUB_ID/resourceGroups/$RG/providers/Microsoft.Databricks/workspaces/$WS_NAME" \
  --workspace "/subscriptions/$SUB_ID/resourceGroups/$RG/providers/microsoft.operationalinsights/workspaces/$LAW_NAME" \
  --logs '[
    {"category": "jobs", "enabled": true},
    {"category": "clusters", "enabled": true},
    {"category": "accounts", "enabled": true},
    {"category": "notebook", "enabled": true},
    {"category": "databrickssql", "enabled": true}
  ]'
Or with Terraform using the azurerm_monitor_diagnostic_setting resource:
resource "azurerm_monitor_diagnostic_setting" "databricks" {
  name                       = "databricks-to-loganalytics"
  target_resource_id         = azurerm_databricks_workspace.this.id
  log_analytics_workspace_id = azurerm_log_analytics_workspace.this.id

  enabled_log { category = "jobs" }
  enabled_log { category = "clusters" }
  enabled_log { category = "accounts" }
  enabled_log { category = "notebook" }
  enabled_log { category = "databrickssql" }
}
Alternatively, you can manage the same setting with ARM/Bicep templates or the REST API; the diagnostic settings docs walk through each method:
https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/diagnostic-settings
Logs Ingestion API Infrastructure via Terraform:
If you go with Option 3 (custom structured logs), the DCR, custom table, and app registration can all be managed in Terraform as well:
- azurerm_log_analytics_workspace_table for the custom table
- azurerm_monitor_data_collection_rule for the DCR
- azurerm_role_assignment for granting the service principal the "Monitoring Metrics Publisher" role
Connection String and Credential Management:
Store your Application Insights connection string or Logs Ingestion API credentials in a Databricks secret scope. You can create secret scopes backed by Azure Key Vault for seamless rotation. In your config YAML, reference the secret scope path rather than raw credentials:
# config/logging.yaml
azure_monitor:
  secret_scope: "monitoring-secrets"
  connection_string_key: "appinsights-connection-string"
  dcr_endpoint_key: "logs-ingestion-endpoint"
  dcr_rule_id_key: "logs-dcr-rule-id"
Then in your Python code:
from databricks.sdk import WorkspaceClient
w = WorkspaceClient()
conn_string = w.dbutils.secrets.get("monitoring-secrets", "appinsights-connection-string")
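To tie the YAML config to runtime secret lookup, here is a hedged sketch. Both resolve_monitoring_config and the injected get_secret callable are illustrative, not part of any SDK; injecting the getter lets the same code run on Databricks (with dbutils.secrets.get) and in unit tests:

```python
def resolve_monitoring_config(cfg: dict, get_secret) -> dict:
    """Resolve every '*_key' entry under azure_monitor into its
    secret value, using the injected get_secret(scope, key) callable."""
    scope = cfg["azure_monitor"]["secret_scope"]
    return {
        name.removesuffix("_key"): get_secret(scope, key)
        for name, key in cfg["azure_monitor"].items()
        if name.endswith("_key")
    }

# config/logging.yaml already parsed into a dict (e.g., with PyYAML):
cfg = {
    "azure_monitor": {
        "secret_scope": "monitoring-secrets",
        "connection_string_key": "appinsights-connection-string",
        "dcr_endpoint_key": "logs-ingestion-endpoint",
        "dcr_rule_id_key": "logs-dcr-rule-id",
    }
}

# Stand-in secret store for local testing; on Databricks pass
# lambda s, k: w.dbutils.secrets.get(s, k) instead.
fake_vault = {("monitoring-secrets", "appinsights-connection-string"): "InstrumentationKey=..."}
resolved = resolve_monitoring_config(
    cfg, lambda scope, key: fake_vault.get((scope, key), "<secret>")
)
```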
WHICH APPROACH SHOULD YOU USE?
- Platform audit trail only (who did what, when) --> Option 1 (Diagnostic Settings)
- Application-level observability with traces + metrics + logs --> Option 2 (OpenTelemetry Distro)
- Custom structured ingestion telemetry with full schema control --> Option 3 (Logs Ingestion API)
- Simple log forwarding without OpenTelemetry overhead --> Option 4 (Custom Handler)
Most teams doing data ingestion monitoring end up combining Option 1 (for the platform audit trail) with either Option 2 or Option 3 (for application-specific telemetry). Option 1 requires zero code changes and gives you the "Databricks diagnostics" side of the house. Then pick Option 2 or 3 for your custom Python logger output.
ADDITIONAL TIPS
- Store all connection strings and credentials in a Databricks secret scope, not in code or environment variables visible in the notebook.
- The Databricks Python SDK logger name is "databricks.sdk". Set it to DEBUG for full HTTP-level detail or INFO for just high-level operations.
- The SDK redacts sensitive headers (auth tokens) by default. You can control this with the debug_headers config and truncation with the DATABRICKS_DEBUG_TRUNCATE_BYTES environment variable.
- If you were considering OpenCensus/opencensus-ext-azure (AzureLogHandler), note that it is on the deprecation path. Microsoft recommends migrating to the Azure Monitor OpenTelemetry Distro (Option 2 above).
Docs:
- Databricks Python SDK logging: https://databricks-sdk-py.readthedocs.io/en/latest/logging.html
- Databricks SDK for Python: https://docs.databricks.com/en/dev-tools/sdk-python.html
Hope this helps! Let me know which approach best fits your use case and I can go deeper on any of them.
* This reply used an agent system I built to research and draft this response based on the wide set of documentation I have available and previous memory. I personally review the draft for any obvious issues and for monitoring system reliability and update it when I detect any drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand new features.