Technical Blog
Vicky_Bukta_DB
Databricks Employee

If you work in infrastructure or data engineering, there is a good chance syslog-ng is already somewhere in your stack. It is one of the most widely deployed open source log management tools in the world — battle-tested across Linux servers, network devices, IoT fleets, and enterprise environments for decades. It is good at what it does: collecting logs from almost any source, parsing and transforming them in flight, and reliably forwarding them to wherever they need to go.

Until recently, "wherever they need to go" rarely meant your data lakehouse — not directly, at least. Getting syslog-ng data into Databricks required extra infrastructure: writing to Kafka, dual-writing to object storage and then ingesting with Auto Loader, or building a custom pipeline. That changes with the native OpenTelemetry (OTLP) support in Zerobus Ingest. Read this companion blog for more details about the Beta launch of Zerobus Ingest OTEL.

What is syslog-ng?

syslog-ng (syslog new generation) is an open source log management daemon that extends the classic syslog protocol with content-based filtering, rich parsing, flexible routing, and reliable delivery. It supports a wide range of input formats — BSD syslog (RFC 3164), enhanced syslog (RFC 5424), JSON, journald, and more — and can forward to an equally wide range of destinations.

A few things that make syslog-ng particularly well-suited to production environments:

  • High throughput — handles millions of messages per second with tunable batching and multi-worker support.
  • Reliable delivery — disk-based buffering ensures messages are not lost if a downstream destination is temporarily unavailable.
  • Flexible transformation — rewrite rules, parsers, and Python-based plugins let you reshape messages in flight before they leave the pipeline.
  • Native OpenTelemetry support — since version 4.3, syslog-ng can emit OTLP/gRPC output natively. Version 4.6 added significant performance improvements, including batching and multiple workers.
  • Automatic OAuth2 token management — the cloud-auth() block handles token fetching and renewal for cloud destinations, with no manual intervention required.

syslog-ng use cases

syslog-ng is used across a broad range of operational scenarios. Here are the most common ones, and why keeping this data in your lakehouse long-term matters for each.

  • Centralized log aggregation: Collect logs from dozens or hundreds of servers, network devices, and applications into a single pipeline. With Zerobus Ingest as a destination, every log from every source lands in Delta — queryable by your data teams, retainable for as long as you need.
  • Security and audit logging: Authentication events, privilege escalation, network access attempts, and system changes become valuable months after the fact during incident investigations or compliance audits. syslog-ng's content-based filtering lets you route security-relevant events to dedicated tables.
  • Infrastructure and network device monitoring: Routers, switches, firewalls, and load balancers all speak syslog. Landing this data in the lakehouse means you can correlate infrastructure events with application performance and build capacity models beyond a vendor's retention window.
  • Application log forwarding: Applications write to syslog via standard OS interfaces and syslog-ng picks those messages up with zero changes to application code, forwarding them via OTLP directly to your lakehouse.
  • IoT and edge device telemetry: syslog-ng can aggregate telemetry from large fleets of devices — with disk buffering to handle intermittent connectivity — and forward to Zerobus for centralized long-term storage and analysis.

How Zerobus Ingest OTEL enables this

With Zerobus Ingest OTEL (Beta), Zerobus Ingest now includes a native OpenTelemetry Protocol (OTLP) endpoint that implements the standard OTLP/gRPC Collector service. syslog-ng's opentelemetry() destination serializes log records as OTLP/gRPC Protobuf messages and sends them directly to that endpoint. The x-databricks-zerobus-table-name header then routes each batch into the correct Unity Catalog Delta table.

This is what the pipeline looks like:

[Figure: pipeline diagram (syslog-image.png)]

No Kafka. No intermediary pipeline. No file-based ingestion job. syslog-ng handles the transformation and Zerobus handles the write.

Step-by-step: wire up syslog-ng to Zerobus Ingest

The following steps walk through the full setup. We used a simple Flask health monitor app as the log source — you can find the complete example in the companion repository.

Prerequisites

  • syslog-ng 4.11 or higher (see the installation instructions)
  • Python 3.9+ and Flask (pip install flask)
  • A Databricks workspace with Unity Catalog enabled
  • A service principal with OAuth M2M credentials — setup guide

Step 1 — Create the target Delta table

Zerobus does not auto-create tables. Run the following in Databricks SQL before starting the pipeline. Replace the placeholders with your catalog, schema, table name, and service principal Application ID.

CREATE TABLE <catalog>.<schema>.<table> (
  record_id STRING,
  time TIMESTAMP,
  date DATE,
  service_name STRING,
  event_name STRING,
  trace_id STRING,
  span_id STRING,
  time_unix_nano LONG,
  observed_time_unix_nano LONG,
  severity_number STRING,
  severity_text STRING,
  body VARIANT,
  attributes VARIANT,
  dropped_attributes_count INT,
  flags INT,
  resource STRUCT<
    attributes: VARIANT,
    dropped_attributes_count: INT
  >,
  resource_schema_url STRING,
  instrumentation_scope STRUCT<
    name: STRING,
    version: STRING,
    attributes: VARIANT,
    dropped_attributes_count: INT
  >,
  log_schema_url STRING
) USING DELTA
CLUSTER BY (time, service_name)
TBLPROPERTIES (
  'otel.schemaVersion'                     = 'v2',
  'delta.checkpointPolicy'                 = 'classic',
  'delta.feature.variantType-preview'      = 'supported'
);

GRANT USE CATALOG ON CATALOG <catalog> TO `<service-principal-uuid>`;
GRANT USE SCHEMA ON SCHEMA <catalog>.<schema> TO `<service-principal-uuid>`;
GRANT MODIFY, SELECT ON TABLE <catalog>.<schema>.<table> TO `<service-principal-uuid>`;

Note: Querying VARIANT columns requires Databricks Runtime 15.3 or higher.

Step 2 — Configure syslog-ng

Create a syslog-ng.conf with three sections: a source, a rewrite rule that maps syslog fields to OTel attributes, and the Zerobus destination. A log path at the end connects the three.

Source — listen on a Unix socket for messages from the Flask app:

source s_flask {
    unix-dgram("/tmp/flask-app.sock"
        keep-hostname(yes)
    );
};

Rewrite — map standard syslog macros to OpenTelemetry log attribute paths:

rewrite r_to_otlp {
    set("log"                               value(".otel.type"));
    set("$(* $S_UNIXTIME 1000000000)"       value(".otel.log.time_unix_nano"));
    set("$(* $R_UNIXTIME 1000000000)"       value(".otel.log.observed_time_unix_nano"));
    set("$LEVEL_NUM"                        value(".otel.log.severity_number"));
    set("$(uppercase $LEVEL)"               value(".otel.log.severity_text"));
    set("$MESSAGE"                          value(".otel.log.body"));
    set("$HOST"                             value(".otel.log.attributes.host.name"));
    set("$PROGRAM"                          value(".otel.resource.attributes.service.name"));
    set("$PID"                              value(".otel.log.attributes.process.pid"));
    set("$FACILITY"                         value(".otel.log.attributes.syslog.facility"));
    set("$FACILITY_NUM"                     value(".otel.log.attributes.syslog.facility.code"));
    set("$PRI"                              value(".otel.log.attributes.syslog.priority"));
    set("syslog.$FACILITY.$PROGRAM"         value(".otel.log.attributes.event.name"));
    set("syslog-ng-otel-exporter"           value(".otel.scope.name"));
};
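
To make the mapping concrete, here is a rough Python sketch of what the rewrite rule computes for a single message. It is an illustration only: the function name to_otel_fields and the shape of the input dictionary are hypothetical, and syslog-ng performs this mapping natively through the set() calls above. The sketch also keeps numbers as integers for readability and assumes the priority value follows the usual syslog definition (facility * 8 + severity).

```python
# Illustrative only: mirrors the rewrite rule above in plain Python.
# to_otel_fields and the input dict layout are hypothetical, not syslog-ng APIs.

NANOS_PER_SECOND = 1_000_000_000


def to_otel_fields(msg: dict) -> dict:
    """Map parsed syslog fields to the .otel.* paths used in r_to_otlp."""
    # Assumption: priority is encoded as facility * 8 + severity (syslog PRI).
    priority = msg["facility_num"] * 8 + msg["level_num"]
    return {
        ".otel.type": "log",
        # Unix seconds scaled to nanoseconds, as in $(* $S_UNIXTIME 1000000000).
        ".otel.log.time_unix_nano": msg["s_unixtime"] * NANOS_PER_SECOND,
        ".otel.log.observed_time_unix_nano": msg["r_unixtime"] * NANOS_PER_SECOND,
        ".otel.log.severity_number": msg["level_num"],
        ".otel.log.severity_text": msg["level"].upper(),
        ".otel.log.body": msg["message"],
        ".otel.log.attributes.host.name": msg["host"],
        ".otel.resource.attributes.service.name": msg["program"],
        ".otel.log.attributes.process.pid": msg["pid"],
        ".otel.log.attributes.syslog.facility": msg["facility"],
        ".otel.log.attributes.syslog.facility.code": msg["facility_num"],
        ".otel.log.attributes.syslog.priority": priority,
        ".otel.log.attributes.event.name": f"syslog.{msg['facility']}.{msg['program']}",
        ".otel.scope.name": "syslog-ng-otel-exporter",
    }


# Made-up sample message, used only to exercise the mapping.
sample = {
    "s_unixtime": 1_700_000_000,
    "r_unixtime": 1_700_000_001,
    "level_num": 6,
    "level": "info",
    "message": "health_check_ok",
    "host": "web-01",
    "program": "health-monitor",
    "pid": "4242",
    "facility": "user",
    "facility_num": 1,
}
fields = to_otel_fields(sample)
```

Note that the timestamps end up as nanoseconds since the Unix epoch, which is what the time_unix_nano and observed_time_unix_nano columns of the table expect.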

Destination — send to Zerobus Ingest via OTLP/gRPC with automatic OAuth2 token management:

destination d_zerobus_logs {
    opentelemetry(
        url("<workspace_id>.zerobus.<region>.cloud.databricks.com:443")
        persist-name("zerobus_logs")
        headers(
            "x-databricks-zerobus-table-name" => "<catalog>.<schema>.<table>"
        )
        cloud-auth(
            oauth2(
                client_id("<service_principal_id>")
                client_secret("<service_principal_secret>")
                token_url("https://<workspace_url>/oidc/v1/token")
                scope("all-apis")
                resource("api://databricks/workspaces/<workspace_id>/zerobusDirectWriteApi")
                authorization_details('[
                    {"type":"unity_catalog_privileges","object_type":"CATALOG","object_full_path":"<catalog>","privileges":["USE CATALOG"]},
                    {"type":"unity_catalog_privileges","object_type":"SCHEMA","object_full_path":"<catalog>.<schema>","privileges":["USE SCHEMA"]},
                    {"type":"unity_catalog_privileges","object_type":"TABLE","object_full_path":"<catalog>.<schema>.<table>","privileges":["SELECT","MODIFY"]}
                ]')
                auth_method(basic)
            )
        )
        auth(tls())
        workers(4)
        batch-lines(50)
    );
};

log {
    source(s_flask);
    rewrite(r_to_otlp);
    destination(d_zerobus_logs);
};

One thing worth calling out: cloud-auth(oauth2(...)) handles token fetching and automatic renewal — long-running instances never need manual token rotation.
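
For intuition, the token exchange that cloud-auth(oauth2(...)) manages is an OAuth2 client-credentials request. The sketch below only builds such a request with the Python standard library and never sends it. The helper name build_token_request is hypothetical, and the exact wire format syslog-ng uses is an assumption inferred from the options in the config above, with auth_method(basic) taken to mean HTTP Basic client authentication.

```python
import base64
import json
from urllib.parse import urlencode


def build_token_request(client_id, client_secret, workspace_id, catalog, schema, table):
    """Build headers and a form body for a client-credentials token request.

    Hypothetical helper: it mirrors the oauth2() options in the config above
    and does not perform any network I/O.
    """
    credentials = f"{client_id}:{client_secret}".encode()
    headers = {
        # Assumption: auth_method(basic) means HTTP Basic client authentication.
        "Authorization": "Basic " + base64.b64encode(credentials).decode(),
        "Content-Type": "application/x-www-form-urlencoded",
    }
    # Same three privilege entries as authorization_details() in the config.
    authorization_details = [
        {"type": "unity_catalog_privileges", "object_type": "CATALOG",
         "object_full_path": catalog, "privileges": ["USE CATALOG"]},
        {"type": "unity_catalog_privileges", "object_type": "SCHEMA",
         "object_full_path": f"{catalog}.{schema}", "privileges": ["USE SCHEMA"]},
        {"type": "unity_catalog_privileges", "object_type": "TABLE",
         "object_full_path": f"{catalog}.{schema}.{table}",
         "privileges": ["SELECT", "MODIFY"]},
    ]
    body = urlencode({
        "grant_type": "client_credentials",
        "scope": "all-apis",
        "resource": f"api://databricks/workspaces/{workspace_id}/zerobusDirectWriteApi",
        "authorization_details": json.dumps(authorization_details),
    })
    return headers, body


# Placeholder values only; substitute your own service principal and table.
headers, body = build_token_request("id", "secret", "123", "main", "logs", "syslog")
```

Posting this body to the token_url from the config should return a short-lived access token. syslog-ng repeats the exchange on its own before the token expires, which is why no manual rotation is needed.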

Step 3 — Start syslog-ng

syslog-ng -Fevd --cfgfile=syslog-ng.conf

The -F flag keeps syslog-ng in the foreground; -e sends its internal messages to stderr, and -v and -d turn on verbose and debug output. syslog-ng will create the Unix socket at /tmp/flask-app.sock and begin listening.

⚠️ Note: Start syslog-ng before the Flask app. If the app starts first, SysLogHandler connects lazily, so the failure won't surface as a startup error. Instead, each log call prints --- Logging error --- to stderr and nothing reaches Zerobus Ingest.
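
One way to guard against this ordering problem is to have the app wait for the socket before it configures logging. The following is a minimal sketch under that assumption; wait_for_socket is a hypothetical helper and is not part of the companion example.

```python
import os
import stat
import time


def wait_for_socket(path: str, timeout: float = 10.0, interval: float = 0.2) -> bool:
    """Return True once `path` exists and is a Unix socket, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # os.stat follows the path; S_ISSOCK checks the file type bits.
            if stat.S_ISSOCK(os.stat(path).st_mode):
                return True
        except FileNotFoundError:
            # syslog-ng has not created the socket yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling wait_for_socket("/tmp/flask-app.sock") at startup and exiting with a clear error when it returns False turns the quiet logging failure into a visible one.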

Step 4 — Start the Flask app

The example app is a health monitor with three endpoints and a background thread that emits health logs every 10 seconds. It uses Python's built-in SysLogHandler to write to the Unix socket syslog-ng is listening on — no additional libraries required.

pip install flask
python app.py

The app starts on http://localhost:5000.
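
For readers who want to see the logging side without opening the repository, here is a minimal sketch of how a Python app can send logs to that Unix datagram socket with the standard library. It is not the companion app's actual code: the function name build_syslog_logger and the program name health-monitor are assumptions made for illustration.

```python
import logging
import logging.handlers
import socket


def build_syslog_logger(socket_path: str = "/tmp/flask-app.sock",
                        program: str = "health-monitor") -> logging.Logger:
    """Return a logger that writes syslog-formatted datagrams to socket_path."""
    handler = logging.handlers.SysLogHandler(
        address=socket_path,                  # a string address means a Unix socket
        facility=logging.handlers.SysLogHandler.LOG_USER,
        socktype=socket.SOCK_DGRAM,           # matches the unix-dgram() source
    )
    # The ident is prepended to every message, so a syslog parser can pick it
    # up as the program name.
    handler.ident = f"{program}: "
    logger = logging.getLogger(program)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```

A Flask route would then call something like logger.info("health_check_ok"), and the handler takes care of the syslog priority prefix. With the rewrite rule from Step 2, the program name is what ends up in the service.name resource attribute.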

Step 5 — Generate log events

Health check (INFO):

curl http://localhost:5000/health

Metrics snapshot (INFO):

curl http://localhost:5000/metrics

Simulate a failure (ERROR + WARN):

curl -X POST http://localhost:5000/simulate/error

Hit /simulate/error a few times to generate a burst of error-level events. The background thread will also emit occasional latency_degradation_detected warnings automatically.

Step 6 — Query your logs in Databricks SQL

-- Error and warning events only
SELECT time, severity_text, body::string AS message
FROM <catalog>.<schema>.<table>
WHERE severity_text IN ('ERROR', 'WARNING')
  AND time > current_timestamp() - INTERVAL 1 HOUR
ORDER BY time DESC;

-- Log volume by severity over time
SELECT
  date_trunc('minute', time) AS minute,
  severity_text,
  COUNT(*) AS log_count
FROM <catalog>.<schema>.<table>
WHERE time > current_timestamp() - INTERVAL 1 HOUR
GROUP BY 1, 2
ORDER BY minute;


For the full table schema and additional query examples, see the Zerobus Ingest OTEL documentation.

What you will get

Once the pipeline is running, your syslog-ng data is a first-class citizen in your lakehouse. You can:

  • Join log data with business tables — correlate application errors with customer impact, infrastructure events with product metrics.
  • Keep data for as long as you need at object storage cost, without a vendor retention cap.
  • Feed logs into downstream ML pipelines for anomaly detection or classification.
  • Apply Unity Catalog governance — row-level security, column masking, fine-grained access control — to your observability data.

Full example

The complete working example — Flask app, syslog-ng config, table schema, and step-by-step README — is available here.

Docs: Zerobus Ingest OTEL documentation

Have questions or feedback? Drop them in the comments.