If you work in infrastructure or data engineering, there is a good chance syslog-ng is already somewhere in your stack. It is one of the most widely deployed open source log management tools in the world — battle-tested across Linux servers, network devices, IoT fleets, and enterprise environments for decades. It is good at what it does: collecting logs from almost any source, parsing and transforming them in flight, and reliably forwarding them to wherever they need to go.
Until recently, "wherever they need to go" rarely meant your data lakehouse — not directly, at least. Getting syslog-ng data into Databricks required extra infrastructure: writing to Kafka, dual-writing to object storage and then ingesting with Auto Loader, or building a custom pipeline. That changes with the native OpenTelemetry (OTLP) support in Zerobus Ingest. Read this companion blog for more details about the Beta launch of Zerobus Ingest OTEL.
syslog-ng (syslog new generation) is an open source log management daemon that extends the classic syslog protocol with content-based filtering, rich parsing, flexible routing, and reliable delivery. It supports a wide range of input formats — BSD syslog (RFC 3164), enhanced syslog (RFC 5424), JSON, journald, and more — and can forward to an equally wide range of destinations.
Its content-based filtering, flexible routing, and reliable delivery make syslog-ng particularly well-suited to production environments.
syslog-ng is used across a broad range of operational scenarios, and in each of them, having that data in your lakehouse long-term pays off.
With Zerobus Ingest OTEL (Beta), Zerobus Ingest now includes a native OpenTelemetry Protocol (OTLP) endpoint that implements the standard OTLP/gRPC Collector service. syslog-ng's opentelemetry() destination serializes log records as OTLP/gRPC Protobuf messages and sends them directly to that endpoint, while the x-databricks-zerobus-table-name header routes each batch into the correct Unity Catalog Delta table.
The pipeline looks like this: Flask app → Unix socket → syslog-ng (rewrite to OTLP) → OTLP/gRPC → Zerobus Ingest → Unity Catalog Delta table.
No Kafka. No intermediary pipeline. No file-based ingestion job. syslog-ng handles the transformation and Zerobus handles the write.
The following steps walk through the full setup. We used a simple Flask health monitor app as the log source — you can find the complete example in the companion repository.
Zerobus does not auto-create tables. Run the following in Databricks SQL before starting the pipeline. Replace the placeholders with your catalog, schema, table name, and service principal Application ID.
CREATE TABLE <catalog>.<schema>.<table> (
record_id STRING,
time TIMESTAMP,
date DATE,
service_name STRING,
event_name STRING,
trace_id STRING,
span_id STRING,
time_unix_nano LONG,
observed_time_unix_nano LONG,
severity_number STRING,
severity_text STRING,
body VARIANT,
attributes VARIANT,
dropped_attributes_count INT,
flags INT,
resource STRUCT<
attributes: VARIANT,
dropped_attributes_count: INT
>,
resource_schema_url STRING,
instrumentation_scope STRUCT<
name: STRING,
version: STRING,
attributes: VARIANT,
dropped_attributes_count: INT
>,
log_schema_url STRING
) USING DELTA
CLUSTER BY (time, service_name)
TBLPROPERTIES (
'otel.schemaVersion' = 'v2',
'delta.checkpointPolicy' = 'classic',
'delta.feature.variantType-preview' = 'supported'
);
GRANT USE CATALOG ON CATALOG <catalog> TO `<service-principal-uuid>`;
GRANT USE SCHEMA ON SCHEMA <catalog>.<schema> TO `<service-principal-uuid>`;
GRANT MODIFY, SELECT ON TABLE <catalog>.<schema>.<table> TO `<service-principal-uuid>`;
Note: Querying VARIANT columns requires Databricks Runtime 15.3 or higher.
Create a syslog-ng.conf with three sections: a source, a rewrite rule that maps syslog fields to OTel attributes, and the Zerobus destination.
Source — listen on a Unix socket for messages from the Flask app:
source s_flask {
unix-dgram("/tmp/flask-app.sock"
keep-hostname(yes)
);
};
Rewrite — map standard syslog macros to OpenTelemetry log attribute paths:
rewrite r_to_otlp {
set("log" value(".otel.type"));
set("$(* $S_UNIXTIME 1000000000)" value(".otel.log.time_unix_nano"));
set("$(* $R_UNIXTIME 1000000000)" value(".otel.log.observed_time_unix_nano"));
set("$LEVEL_NUM" value(".otel.log.severity_number"));
set("$(uppercase $LEVEL)" value(".otel.log.severity_text"));
set("$MESSAGE" value(".otel.log.body"));
set("$HOST" value(".otel.log.attributes.host.name"));
set("$PROGRAM" value(".otel.resource.attributes.service.name"));
set("$PID" value(".otel.log.attributes.process.pid"));
set("$FACILITY" value(".otel.log.attributes.syslog.facility"));
set("$FACILITY_NUM" value(".otel.log.attributes.syslog.facility.code"));
set("$PRI" value(".otel.log.attributes.syslog.priority"));
set("syslog.$FACILITY.$PROGRAM" value(".otel.log.attributes.event.name"));
set("syslog-ng-otel-exporter" value(".otel.scope.name"));
};
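To make the mapping concrete, here is a small Python sketch of what a single log line becomes after r_to_otlp. All field values are illustrative, and the dict shape simply mirrors the OTel log record columns in the table created earlier:

```python
# Illustrative only: a sample syslog message and the OTel log record fields
# that r_to_otlp produces for it. Values are made up for the example.
syslog_msg = {
    "HOST": "web-01", "PROGRAM": "flask-app", "PID": "4242",
    "LEVEL": "info", "LEVEL_NUM": 6,
    "FACILITY": "user", "FACILITY_NUM": 1, "PRI": 14,
    "MESSAGE": "health_check status=ok",
    "S_UNIXTIME": 1718000000,  # seconds; the rewrite multiplies by 1e9
}

otel_record = {
    "time_unix_nano": syslog_msg["S_UNIXTIME"] * 1_000_000_000,
    "severity_number": syslog_msg["LEVEL_NUM"],
    "severity_text": syslog_msg["LEVEL"].upper(),
    "body": syslog_msg["MESSAGE"],
    "attributes": {
        "host.name": syslog_msg["HOST"],
        "process.pid": syslog_msg["PID"],
        "syslog.facility": syslog_msg["FACILITY"],
        "syslog.facility.code": syslog_msg["FACILITY_NUM"],
        "syslog.priority": syslog_msg["PRI"],
        "event.name": f"syslog.{syslog_msg['FACILITY']}.{syslog_msg['PROGRAM']}",
    },
    "resource": {"attributes": {"service.name": syslog_msg["PROGRAM"]}},
    "scope": {"name": "syslog-ng-otel-exporter"},
}
```

Note how $PROGRAM lands in two places: as the resource-level service.name and inside the derived event.name attribute.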
Destination — send to Zerobus Ingest via OTLP/gRPC with automatic OAuth2 token management:
destination d_zerobus_logs {
opentelemetry(
url("<workspace_id>.zerobus.<region>.cloud.databricks.com:443")
persist-name("zerobus_logs")
headers(
"x-databricks-zerobus-table-name" => "<catalog>.<schema>.<table>"
)
cloud-auth(
oauth2(
client_id("<service_principal_id>")
client_secret("<service_principal_secret>")
token_url("https://<workspace_url>/oidc/v1/token")
scope("all-apis")
resource("api://databricks/workspaces/<workspace_id>/zerobusDirectWriteApi")
authorization_details('[
{"type":"unity_catalog_privileges","object_type":"CATALOG","object_full_path":"<catalog>","privileges":["USE CATALOG"]},
{"type":"unity_catalog_privileges","object_type":"SCHEMA","object_full_path":"<catalog>.<schema>","privileges":["USE SCHEMA"]},
{"type":"unity_catalog_privileges","object_type":"TABLE","object_full_path":"<catalog>.<schema>.<table>","privileges":["SELECT","MODIFY"]}
]')
auth_method(basic)
)
)
auth(tls())
workers(4)
batch-lines(50)
);
};
log {
source(s_flask);
rewrite(r_to_otlp);
destination(d_zerobus_logs);
};
One thing worth calling out: cloud-auth(oauth2(...)) handles token fetching and automatic renewal — long-running instances never need manual token rotation.
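Under the hood this is a standard OAuth2 client-credentials exchange against the workspace token endpoint. Here is a rough Python sketch of the request syslog-ng assembles; the helper itself is hypothetical, and the field names simply mirror the config above:

```python
import base64

def build_token_request(workspace_url, client_id, client_secret,
                        workspace_id, authorization_details):
    """Assemble the client-credentials token request that
    cloud-auth(oauth2(...)) issues on our behalf (illustrative sketch only;
    the real implementation lives inside syslog-ng)."""
    token_url = f"https://{workspace_url}/oidc/v1/token"
    # auth_method(basic): client credentials travel in the Authorization header
    creds = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    headers = {
        "Authorization": f"Basic {creds}",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    data = {
        "grant_type": "client_credentials",
        "scope": "all-apis",
        "resource": f"api://databricks/workspaces/{workspace_id}/zerobusDirectWriteApi",
        # the same JSON array passed to authorization_details() in the config
        "authorization_details": authorization_details,
    }
    return token_url, headers, data

url, headers, data = build_token_request(
    "my-workspace.cloud.databricks.com", "<client-id>", "<client-secret>",
    "1234567890", "[]")
```

The authorization_details payload scopes the token down to exactly the catalog, schema, and table privileges the pipeline needs, which is why the grants in the table-setup step must match it.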
syslog-ng -Fevd --cfgfile=syslog-ng.conf
The -F flag keeps syslog-ng in the foreground (-e, -v, and -d add stderr logging, verbose, and debug output). syslog-ng will create the Unix socket at /tmp/flask-app.sock and begin listening.
⚠️ Note: Start syslog-ng before the Flask app. If the app starts first, SysLogHandler connects lazily: the failure won't surface as a startup error, but every log message will fail with a --- Logging error --- traceback on stderr and nothing will reach Zerobus Ingest.
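One way to guard against that ordering problem is to have the app wait for the socket before configuring logging. The helper below is hypothetical, not part of the example app:

```python
import os
import time

def wait_for_syslog_socket(path="/tmp/flask-app.sock",
                           timeout=10.0, interval=0.25):
    """Return True once syslog-ng's Unix socket exists, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(path):
            return True
        time.sleep(interval)
    return False

# Call this before attaching SysLogHandler, e.g.:
# if not wait_for_syslog_socket():
#     raise SystemExit("syslog-ng socket not found; start syslog-ng first")
```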
The example app is a health monitor with three endpoints and a background thread that emits health logs every 10 seconds. It uses Python's built-in SysLogHandler to write to the Unix socket syslog-ng is listening on — no additional libraries required.
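The app-side wiring boils down to a few lines of standard-library code. The sketch below is self-contained: it stands in for syslog-ng with a local datagram socket so you can see exactly what SysLogHandler puts on the wire (in the real pipeline, syslog-ng's unix-dgram() source creates the socket):

```python
import logging
import os
import socket
from logging.handlers import SysLogHandler

SOCK = "/tmp/flask-app.sock"

# Demo stand-in for syslog-ng: bind a local datagram socket at the same path.
if os.path.exists(SOCK):
    os.unlink(SOCK)
server = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
server.bind(SOCK)

# All the app needs: point SysLogHandler at the socket and prefix messages
# with a program name that syslog-ng parses into $PROGRAM.
handler = SysLogHandler(address=SOCK)
handler.setFormatter(logging.Formatter("flask-app: %(message)s"))

logger = logging.getLogger("flask-app")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

logger.info("health_check status=ok")
received = server.recv(4096).decode().rstrip("\x00")
print(received)  # "<14>flask-app: health_check status=ok"
server.close()
```

The <14> prefix is the syslog priority (facility user = 1, severity info = 6, so 1 × 8 + 6 = 14), which syslog-ng decodes into the $FACILITY, $LEVEL, and $PRI macros used in the rewrite rule.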
pip install flask
python app.py
The app starts on http://localhost:5000.
Health check (INFO):
curl http://localhost:5000/health
Metrics snapshot (INFO):
curl http://localhost:5000/metrics
Simulate a failure (ERROR + WARN):
curl -X POST http://localhost:5000/simulate/error
Hit /simulate/error a few times to generate a burst of error-level events. The background thread will also emit occasional latency_degradation_detected warnings automatically.
-- Error and warning events only
SELECT time, severity_text, body::string AS message
FROM <catalog>.<schema>.<table>
WHERE severity_text IN ('ERR', 'ERROR', 'WARNING') -- syslog-ng's $LEVEL for error-level messages is "err"
AND time > current_timestamp() - INTERVAL 1 HOUR
ORDER BY time DESC;
-- Log volume by severity over time
SELECT
date_trunc('minute', time) AS minute,
severity_text,
COUNT(*) AS log_count
FROM <catalog>.<schema>.<table>
WHERE time > current_timestamp() - INTERVAL 1 HOUR
GROUP BY 1, 2
ORDER BY minute;
For the full table schema and additional query examples, see the Zerobus Ingest OTEL documentation.
Once the pipeline is running, your syslog-ng data is a first-class citizen in your lakehouse: query it with SQL, join it with the rest of your data, build dashboards and alerts on it, and retain it for as long as you need.
The complete working example — Flask app, syslog-ng config, table schema, and step-by-step README — is available here.
Docs: Zerobus Ingest OTEL documentation
Have questions or feedback? Drop them in the comments.