Zerobus Ingest is now generally available, enabling you to push data directly to managed tables in your lakehouse. It delivers near real-time ingestion, with data landing in as little as five seconds, and supports up to 100 MB per second per connection, over 10 GB per second of throughput per table, and highly concurrent workloads. It is designed to handle thousands of clients writing to the same table simultaneously without compromising performance or reliability.
How Zerobus Ingest works
Zerobus Ingest changes event streaming architectures by writing directly to Delta tables using optimized Parquet files, bypassing traditional message bus layers.
Records sent to Zerobus Ingest are buffered before they land in the table. Clients receive an acknowledgment (ACK) in about 200 ms, while the data becomes available in the table in about 5 seconds.
Performance Characteristics
- Throughput: Up to 100 MB/sec per connection, with linear scaling across multiple connections.
- Concurrency: Supports thousands of concurrent clients writing to the same Delta table simultaneously.
- Observability: Native OpenTelemetry (OTEL) support for forwarding metrics and traces to your monitoring stack.
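As a rough planning aid, the per-connection limit and linear scaling described above can be turned into a simple sizing calculation. This is a minimal sketch, not part of any Zerobus SDK: the function name `connections_needed` and the `headroom` parameter are hypothetical, and it assumes throughput scales linearly with the number of connections as stated above.

```python
import math

# Per-connection limit taken from the list above (100 MB/sec).
PER_CONNECTION_MB_PER_SEC = 100


def connections_needed(target_mb_per_sec: float, headroom: float = 0.0) -> int:
    """Estimate how many connections a target throughput requires.

    `headroom` is a hypothetical safety margin, e.g. 0.2 plans for 20% extra.
    Assumes linear scaling across connections.
    """
    if target_mb_per_sec <= 0:
        return 0
    planned = target_mb_per_sec * (1 + headroom)
    return max(1, math.ceil(planned / PER_CONNECTION_MB_PER_SEC))


print(connections_needed(250))       # 3
print(connections_needed(100, 0.2))  # 2
```

Treat the result as a starting point and validate it against the throughput you actually observe.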
Best Practices for Production Deployments
When to use Zerobus Ingest instead of a message bus
The right architecture comes down to who needs the data—and how quickly. Review this guidance to determine the best path for your next real-time project.
Zerobus Ingest: The "Direct-to-Delta" Specialist
Zerobus Ingest uses a single-sink architecture. It is designed with one goal: getting event data into a Databricks-managed table as fast and as simply as possible. This direct path eliminates infrastructure overhead and reduces latency. Best for: Telemetry, IoT logs, and clickstream data, where the Lakehouse is the primary source of truth for all downstream consumption.
Message Bus: The "Multi-Consumer" Hub
Traditional buses like Kafka or RabbitMQ use a multi-sink architecture. They act as a universal buffer for data consumed by multiple independent systems (e.g., search indices, fraud detection, and dashboards) at their own pace. Best for: Event-driven microservices, real-time alerting systems, and complex ecosystems.
Recommended Technical Guidelines for Production:
- Use Managed Delta Tables with Liquid Clustering: Enable Liquid Clustering on your target tables. This ensures that the high volume of files generated by real-time streaming is automatically organized for fast query performance without manual partitioning.
- Optimize for High Throughput and Concurrency: For continuous ingestion, maintain a single, open stream. Reusing the same stream minimizes latency and interruptions by avoiding the multi-second delay required to initialize a new connection.
- Respect Record and Request Limits:
  - Maximum Message Size: Each individual message is limited to 10 MB.
  - Throughput Limit: 100 MB/sec per connection; distribute traffic across multiple connections for higher workloads.
- Schema Enforcement: Zerobus Ingest will never auto-evolve your target table. It supports continuous ingestion as long as incoming records conform to the target table's schema. Records that omit nullable columns are accepted, while records that omit required columns are rejected and the client receives a failure notification.
- Plan Around Single-AZ Durability: At GA, Zerobus Ingest provides single availability zone (single-AZ) durability. Multi-AZ support is a planned roadmap item.
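The 10 MB message limit in the guidelines above can be checked on the client before anything is sent. The sketch below is illustrative only: `split_by_size` is a hypothetical helper, not a Zerobus API. It assumes records are serialized as JSON and that "10 MB" means 10,000,000 bytes, the more conservative reading; the actual encoded size depends on the serialization format you use.

```python
import json

# Assumption: treat "10 MB" as 10,000,000 bytes (the conservative reading).
MAX_MESSAGE_BYTES = 10_000_000


def split_by_size(records: list[dict]) -> tuple[list[bytes], list[dict]]:
    """Serialize records as JSON and set aside any that exceed the size limit."""
    sendable: list[bytes] = []
    oversized: list[dict] = []
    for record in records:
        encoded = json.dumps(record).encode("utf-8")
        if len(encoded) <= MAX_MESSAGE_BYTES:
            sendable.append(encoded)
        else:
            oversized.append(record)
    return sendable, oversized
```

Oversized records can then be logged, trimmed, or routed elsewhere rather than failing mid-stream.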
The Schema Strategy: Strict Contracts vs. "Schema-on-Read"
One of the most powerful features of Zerobus Ingest is its strict enforcement of data quality. Unlike traditional buses that may allow "garbage in" to be dealt with later, Zerobus ensures that what lands in your Lakehouse is ready for immediate use.
However, how you design that enforcement depends on how much you trust your data sources.
Option A: The Strict Contract (Formal Tables)
If your downstream BI and ML models depend on high-fidelity data, you should define a Strict Schema. Zerobus will validate every incoming record against your table definition. If a record doesn't fit, Zerobus throws an error, preventing downstream data corruption.
- When to use: For mission-critical telemetry, financial transactions, or production logs.
- The Power of NOT NULL: Use Required Columns (NOT NULL) to enforce a strict contract. If a device fails to send a device_id or timestamp, the record is rejected at the front door.
- The Benefit: Your data is "Clean on Arrival." No more complex Spark jobs to filter out nulls or malformed rows.
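To make the strict contract concrete, a producer can mirror the table's required columns and catch incomplete records before they are rejected. This is a minimal sketch under stated assumptions: `REQUIRED_COLUMNS` and `missing_required` are hypothetical names, and the two columns are the `device_id` and `timestamp` examples from the text above. It does not replace the server-side validation that Zerobus performs.

```python
# Hypothetical mirror of the table's NOT NULL columns.
REQUIRED_COLUMNS = ("device_id", "timestamp")


def missing_required(record: dict) -> list[str]:
    """Return the required columns that are absent or null in a record."""
    return [col for col in REQUIRED_COLUMNS if record.get(col) is None]


print(missing_required({"device_id": "d1", "timestamp": 1700000000}))  # []
print(missing_required({"device_id": "d1"}))  # ['timestamp']
```

Checking on the client gives the producer a chance to log or repair a record instead of only learning about the failure from the rejection.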
Option B: The Flexible Payload (The VARIANT Type)
Sometimes you don't control the source, or the source sends a "blob" of metadata that changes weekly. For these scenarios, use the VARIANT column type.
- How it works: You define a table with fixed columns for your primary keys (like id and timestamp) and a single payload column of type VARIANT. You can then dump entire JSON payloads into this column.
- The Catch: While the schema is flexible, the format is not. Zerobus still parses the JSON to ensure it is valid. If the payload is not parsable JSON, Zerobus will return an error to the client.
- The Benefit: You get the "Schema-on-Read" flexibility of a Document Store with the performance of Delta Lake.
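Because the payload must still be valid JSON, it can help to verify that on the client first. The helper below is a sketch using only the Python standard library; `is_parsable_json` is a hypothetical name, and it is an assumption that rejecting the non-standard constants `NaN` and `Infinity` (which Python's parser accepts by default) matches what the service considers valid.

```python
import json


def _reject_constant(name: str):
    # Python's json module accepts NaN/Infinity by default; strict JSON does not.
    raise ValueError(f"non-standard JSON constant: {name}")


def is_parsable_json(payload: str) -> bool:
    """Return True if the payload parses as strict JSON."""
    try:
        json.loads(payload, parse_constant=_reject_constant)
    except (TypeError, ValueError):
        return False
    return True


print(is_parsable_json('{"fw": "1.2.3", "tags": ["a", "b"]}'))  # True
print(is_parsable_json('{"fw": }'))                             # False
```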
Pro-Tip: Most successful architectures use a Hybrid Approach. Define your "Core" fields (ID, Timestamp, User, Action, Version) as strict, required columns, and use a metadata VARIANT column for everything else. This gives you the best of both worlds: reliable partitioning and join keys, with the flexibility to evolve your data over time.
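One way to picture the hybrid approach is a small function that separates the core fields from everything else before a record is sent. This is an illustrative sketch only: `to_hybrid_row`, the lowercase field names, and the choice to carry the remaining fields as a JSON string in a `metadata` field are all assumptions, not a documented Zerobus record format.

```python
import json

# Hypothetical names for the "Core" fields mentioned above.
CORE_FIELDS = ("id", "timestamp", "user", "action", "version")


def to_hybrid_row(event: dict) -> dict:
    """Split an event into strict core columns plus a JSON metadata payload."""
    missing = [f for f in CORE_FIELDS if event.get(f) is None]
    if missing:
        raise ValueError(f"missing core fields: {missing}")
    row = {f: event[f] for f in CORE_FIELDS}
    # Everything that is not a core field goes into the flexible payload.
    extras = {k: v for k, v in event.items() if k not in CORE_FIELDS}
    row["metadata"] = json.dumps(extras, sort_keys=True)
    return row
```

New fields added by a source then appear in `metadata` without any change to the table, while a record missing a core field fails fast on the client.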
Refer to the resources below to learn more about Zerobus Ingest. Go forth and build!
- Try Zerobus Ingest Now: Access the documentation and quickstart guides.
- Take Product Tour: Navigate through Zerobus Ingest and learn how to get started ingesting data.
- Build an End-to-End Application: A real-time sailing simulator tracks a fleet of sailboats using Python SDK and the REST API, with Databricks Apps and Databricks Asset Bundles. Read the blog.
- Build a Digital Twins Solution: Learn how to maximize operational efficiency, accelerate real-time insight and predictive maintenance with Databricks Apps and Lakebase. Read the blog.
Have questions or want to share your Zerobus Ingest use cases with us? Join the discussion below!