TL;DR
Zerobus Ingest is a serverless, Kafka-free ingestion service in Databricks that allows applications and IoT devices to stream data directly into Delta Lake with low latency and minimal operational overhead.
Real-time data ingestion is a core requirement for modern IoT and event-driven architectures. Traditionally, platforms like Apache Kafka have been used as an intermediary layer between producers and analytics systems, adding operational complexity and latency.
Zerobus Ingest is Databricks’ Kafka-free ingestion solution that allows applications and devices to write events directly into Delta tables with low latency and minimal infrastructure. In this article, we explore how Zerobus works, when to use it, and how to ingest real-time events step by step.
Zerobus connector
Producers simply push data using a lightweight API, and Zerobus takes care of buffering, reliability, and scaling behind the scenes.
Traditional real-time data pipelines often rely on messaging systems like Kafka to move data from applications. While effective, this approach introduces several challenges: broker infrastructure to provision and operate, an extra hop that adds end-to-end latency, and overall higher operational complexity.
Zerobus simplifies the ingestion process by removing that intermediary layer: producers push events to a single managed, serverless endpoint, and the service handles buffering, durability, and scaling while landing data directly in Delta tables.
In practice, Zerobus Ingest is designed for event-driven and IoT ingestion use cases where Databricks is the primary analytics platform.
Modern data architectures increasingly favor simpler, event-first designs that minimize moving parts while preserving reliability and scale. Zerobus Ingest on Databricks supports this shift by reducing ingestion pipelines to their essential components and removing unnecessary infrastructure between event producers and Delta Lake storage. At the same time, the native integration with Unity Catalog ensures governance, security, and lineage are applied from the moment data is written.
In short, Zerobus is a strong fit when Databricks is the primary destination and the goal is Kafka-free streaming ingestion into Delta Lake.
Zerobus Ingest is implemented as a serverless ingestion layer within Databricks, exposed through gRPC and REST interfaces. Producers establish a streaming connection to the Zerobus endpoint and send events, commonly serialised using Protocol Buffers, directly to the target Delta table.
Behind the scenes, Zerobus integrates natively with Delta Lake for durable, transactional storage and Unity Catalog for access control and governance.
Client SDKs in languages such as Python, Java, and Rust abstract the complexity of stream creation and record ingestion, allowing developers to implement scalable real-time pipelines with minimal configuration.
- gRPC: a high-performance communication protocol used by Zerobus to stream events from producers to Databricks with low latency and reliability.
- Protobuf: a compact, strongly typed data format that defines the event schema and efficiently serialises data before ingestion.
- Client: runs in the application or device, packages events using Protobuf, and sends them to Zerobus using the gRPC API.
- Zerobus Server: a fully managed, serverless Databricks service that receives events, handles buffering and durability, and writes data directly into Delta tables.
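Put together, a producer's lifecycle is short: configure, open a stream, send, close. The sketch below previews that flow in Python; the class and method names (ZerobusSdk, TableProperties, create_stream, ingest_record) are assumptions based on the public preview Python SDK, and each step is expanded in the walkthrough later in this article.

```python
# Preview of the producer flow; SDK import paths and method names are
# assumptions based on the public preview Python SDK -- verify against
# the SDK README for your installed version.
from zerobus.sdk.sync import ZerobusSdk
from zerobus.sdk.shared import TableProperties

import record_pb2  # Protobuf module compiled for the target table (generated below)

sdk = ZerobusSdk("<zerobus-endpoint>", "<workspace-url>")
stream = sdk.create_stream(
    "<client-id>",
    "<client-secret>",
    TableProperties("catalog.schema.events", record_pb2.Record.DESCRIPTOR),
)
stream.ingest_record(record_pb2.Record())  # one Protobuf-serialised event
stream.close()                             # flush outstanding records and release the stream
```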
Zerobus Ingest complements, rather than replaces, existing Databricks ingestion tools. Each ingestion option is designed for a different data source type, latency requirement, and operational model. Choosing the right ingestion pattern depends on whether data is produced as events, files, or database changes, as well as the level of real-time processing and infrastructure complexity required.
The following comparison summarises how Zerobus Ingest compares with Kafka, Auto Loader, and CDC pipelines in Databricks, and provides practical guidance on when each option is the best fit for real-time data ingestion workloads.
| Method | Best for | Data source type | Latency | Operational effort | When to choose |
| --- | --- | --- | --- | --- | --- |
| Zerobus Ingest | Direct real-time event ingestion | Applications, IoT devices, services | Low (near real-time) | Very low (serverless) | When events need to land directly in Delta tables with minimal infrastructure and Databricks is the primary destination |
| Auto Loader | Incremental file ingestion | Files in cloud storage | Medium (micro-batch) | Low | When data arrives as files or batches and near-real-time processing is not required |
| Kafka + Structured Streaming | Large-scale event streaming and complex processing | Event streams via a message broker | Low | High | When multiple consumers, message retention, or advanced stream processing is required |
| CDC Pipelines | Database change data capture (CDC) | Transactional databases | Medium to low | Medium | When replicating database changes into the Lakehouse while maintaining row-level consistency |
Ingestion Method Decision Tree
Prerequisites & Environment Setup
Server (workspace side): a Databricks workspace with Zerobus Ingest enabled, a target Delta table registered in Unity Catalog, and a service principal (client ID and secret) with permission to write to that table.
Client side: a Python environment with the Zerobus SDK and grpcio-tools installed, plus the workspace's Zerobus endpoint details.
Generating Protobuf (skip this step if you are using JSON)
In Zerobus:
- The .proto file defines the schema of the events written to Delta tables.
- Compiled client code (.py, .java, etc.) is used by producers to send data.
- This guarantees schema correctness, performance, and compatibility at ingestion time.
Protobuf defines the data contract, and the compiled files provide the language-specific code needed to efficiently send and receive that data.
Get the proto definition:
```bash
python -m zerobus.tools.generate_proto \
    --uc-endpoint "$UC_ENDPOINT" \
    --client-id "$CLIENT_ID" \
    --client-secret "$CLIENT_SECRET" \
    --table "$TABLE_NAME" \
    --output "$OUTPUT_FILE"
```
This generates a .proto file (record.proto in this example), the Protocol Buffer definition derived from the target table's schema. In the next step we compile it into language-specific code:
```bash
python -m grpc_tools.protoc --python_out=. --proto_path=. record.proto
```
This produces the compiled Python module (record_pb2.py), which the Python SDK uses to serialise messages.
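To sanity-check the compiled module, you can build a message and round-trip it through the standard Protobuf Python API. The Record message name and its fields depend on your table schema and are purely illustrative here.

```python
import record_pb2  # module generated by grpc_tools.protoc from record.proto

# Field names are illustrative; the real fields mirror the Delta table schema.
event = record_pb2.Record(device_id="sensor-42", temperature=21.5)

payload = event.SerializeToString()                # compact binary wire format
roundtrip = record_pb2.Record.FromString(payload)  # parse it back
assert roundtrip == event
print(f"{len(payload)} bytes on the wire")
```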
Sending messages using the SDK
Load config: give the client the connection details for the Zerobus server.
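A minimal configuration sketch, reading the same values used by the generate_proto command above from environment variables. The ZEROBUS_ENDPOINT variable name and the TableProperties import path are assumptions based on the public preview Python SDK.

```python
import os

import record_pb2  # compiled in the previous step
from zerobus.sdk.shared import TableProperties  # assumed SDK import path

# Connection details for the Zerobus server and the target table.
SERVER_ENDPOINT = os.environ["ZEROBUS_ENDPOINT"]  # workspace's Zerobus gRPC endpoint (assumed name)
WORKSPACE_URL = os.environ["UC_ENDPOINT"]         # workspace / Unity Catalog URL
CLIENT_ID = os.environ["CLIENT_ID"]               # service principal credentials
CLIENT_SECRET = os.environ["CLIENT_SECRET"]
TABLE_NAME = os.environ["TABLE_NAME"]             # catalog.schema.table

# Bind the target Delta table to the compiled Protobuf descriptor.
table_properties = TableProperties(TABLE_NAME, record_pb2.Record.DESCRIPTOR)
```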
Create stream: open a stream to the Zerobus server using the config.
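Opening the stream is a single call; the SDK maintains a durable gRPC connection and handles buffering, retries, and acknowledgements internally (again, class and method names follow the preview SDK as we understand it):

```python
from zerobus.sdk.sync import ZerobusSdk  # assumed import path (synchronous client)

# Open a durable gRPC stream to the Zerobus server for the target table.
sdk = ZerobusSdk(SERVER_ENDPOINT, WORKSPACE_URL)
stream = sdk.create_stream(CLIENT_ID, CLIENT_SECRET, table_properties)
```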
Send records to the server (async or sync): send messages to Zerobus over the open stream.
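A synchronous sending sketch: each event is built with the compiled Protobuf class and handed to the open stream. The preview SDK also ships an asyncio variant for high-throughput producers; the flush helper shown here is an assumption, as is the record shape.

```python
# Serialise each event with the compiled Protobuf class and send it.
for i in range(1000):
    record = record_pb2.Record(
        device_id=f"sensor-{i % 10}",  # illustrative fields -- match your table schema
        temperature=20.0 + i * 0.01,
    )
    stream.ingest_record(record)       # at-least-once delivery; the SDK tracks acks

stream.flush()  # block until outstanding records are acknowledged (assumed helper)
```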
The final wrapper: a main runner function that calls everything together.
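Tying the pieces together in one runner, with the stream always closed even if ingestion fails (same assumed SDK surface as above):

```python
def main() -> None:
    """End-to-end runner: configure, open the stream, send, and clean up."""
    sdk = ZerobusSdk(SERVER_ENDPOINT, WORKSPACE_URL)
    stream = sdk.create_stream(CLIENT_ID, CLIENT_SECRET, table_properties)
    try:
        for i in range(1000):
            stream.ingest_record(
                record_pb2.Record(device_id=f"sensor-{i % 10}", temperature=20.0 + i * 0.01)
            )
        stream.flush()  # assumed helper; see note above
    finally:
        stream.close()  # always release the gRPC stream


if __name__ == "__main__":
    main()
```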
The client-side logging from Zerobus gives a clear chain of how the stream progressed, and also surfaces some basic metrics.
Once ingestion completes, we can see the data in the target Delta table:
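A quick check from a Databricks notebook (where spark is predefined); the table name is whatever you passed as TABLE_NAME:

```python
# Confirm that rows have landed in the target Delta table.
df = spark.table("catalog.schema.events")  # replace with your table name
print(df.count(), "rows ingested")
df.show(10, truncate=False)
```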
Once the data has landed in the target table, it can be processed with Spark Declarative Pipelines running in continuous mode to propagate changes downstream, as sketched below.
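For example, a continuous-mode pipeline can stream-read the Zerobus target table and maintain a cleaned downstream table. This sketch uses the DLT Python API inside a Databricks pipeline (where spark is predefined); table and column names are illustrative.

```python
import dlt
from pyspark.sql.functions import col


@dlt.table(comment="Cleaned events streamed from the Zerobus target table")
def events_clean():
    # Stream-read the table that Zerobus writes to and drop incomplete rows.
    return (
        spark.readStream.table("catalog.schema.events")  # illustrative name
        .where(col("temperature").isNotNull())
    )
```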
Example Zerobus Ingest client implementation on GitHub:
Zerobus Ingest is currently in public preview and has defined throughput limits, with optimal performance when the client and endpoint run in the same region. It supports up to 100 MB/s or roughly 15,000 rows per second per stream and provides at-least-once delivery guarantees, so downstream consumers must handle potential duplicates. Usage is free during the preview, but Databricks plans to introduce pricing, which should be factored into production planning.
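Because delivery is at-least-once, downstream readers should deduplicate. One common pattern (a sketch, assuming events carry a unique event_id and an event_time timestamp column) is watermarked deduplication in Structured Streaming; dropDuplicatesWithinWatermark requires Spark 3.5+, and older runtimes can fall back to dropDuplicates.

```python
# spark is predefined in Databricks notebooks.
deduped = (
    spark.readStream.table("catalog.schema.events")  # Zerobus target table (illustrative)
    .withWatermark("event_time", "10 minutes")       # bound the dedup state
    .dropDuplicatesWithinWatermark(["event_id"])     # drop re-delivered records
)
```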
Zerobus Ingest simplifies real-time data ingestion in Databricks by allowing events to be written directly into Delta tables without relying on a traditional message broker. This approach reduces operational complexity, minimises latency, and enables faster analytics for event-driven and IoT workloads.
While Zerobus is ideal for direct event ingestion with Databricks as the primary destination, other patterns such as Kafka, Auto Loader, or CDC pipelines remain better suited for complex stream processing, file-based ingestion, or database replication. Selecting the right ingestion pattern depends on workload requirements, latency needs, and operational considerations.