Picture this: It's 3 AM. Twenty sailboats are racing through rough Caribbean seas in the middle of a four-day regatta. Each boat has sensors that transmit GPS coordinates, wind speed, course heading, and performance metrics every second. That's 1.7 million data points per day.
There are real-time demands: race officials need to see who's leading right now, and spectators on shore want to watch their favorite boats round the marks live. After the race, teams want to understand which strategies worked best. Was it better to tack frequently or to maintain speed over longer distances?
Notebook reviewing boat performance by distance sailed, wind conditions, and point of sail. The greater the velocity made good (VMG), the faster a boat closes on its destination.
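As a quick refresher, VMG is the component of boat speed along the bearing to the destination (or the wind, for upwind legs). A minimal sketch — the function name and parameters here are illustrative, not from the demo code:

```python
import math

def velocity_made_good(speed_knots: float, course_deg: float, bearing_deg: float) -> float:
    """VMG: the component of boat speed along the bearing to the destination."""
    angle = math.radians(course_deg - bearing_deg)
    return speed_knots * math.cos(angle)

# A boat sailing at 8 knots, 45 degrees off the direct line,
# makes good only ~5.66 knots toward the mark.
vmg = velocity_made_good(8.0, 45.0, 0.0)
```

A boat pointing straight at the mark makes good its full speed; the further it sails off the direct line, the more VMG drops.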
Here's the traditional stack you'd need to support this challenge:
Each layer adds latency, operational complexity, and failure modes. You're looking at weeks of integration work before you write a single line of business logic.
But what if there was a better way? What if you could push data directly to your lakehouse, deploy interactive apps in minutes, and run analytics on the same platform—all fully managed with unified governance?
Here’s how we built exactly that using Zerobus Ingest, Databricks Asset Bundles (DABs), and Databricks Apps.
We built Data Drifter Regatta, a sailing simulator that demonstrates how Databricks’ modern data tools eliminate the complexity of real-time analytics. The application simulates a fleet of sailboats generating telemetry data (e.g., GPS coordinates, wind conditions, speed, VMG) and visualizes their race progress in real-time through an interactive web application.
The entire stack—data ingestion, storage, processing, analytics, and visualization—runs on Databricks, orchestrated by a single deployment command.
Let's walk through how three key Databricks tools made this possible.
Zerobus Ingest is a serverless, push-based ingestion API that writes data directly into Unity Catalog Delta tables. This endpoint is explicitly designed for high-throughput streaming writes to your lakehouse.
Zerobus Ingest is not a message bus, so there is no Kafka to operate: no topics to publish to, no partitions to scale, no consumers to manage, and no backfills to schedule.
When your ultimate destination is the Lakehouse, Zerobus Ingest empowers you to eliminate the message bus, paving the way for a streamlined, single-hop architecture that dramatically reduces costs, complexity, and maintenance.
Zerobus Ingest handles all the complexity of streaming writes to the lakehouse, with no jobs to tune, message brokers to scale, or infrastructure to manage.
When you send data to Zerobus Ingest, you:
Zerobus Ingest handles the rest, including:
Zerobus Ingest is fundamentally different from traditional streaming platforms, where scaling means:
With Zerobus Ingest, your "scaling strategy" is to simply open more connections.
The Data Drifter Regatta demo uses a single connection pushing 20 records/second (all boats in one stream). You could easily expand this to scale for:
All these scenarios require zero infrastructure changes. Zerobus Ingest automatically scales to handle the incoming connections. You don't configure partitions, and you don't manage brokers. You simply push data. Joby Aviation, an early Zerobus Ingest adopter, transformed its data infrastructure—streaming gigabytes per minute from thousands of devices directly to the lakehouse, all with sub-five-second delivery. Read the case study.
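The "open more connections" model looks roughly like this. This is a self-contained asyncio sketch with a stubbed producer standing in for the Zerobus SDK; a real producer would call `stream.ingest_record()` where the stub appends to a list:

```python
import asyncio

async def boat_producer(boat_id: int, sink: list) -> None:
    """Stand-in for one connection pushing telemetry; a real producer
    would call stream.ingest_record() here instead of appending."""
    for seq in range(5):
        sink.append((boat_id, seq))
        await asyncio.sleep(0)  # yield control, as a network call would

async def run_fleet(n_boats: int) -> list:
    sink: list = []
    # "Scaling" is simply launching more concurrent producers/connections.
    await asyncio.gather(*(boat_producer(i, sink) for i in range(n_boats)))
    return sink

records = asyncio.run(run_fleet(20))  # 20 boats x 5 records each
```

Going from 20 boats to 1,000 is a change to `n_boats`, not to any infrastructure.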
Zerobus Ingest supports multiple ingestion protocols, giving you the flexibility to choose the right approach for your use case:
Zerobus Ingest’s multi-protocol approach provides the flexibility to support our diverse data producers.
Support for both gRPC and REST APIs in IoT and edge deployments offers flexibility for various device capabilities, with the REST API easily integrating with any device that can make HTTP requests. Both protocols ensure reliable performance and automatic schema management, making them ideal for resource-constrained devices and legacy systems.
Choose what fits your architecture and device constraints.
Let's look at the code we used to ingest sailboat telemetry:
import time

from zerobus.sdk.aio import ZerobusSdk
from zerobus.sdk.shared import StreamDestination

# Initialize SDK with OAuth service principal
sdk = ZerobusSdk(
    server_endpoint="your-endpoint.zerobus.databricks.com",
    client_id=OAUTH_CLIENT_ID,
    client_secret=OAUTH_CLIENT_SECRET,
)

# Create a stream targeting your Unity Catalog table
stream = await sdk.create_stream(
    destination=StreamDestination(
        table_name="main.sailboat.telemetry"
    )
)

# Generate telemetry data for each boat
for boat in fleet.boats:
    telemetry = {
        "boat_id": boat.id,
        "boat_name": boat.name,
        "timestamp": int(time.time() * 1_000_000),  # microseconds
        "latitude": boat.latitude,
        "longitude": boat.longitude,
        "speed_over_ground_knots": boat.speed,
        "wind_speed_knots": wind.speed,
        "heading_degrees": boat.heading,
        # ... 20+ more fields
    }
    # Push to Zerobus - that's it!
    await stream.ingest_record(telemetry)
That's the complete ingestion pipeline: 29 fields of nested telemetry data, streaming at 20 records/second, landing in Delta Lake—with only 3 function calls!
For this Data Drifter demo, we added a weather station that simulates an IoT device, reporting wind conditions via the Zerobus Ingest REST API. The weather station only emits data when conditions change significantly—demonstrating how smart devices can implement their own logic before sending data:
import time

import requests

# Weather station prepares data from sensors
weather_data = [{
    "station_id": "race-course-01",
    "timestamp": int(time.time() * 1_000_000),
    "wind_speed_knots": 15.3,
    "wind_direction_degrees": 45.0,
    "event_type": "frontal_passage",
    "in_transition": True,
    # ... additional weather metrics
}]

# Send to Zerobus REST API
response = requests.post(
    f"https://{endpoint}/zerobus/v1/tables/{table}/insert",
    json=weather_data,
    headers={"Authorization": f"Bearer {oauth_token}"},
)
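The "only emit on significant change" gating can be sketched as a simple threshold check. The threshold values and function name below are illustrative, not the demo's actual logic:

```python
from typing import Optional

def should_emit(last_sent: Optional[dict], reading: dict,
                speed_delta: float = 2.0, direction_delta: float = 10.0) -> bool:
    """Emit only when wind readings moved past a threshold since the last send."""
    if last_sent is None:
        return True  # always send the first reading
    return (abs(reading["wind_speed_knots"] - last_sent["wind_speed_knots"]) >= speed_delta
            or abs(reading["wind_direction_degrees"] - last_sent["wind_direction_degrees"]) >= direction_delta)

readings = [
    {"wind_speed_knots": 15.3, "wind_direction_degrees": 45.0},
    {"wind_speed_knots": 15.6, "wind_direction_degrees": 46.0},  # change too small
    {"wind_speed_knots": 18.1, "wind_direction_degrees": 47.0},  # significant shift
]
sent = []
last = None
for r in readings:
    if should_emit(last, r):
        sent.append(r)  # the requests.post(...) call would go here
        last = r
# Only the first and third readings are sent.
```

This kind of edge-side filtering is what keeps a fleet of chatty sensors from flooding the table with redundant rows.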
Behind a simple API call, Zerobus Ingest is doing the heavy lifting and handling key operational tasks automatically:
1. Schema Management
2. Optimization
3. Reliability
4. Security
All of this is made possible by being integrated into the Databricks platform, with Unity Catalog at the center.
Here are the results of our testing with Data Drifter Regatta:
| Metric | Value |
|---|---|
| Records per second | 20 (one per boat) |
| Average latency | < 500 ms |
| Data size | ~2 KB per record (29 fields) |
| Throughput | ~40 KB/sec continuous |
| Infrastructure managed | 0 servers, 0 clusters, 0 configs |
| Code complexity | 3 function calls |
When we scaled to 100 boats (100 records/second), latency stayed flat. When we added batch ingestion (100 records per API call), throughput increased 10x with the same code.
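Batching with the REST API is just grouping records before each POST. A minimal chunking helper — the sample records are fabricated, and the endpoint call is sketched in a comment:

```python
def chunked(records: list, batch_size: int = 100):
    """Yield successive fixed-size batches of records."""
    for i in range(0, len(records), batch_size):
        yield records[i:i + batch_size]

telemetry = [{"boat_id": i % 20, "seq": i} for i in range(250)]
batches = list(chunked(telemetry))  # -> 3 batches: 100, 100, 50
# for batch in batches:
#     requests.post(insert_url, json=batch, headers=auth_headers)
```

One POST per 100 records means 100x fewer round trips for the same data, which is where the 10x throughput gain comes from.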
What we did:

- ✅ Call ingest_record() with a Python dict
- ✅ Write SQL queries against Unity Catalog

What we didn't have to do:

- ❌ Set up Kafka brokers
- ❌ Configure partitions and replication
- ❌ Write schema registry definitions
- ❌ Build Spark Streaming jobs
- ❌ Tune batch intervals
- ❌ Monitor consumer lag
- ❌ Handle small file compaction
- ❌ Set up dead letter queues
- ❌ Configure backpressure handling
The difference is transformative. We spent our time building race visualization and analytics, not debugging streaming infrastructure.
For our sailboat use case, Zerobus Ingest allowed us to focus on the interesting problem—simulating realistic racing dynamics—rather than managing streaming infrastructure. That's the promise of serverless done right.
Once we had streaming data landing in our Delta tables, we needed a way to visualize it in real-time. We leveraged Databricks Apps to query our incoming data to create a race visualization and leaderboard.
Databricks Apps let you deploy interactive web applications directly in your Databricks workspace. The app runs on managed compute with zero infrastructure setup. Apps work as part of the unified Databricks platform, making it seamless to connect and access key assets such as our storage and SQL compute.
Databricks Apps can use your SQL compute to run queries against your storage, leveraging a single copy of the data.
We built our race tracker using Streamlit, with an interactive map powered by Folium:
# Query Unity Catalog directly
query = """
    SELECT boat_id, boat_name, latitude, longitude,
           speed_over_ground_knots, heading_degrees,
           distance_to_destination_nm, race_status
    FROM main.default.sailboat_telemetry
    WHERE timestamp > (
        SELECT MAX(timestamp) - 300000000  -- last 5 minutes (microseconds)
        FROM main.default.sailboat_telemetry
    )
    ORDER BY timestamp DESC
"""

# Execute with SQL warehouse
cursor = conn.cursor()
cursor.execute(query)
df = cursor.fetch_all_arrow().to_pandas()

# Visualize each boat's latest position on an interactive map
# (rows are ordered newest-first, so .first() picks the latest per boat)
for _, boat in df.groupby('boat_id').first().iterrows():
    folium.plugins.BoatMarker(
        [boat['latitude'], boat['longitude']],
        heading=boat['heading_degrees'],
        popup=f"{boat['boat_name']}: {boat['speed_over_ground_knots']:.1f} knots",
    ).add_to(race_map)
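Once the telemetry is in a DataFrame, the leaderboard is a couple of chained pandas calls: take each boat's latest row, then rank by distance remaining. Column names match the query above; the sample data below is fabricated for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "boat_id": [1, 2, 1, 2],
    "boat_name": ["Windward", "Leeward", "Windward", "Leeward"],
    "timestamp": [100, 100, 200, 200],
    "distance_to_destination_nm": [5.0, 4.2, 4.6, 4.0],
})

# Latest reading per boat, ranked by distance remaining (leader first).
leaderboard = (df.sort_values("timestamp")
                 .groupby("boat_id").last()
                 .sort_values("distance_to_destination_nm"))
```

Because both the map and the leaderboard read the same Delta table, they stay consistent with each other for free.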
When you create an app, it automatically provisions a service principal. You can then grant that principal access to specific resources:
{
  "resources": [
    {
      "name": "sql-warehouse",
      "sql_warehouse": {
        "id": "warehouse-id",
        "permission": "CAN_USE"
      }
    }
  ]
}
The app's service principal gets automatic permissions to:
No API keys, no credential management, no security headaches. Prototyping with Databricks Apps was a breeze.
The next challenge: post-race analysis. How do we coordinate the deployment of our pipeline and jobs? We want to understand how the different boats performed and progressed through the races, all in the context of changing weather conditions.
Notebook report reviewing the changing wind conditions over the course of the race.
Databricks Asset Bundles let you define your Databricks project, such as jobs, notebooks, and resources, all in a single YAML file and deploy it with one command.
Using Bundles, we can achieve:
Your project's databricks.yml file defines the Data Drifter asset bundle. In our case, it sets up a job with a sequence of tasks that analyze the race results.
bundle:
  name: sailboat_telemetry_analysis

variables:
  config_file_path:
    default: ../config.toml
  warehouse_id:
    default: "warehouse-id-here"

resources:
  jobs:
    sailboat_analysis:
      name: "Sailboat Performance Analysis"
      tasks:
        - task_key: boat_performance
          notebook_task:
            notebook_path: ./notebooks/01_boat_performance.py
            source: WORKSPACE
        - task_key: wind_conditions
          notebook_task:
            notebook_path: ./notebooks/02_wind_conditions.py
            source: WORKSPACE
          depends_on:
            - task_key: boat_performance
        - task_key: race_progress
          notebook_task:
            notebook_path: ./notebooks/03_race_progress.py
            source: WORKSPACE
          depends_on:
            - task_key: boat_performance
        - task_key: race_summary
          notebook_task:
            notebook_path: ./notebooks/04_race_summary.py
            source: WORKSPACE
          depends_on:
            - task_key: wind_conditions
            - task_key: race_progress

targets:
  dev:
    mode: development
    workspace:
      host: https://your-workspace.cloud.databricks.com
  prod:
    mode: production
    workspace:
      host: https://prod-workspace.cloud.databricks.com
The result is a pipeline that looks like this, with a chain of dependencies that run before compiling everything into a final race summary.
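Deploying and running a bundle like the one above comes down to a handful of Databricks CLI commands:

```shell
# Validate the bundle configuration
databricks bundle validate

# Deploy to the dev target defined in databricks.yml
databricks bundle deploy -t dev

# Trigger the analysis job
databricks bundle run sailboat_analysis -t dev
```

The same commands with `-t prod` promote the identical definition to production, which is the point of keeping the whole pipeline in one YAML file.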
The Data Drifter Regatta architecture uses Zerobus Ingest to push real-time sailboat and weather station data to Delta Lake tables.
This Delta open format storage layer then powers downstream processes: a Streamlit app for live race visuals, and Databricks Notebooks for post-race analysis orchestrated through DABs.
This simplicity lets data producers write to Delta Lake once, while multiple consumers—such as a real-time UI and batch analytics—read the same tables without separate pipelines or complex orchestration.
The Databricks Platform made building Data Drifter simple and streamlined, eliminating the need to integrate with multiple vendors or self-host open source components.
Building Data Drifter Regatta taught us that platform unification enables application breadth.
A unified platform means:
Across industries, converging trends underscore the need for a unified platform to accelerate development.
The old approach—best-of-breed tools stitched together by dedicated engineering and infrastructure teams—was a liability. Now, a unified, simplified platform lets you focus on business logic—your market differentiator—rather than managing infrastructure.
One platform, zero manual integration, infinite possibilities.
The complete Data Drifter Regatta code is available as a reference implementation. We combined our telemetry generator, Databricks App, and analysis pipeline into a simple deployment script to get you started.
# 1. Clone the repository
git clone https://github.com/databricks-solutions/zerobus-ingest-examples
cd data_drifter

# 2. Configure (just update config.toml with your credentials)

# 3. Deploy everything with one command
./deploy.sh

# 4. Start generating telemetry
python main.py --client-id <client-id> --client-secret <client-secret>
Try it yourself: Clone the complete Data Drifter Regatta code and see how quickly you can go from "I have streaming data" to "I have a production application."