Technical Blog
Explore in-depth articles, tutorials, and insights on data analytics and machine learning in the Databricks Technical Blog. Stay updated on industry trends, best practices, and advanced techniques.
Vicky_Bukta_DB
Databricks Employee

Zerobus Ingest now supports Databricks Variant type via REST API (Beta), enabling schema-free JSON ingestion. No more schema definitions, no more ETL headaches—just send your data and query it.

The Schema Management Problem

If you've worked with data ingestion pipelines, you know the drill: define a schema, update it when your data structure changes, redeploy your pipeline, and hope nothing breaks. This cycle becomes especially painful when ingesting semi-structured data from APIs, logs, or IoT devices, where data schema evolution is frequent.

Traditional approaches force you to choose between:

  • Rigid schemas that break when data structures change
  • String/JSON columns that sacrifice query performance and type safety
  • Complex ETL to normalize everything into fixed tables


Enter Variant Type Support in Zerobus Ingest

The Databricks Variant type provides a native way to store and query semi-structured data without predefined schemas. With Zerobus Ingest's REST API support for Variant, you can now ingest JSON directly while keeping reads performant.

Three Key Benefits

  1. Limited Schema Definitions Required: Send your JSON data as-is. Add new fields tomorrow without updating schemas or redeploying pipelines.
  2. Native Performance: Unlike JSON stored as strings, Variant values are stored in an optimized format, and “Predictive Optimization” can apply shredding to improve read performance.
  3. Simplified Pipelines: Reduce your ETL code significantly. No more:
    1. Manual schema inference logic
    2. JSON parsing and flattening steps
    3. Schema evolution tracking
    4. Data type conversion errors

The Zerobus Ingest REST API endpoint becomes your ingestion pipeline. Focus on business logic instead of data plumbing.
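
For context, the "JSON parsing and flattening" step listed above often looks something like the sketch below. This is an illustrative example only, not code from any Databricks library; the function name `flatten` and the underscore naming scheme are assumptions made for the illustration.

```python
def flatten(record, prefix=""):
    """Flatten nested dicts into a single level, joining keys with underscores."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the parent key as a prefix.
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat


flat = flatten({"id": "usr_456", "preferences": {"notifications": True, "theme": "dark"}})
print(flat)
# {'id': 'usr_456', 'preferences_notifications': True, 'preferences_theme': 'dark'}
```

Every new nested field changes the set of output columns, which is why this style of pipeline needs schema tracking alongside it. With a Variant column, the nested object is stored as-is and this step can be dropped.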

Getting Started

To enable Variant support in your Zerobus Ingest workflow:

  1. Create a target table with a Variant column
  2. Start ingesting JSON data without schema definitions

Step 1: Creating Your Target Table

Before ingesting data, create a target table with a Variant column. Zerobus Ingest requires the table to exist beforehand—it will not auto-create tables for you.

Here's a simple table definition:

CREATE TABLE main.default.events (
  event_id STRING,
  data VARIANT,
  ingested_at TIMESTAMP
);

This minimal schema gives you:

  • An identifier field for tracking events.
  • A VARIANT column to store your entire JSON payload.
  • A timestamp for ingestion tracking.

You can also use a single-column approach if you don't need additional metadata:

CREATE TABLE main.default.events (
  data VARIANT
);

The beauty of Variant is that all your JSON structure lives in that one column, queryable without further schema definitions.

Step 2: Ingest Data

Using the Zerobus Ingest REST API with Variant is straightforward. Here's a simple example:

curl -X POST https://<databricks-workspace>.cloud.databricks.com/zerobus/v1/tables/main.default.events/insert \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '[
      {
        "event_id": "evt_123",
        "data": {
          "id": "usr_456",
          "email": "user@example.com",
          "preferences": {
            "notifications": true,
            "theme": "dark"
          }
        },
        "ingested_at": "2026-02-13T10:30:00Z"
      },
      ...
    ]'
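
The same request can be assembled programmatically. The following is a minimal sketch using only the Python standard library, assuming the endpoint path and headers shown in the curl example; the function name `build_insert_request`, the host `my-workspace.cloud.databricks.com`, and the row keys (chosen to match the three-column table from Step 1) are illustrative assumptions. It builds the request without sending it, so the payload can be inspected first.

```python
import json
import urllib.request


def build_insert_request(workspace_host, table, token, rows):
    """Build (but do not send) a POST request carrying a batch of JSON rows."""
    # Path copied from the curl example; the host is supplied by the caller.
    url = f"https://{workspace_host}/zerobus/v1/tables/{table}/insert"
    # json.dumps always produces valid JSON (no trailing commas).
    body = json.dumps(rows).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


rows = [
    {
        "event_id": "evt_123",
        "data": {
            "id": "usr_456",
            "email": "user@example.com",
            "preferences": {"notifications": True, "theme": "dark"},
        },
        "ingested_at": "2026-02-13T10:30:00Z",
    }
]

request = build_insert_request(
    "my-workspace.cloud.databricks.com",  # hypothetical workspace host
    "main.default.events",
    "<token>",
    rows,
)
# To actually send the batch: urllib.request.urlopen(request)
```

Serializing with `json.dumps` instead of hand-writing the body avoids small syntax slips, such as a trailing comma, that would make the request body invalid JSON.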

Once ingested, query your data with standard SQL, using the colon (:) path syntax to extract fields from the Variant column:

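SELECT
  event_id,
  data:id::string AS user_id,
  data:preferences.theme::string AS theme,
  data:email::string AS email
FROM main.default.events
-- Cast the Variant value to BOOLEAN before comparing it
WHERE data:preferences.notifications::boolean = true;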
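
To make the path syntax concrete, here is a small plain-Python model of what an expression like data:preferences.theme does with a nested payload. It is a sketch of the semantics only, not how Databricks implements Variant extraction; the helper name `extract_path` is hypothetical, and returning `None` for a missing key mirrors my understanding that a path that does not exist yields NULL, not an error.

```python
def extract_path(value, path):
    """Walk a dotted path through nested dicts; return None if any key is missing."""
    current = value
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


data = {
    "id": "usr_456",
    "email": "user@example.com",
    "preferences": {"notifications": True, "theme": "dark"},
}

print(extract_path(data, "preferences.theme"))   # dark
print(extract_path(data, "preferences.locale"))  # None
```

Under that assumption, rows ingested before a field existed can sit in the same table as newer rows: a query on the new field sees NULL for the older ones.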

Real-World Use Cases

  • API Data Lakes: Ingest webhook payloads without mapping every possible field upfront.
  • Log Aggregation: Collect application logs with varying structures from different services in a single table.
  • IoT Telemetry: Store sensor data where device capabilities and metrics evolve over time.
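
As a sketch of how these cases share one table, the snippet below wraps payloads of different shapes in the same three-key row matching the events table from Step 1. The helper name `to_row`, the sample payloads, and the `evt_` ID format are illustrative assumptions, not part of the Zerobus Ingest API.

```python
import uuid
from datetime import datetime, timezone


def to_row(payload, event_id=None, now=None):
    """Wrap an arbitrary JSON-serializable payload in the fixed row envelope."""
    now = now or datetime.now(timezone.utc)
    return {
        "event_id": event_id or f"evt_{uuid.uuid4().hex}",
        "data": payload,  # stored untouched in the VARIANT column
        "ingested_at": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


# Two payloads with nothing in common structurally.
payloads = [
    {"service": "auth", "level": "WARN", "message": "token expired"},
    {"device": "sensor-7", "temperature_c": 21.4, "battery": {"pct": 88}},
]

rows = [to_row(p) for p in payloads]
```

Only the envelope is fixed. Whatever goes under `data` can change per source or over time without any change to the table definition.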

Conclusion

Zerobus Ingest with Variant type support removes the friction from semi-structured data ingestion. By combining the flexibility of schema-free JSON with the performance of native Databricks storage, you can build more resilient data pipelines with less code and less maintenance.

Ready to simplify your data ingestion? Check out the Zerobus Ingest documentation and start sending data today.

Have questions or want to share your Variant use cases? Join the discussion below!

2 Comments
wesleyfelipe
Contributor

This is really a great feature!
Having that option years ago would have made my life so much easier.
Thanks for sharing.

I see from the documentation:

Shredding improves the query performance of VARIANT data by storing commonly occurring fields as separate columns in the Parquet files. This process reduces the I/O required to read fields and improves compression using a columnar format instead of a binary blob.

Is there any known limit on how many fields are shredded? Also, what is the logic that defines 'commonly occurring fields'?

atomic
New Contributor II

@wesleyfelipe - thanks for the comment.

We don't publicly state the max field limit because it's subject to change, but think about 100+.

The *commonly occurring fields* decision is a heuristic based on write sampling, historical write data, and query history. I.e., whether the given value is common in the current batch, whether it is historically common for a given column, and whether it is frequently used in filters.