Safe Update Strategy for Online Feature Store Without Endpoint Disruption

VivekWV
New Contributor II

Hi Team,

We are implementing Databricks Online Feature Store using Lakebase architecture and have run into some constraints during development:

Requirements:

  1. Deploy an offline table as a synced online table and create a feature spec that queries from this online table.
  2. During development, schema changes occur frequently (columns renamed or removed).
  3. After schema changes, we need to redeploy the endpoint with the updated online table and feature spec.

Problem: When an endpoint is running and we delete/recreate the online table and feature spec (to reflect schema changes), the endpoint breaks. In some cases, it even becomes irrecoverable.

Constraints:

  • Cannot create two online tables for the same offline table.
  • Deleting and recreating binding resources (online table + feature spec) disrupts the endpoint.
  • We need to keep a stable endpoint URL for consumers (cannot create multiple shadow endpoints).

Question: What is the recommended approach to safely update the online store and feature spec without causing downtime or breaking the endpoint? Is there a supported pattern for atomic updates or versioning in Databricks Feature Store?

Thanks for your guidance!
#lakehouse #databricksonlinefeaturestore #syncedtable #postgres #onlinefeaturestore

3 REPLIES

VivekWV
New Contributor II

Additional Context:

  • The feature spec created from the synced table is served through an endpoint, and we need to keep the same endpoint URL for consumers.
  • After schema changes, we currently recreate the synced table and feature spec with the same names before updating the endpoint.
  • Even after updates, the endpoint sometimes breaks or becomes irrecoverable.
  • We have steps in place to clean up the Postgres datastore during synced table deletions, so the issue is not with leftover data but with the binding between the endpoint and feature spec.

mark_ott
Databricks Employee

The recommended way to safely update an online Databricks Feature Store without breaking the serving endpoint or causing downtime involves a version-controlled, atomic update pattern that preserves schema consistency and endpoint stability.

Key Issue

When an online feature table is deleted and recreated due to a schema change, the associated endpoint and feature spec lose their binding references, leaving the endpoint unstable. Databricks currently does not support true in-place schema replacement for synced online tables; any schema change to the offline Delta source requires synchronization through a publish or merge update, not recreation.

Recommended Approach

1. Use Incremental Schema Evolution

Databricks Delta tables support schema evolution, allowing columns to be added or updated without deleting the table. With the Feature Engineering client (fe), you can use:

python
fe.write_table(
    name="catalog.schema.feature_table",
    df=new_feature_df,
    mode="merge",  # merges updates safely
)

This approach updates the schema and data without breaking existing bindings between the offline and online tables.

2. Republish or Refresh Features Atomically

Instead of deleting the online table, use:

python
fe.publish_table(
    source_table_name="catalog.schema.feature_table",
    online_table_name="catalog.schema.online_table",
    online_store=online_store,
    mode="merge",
)

mode="merge" ensures the online table schema and data are updated incrementally while keeping its identity (and thus the endpoint bindings) intact. This prevents downtime and maintains endpoint stability.

3. Use Lakeflow Jobs for Continuous Sync

If schema changes or feature updates are frequent, schedule Lakeflow Jobs to regularly call publish_table. This approach makes the feature update process continuous and fault-tolerant without manual deletion or recreation.
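
For reference, here is a minimal sketch of scheduling such a sync with the Databricks SDK for Python; the job name, notebook path, cluster ID, and cron expression below are placeholders rather than values from this thread, and the notebook is assumed to contain the fe.publish_table(...) call shown above.

python
# Sketch: schedule a job that periodically republishes features to the online store.
# Assumes the databricks-sdk package and an existing notebook and cluster.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

w.jobs.create(
    name="sync-online-feature-table",  # hypothetical job name
    tasks=[
        jobs.Task(
            task_key="publish_features",
            notebook_task=jobs.NotebookTask(
                notebook_path="/Workspace/features/publish_features"  # placeholder path
            ),
            existing_cluster_id="<cluster-id>",  # placeholder cluster
        )
    ],
    schedule=jobs.CronSchedule(
        quartz_cron_expression="0 0 * * * ?",  # hourly; adjust as needed
        timezone_id="UTC",
    ),
)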

4. Maintain Versioned Feature Specs

Databricks recommends maintaining versioned feature specifications (for example, feature_spec_v1, feature_spec_v2), while keeping a constant endpoint mapping. During deployment, update the endpoint's configuration reference to the new spec version atomically. The endpoint name and URL remain unchanged.
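
As a rough illustration, the version swap could look like the sketch below, which uses the Feature Engineering client and the Databricks SDK; the catalog, schema, spec, lookup key, and endpoint names are illustrative placeholders, not values from this thread.

python
# Sketch: register the next feature spec version and repoint the existing endpoint
# at it, keeping the endpoint name (and therefore its URL) unchanged.
from databricks.feature_engineering import FeatureEngineeringClient, FeatureLookup
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import ServedEntityInput

fe = FeatureEngineeringClient()

# 1. Create the new spec version as a separate UC object (v1 stays untouched).
fe.create_feature_spec(
    name="catalog.schema.feature_spec_v2",
    features=[
        FeatureLookup(
            table_name="catalog.schema.feature_table",
            lookup_key="customer_id",  # placeholder primary key
        )
    ],
)

# 2. Swap the endpoint's served entity to the new spec in one config update.
w = WorkspaceClient()
w.serving_endpoints.update_config(
    name="feature-serving-endpoint",  # existing endpoint, same URL
    served_entities=[
        ServedEntityInput(
            entity_name="catalog.schema.feature_spec_v2",
            workload_size="Small",
            scale_to_zero_enabled=True,
        )
    ],
)

Once traffic is confirmed healthy on the new spec, the previous spec version can be retired separately.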

Practical Schema Evolution Workflow

  1. Update the offline Delta table schema and enable Change Data Feed if not already set (see the sketch after this list).
  2. Write or merge new features using schema evolution.
  3. Republish the updated offline table to the online store using mode="merge".
  4. Update the feature spec version; do not delete the online table.
  5. Redeploy the endpoint referencing the new feature spec (same URL).
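
For step 1, a minimal sketch of enabling Change Data Feed on the offline table, assuming a Databricks notebook where spark is predefined and using the illustrative table name from above:

python
# Sketch: enable Change Data Feed on the offline Delta table (workflow step 1).
spark.sql("""
    ALTER TABLE catalog.schema.feature_table
    SET TBLPROPERTIES (delta.enableChangeDataFeed = true)
""")

Steps 2 and 3 then reuse the fe.write_table and fe.publish_table calls with mode="merge" shown earlier.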

Summary Table

Problem | Corrective Practice
Schema change causes endpoint breakage | Use Delta schema evolution with mode="merge"
Need uninterrupted endpoint (stable URL) | Reuse the endpoint; only version the feature spec
Frequent schema changes | Use Lakeflow Jobs for automated sync
Avoid dual tables for one offline source | Use incremental publish_table to preserve online identity

This workflow ensures atomic updates, zero downtime, and endpoint continuity while enabling schema flexibility under Databricks’ Online Feature Store using Lakebase architecture.

VivekWV
New Contributor II

Hi Mark, 

Thanks for your response. I followed the steps you suggested:

  1. Created the table and set primary key + time series key constraints.
  2. Enabled Change Data Feed.
  3. Created the feature table and deployed the online endpoint — this worked fine.
  4. Removed some columns from the offline table and updated it using:
    spark_df.write.mode("overwrite").option("overwriteSchema", "true").saveAsTable(f"{table_name}")
  5. Updated the feature table using:
    fe.write_table(name=feature_store_name, df=df, mode="merge")
  6. Tried re-publishing to the online store using:
    fe.publish_table(source_table_name=feature_store_list, online_table_name="catlog_dev.abcd.fs_table_online", online_store=pg_store, mode="merge")
    — this step failed.

A LakeFlow pipeline was triggered and threw the following error:

org.apache.spark.sql.streaming.StreamingQueryException: [STREAM_FAILED] 
... [DELTA_SCHEMA_CHANGED_WITH_STARTING_OPTIONS] Detected schema change in version 7

It seems the schema change isn’t being handled during re-publication. I’ve attached the full error message. Let me know if you need more details or logs.
