02-04-2026 04:22 AM
If you've been scratching your head at Lakebase's "branching" feature wondering "am I working with a database or GitHub?"—you're not alone. Let me break down what's actually happening here, because once it clicks, it changes how you think about database development entirely.
Before we dive in, make sure you've got a workspace in one of the supported regions: us-east-1, us-east-2, eu-central-1, eu-west-1, eu-west-2, ap-south-1, ap-southeast-1, or ap-southeast-2.

Here's the thing that confused me initially: Lakebase isn't Delta Lake. It's a fully managed PostgreSQL database. So when we talk about "branching," we're not talking about Delta's transaction log or time travel—we're talking about something fundamentally different.
When you create a Lakebase project, you automatically get two branches: production (your root/default branch) and development (a child of production). From there, you can create child branches from any existing branch, building out a hierarchy that looks eerily familiar to anyone who's used Git:
```
production (root - protected)
├── staging
│   └── feature-payments
└── development
    ├── dev-alice
    └── dev-bob
```
Each branch is a fully independent PostgreSQL database environment. Changes you make in dev-alice don't touch dev-bob, and neither affects production. It's complete isolation—but without the hours of data copying you'd normally need.
Okay, so how does a 200GB database branch in 3 seconds? The answer is copy-on-write storage, and it's actually pretty elegant.
When you create a branch, Lakebase doesn't duplicate your data. Instead, the new branch just gets pointers to the same underlying storage as its parent. Think of it like Git's branching—the branch itself is essentially free because it's just referencing the same data.
```
BEFORE ANY CHANGES:

production branch           dev branch (just created)
┌───────────────┐           ┌───────────────┐
│ users table   │◄──────────│ → pointer     │
│ orders table  │◄──────────│ → pointer     │
│ products      │◄──────────│ → pointer     │
└───────────────┘           └───────────────┘
  (actual data)              (no storage cost yet)

AFTER MODIFYING users TABLE IN dev:

production branch           dev branch
┌───────────────┐           ┌───────────────┐
│ users table   │           │ users table'  │ ← only this is new storage
│ orders table  │◄──────────│ → pointer     │
│ products      │◄──────────│ → pointer     │
└───────────────┘           └───────────────┘
```
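To make the pointer semantics concrete, here's a toy Python model of copy-on-write branching. All the names (`Branch`, `pages`, `write`) are hypothetical illustrations of the concept, not the real storage engine:

```python
# Toy model of copy-on-write branching. Tables map to "storage pages";
# a child branch shares the parent's pages until a table is modified.
class Branch:
    def __init__(self, name, pages=None):
        self.name = name
        self.pages = dict(pages) if pages else {}

    def branch(self, name):
        # "Creating" a branch copies only pointers, never data,
        # so cost is proportional to table count, not data size.
        return Branch(name, self.pages)

    def write(self, table, data):
        # First write to a table allocates a private page (the CoW step);
        # every untouched table keeps pointing at shared storage.
        self.pages[table] = data


prod = Branch("production")
prod.write("users", ["alice", "bob"])
prod.write("orders", [101, 102])

dev = prod.branch("dev")
# Before any change, both branches reference the same objects.
assert dev.pages["users"] is prod.pages["users"]

dev.write("users", ["alice", "bob", "carol"])
# Only the modified table diverged; orders is still shared storage,
# and production's data is untouched.
assert dev.pages["users"] is not prod.pages["users"]
assert dev.pages["orders"] is prod.pages["orders"]
assert prod.pages["users"] == ["alice", "bob"]
```

This is why a 200GB database branches in seconds: branch creation copies pointers, and storage cost accrues only for the pages a branch actually changes.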
The implications are huge:

- Branch creation is near-instant, regardless of database size
- A new branch costs nothing in storage until you actually change data
- Every branch is fully isolated and fully writable
You can create branches through the Lakebase App UI, Python SDK, Java SDK, CLI, or REST API. In the UI, it's just a matter of picking a source branch and giving the new branch a name (`dev/yourname` works well for personal branches). That's it. A few seconds later, you've got a complete branch with its own compute endpoint and connection string.
If you prefer code, here's the Python SDK approach:
```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.postgres import Branch, BranchSpec, Duration

w = WorkspaceClient()

# Create a branch with 7-day expiration
branch_spec = BranchSpec(
    ttl=Duration(seconds=604800),  # 7 days in seconds
    source_branch="projects/my-project/branches/production"
)
branch = Branch(spec=branch_spec)

result = w.postgres.create_branch(
    parent="projects/my-project",
    branch=branch,
    branch_id="dev-alice"
).wait()

print(f"Branch created: {result.name}")
```
Or via CLI:
```shell
databricks postgres create-branch projects/my-project dev-alice \
  --json '{
    "spec": {
      "source_branch": "projects/my-project/branches/development",
      "ttl": "604800s"
    }
  }'
```
Pro tip: You can set branches to auto-expire. UI presets are 1 hour, 1 day, or 7 days. Via API, you can set any TTL up to 30 days max—perfect for CI/CD test branches that should clean themselves up.
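If you're generating TTLs in automation, a tiny helper keeps you inside the cap. This is a sketch: the `"<seconds>s"` string format mirrors the CLI example above, and the 30-day ceiling comes from the documented limit:

```python
# Compute a branch TTL string like "604800s", capped at the 30-day maximum.
MAX_TTL_DAYS = 30

def branch_ttl(days: float) -> str:
    """Return a TTL string for the create-branch payload, capped at 30 days."""
    days = min(days, MAX_TTL_DAYS)
    if days <= 0:
        raise ValueError("TTL must be positive")
    return f"{int(days * 86400)}s"

assert branch_ttl(7) == "604800s"    # the 7-day example above
assert branch_ttl(90) == "2592000s"  # silently capped at 30 days
```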
Here's where it feels normal again. Once you connect to your branch (each branch has its own connection string), you're just running standard PostgreSQL:
```sql
-- Create a new table for your feature
CREATE TABLE user_preferences (
    id SERIAL PRIMARY KEY,
    user_id INT REFERENCES users(id),
    theme VARCHAR(50) DEFAULT 'light',
    notifications BOOLEAN DEFAULT true,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Add a column to an existing table
ALTER TABLE users ADD COLUMN last_login TIMESTAMP;

-- Create an index
CREATE INDEX idx_users_last_login ON users(last_login);

-- Insert test data
INSERT INTO user_preferences (user_id, theme)
VALUES (1, 'dark'), (2, 'light'), (3, 'dark');
```
No special syntax. No branch-aware commands. Just write your migrations like you always have.
Alright, here's the part that'll trip you up if you're expecting full Git semantics: Lakebase doesn't have native merge functionality. You can't just click a button to promote changes from dev-alice back to development.
The docs are clear: "To move changes from child to parent, use your standard migration tools."
So what's the actual workflow? It looks like this:

1. Develop and test your schema changes on your branch
2. Capture those changes as migration scripts (Alembic, Flyway, plain SQL—whatever you already use)
3. Run the migrations against the parent branch

It's not as seamless as Git merge, but honestly? It forces good migration hygiene. You can't get lazy and just "merge and hope"—you have to actually track your schema changes properly.
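The promotion step can be as simple as a migration runner that records what's been applied and applies only what's pending. Here's a minimal sketch using SQLite in memory as a stand-in for the parent branch's Postgres endpoint (the migration IDs and DDL are made up for illustration):

```python
import sqlite3

# Ordered migrations: (id, DDL). In a real setup these would live in
# version-controlled files and run against the parent branch's endpoint.
MIGRATIONS = [
    ("001_create_user_preferences",
     "CREATE TABLE user_preferences ("
     "id INTEGER PRIMARY KEY, user_id INT, theme TEXT DEFAULT 'light')"),
    ("002_add_last_login",
     "ALTER TABLE user_preferences ADD COLUMN last_login TIMESTAMP"),
]

def migrate(conn):
    # Track applied migrations in-database so re-runs are no-ops.
    conn.execute("CREATE TABLE IF NOT EXISTS schema_migrations (id TEXT PRIMARY KEY)")
    applied = {row[0] for row in conn.execute("SELECT id FROM schema_migrations")}
    for mig_id, ddl in MIGRATIONS:
        if mig_id not in applied:
            conn.execute(ddl)
            conn.execute("INSERT INTO schema_migrations VALUES (?)", (mig_id,))
    conn.commit()

conn = sqlite3.connect(":memory:")
migrate(conn)
migrate(conn)  # idempotent: already-applied migrations are skipped

cols = [r[1] for r in conn.execute("PRAGMA table_info(user_preferences)")]
assert "last_login" in cols
```

Tools like Alembic do exactly this bookkeeping for you; the point is that promotion is "replay tracked migrations on the parent," not a storage-level merge.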
This feature partially makes up for the lack of merge. Schema Diff lets you compare the DDL between any two branches, showing exactly what's different.
To use it, pick your branch and the base branch you want to compare against. You'll get a side-by-side view: red for removed/changed from base, green for added/changed in your branch. It captures tables, columns, constraints, indexes—all your schema objects.
I've started using this before EVERY migration promotion. It's caught a few "oops, I didn't mean to add that column" moments.
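Conceptually, a schema diff is just a structural comparison between two snapshots. Here's an illustrative sketch (not the actual Lakebase implementation) comparing table-to-columns mappings from two branches:

```python
# Compare two schema snapshots (table -> list of columns) and report
# added/removed tables and per-table column changes.
def schema_diff(base: dict, branch: dict) -> dict:
    diff = {"added": {}, "removed": {}, "changed": {}}
    for table, cols in branch.items():
        if table not in base:
            diff["added"][table] = cols
        elif set(cols) != set(base[table]):
            diff["changed"][table] = {
                "added_cols": sorted(set(cols) - set(base[table])),
                "removed_cols": sorted(set(base[table]) - set(cols)),
            }
    for table in base:
        if table not in branch:
            diff["removed"][table] = base[table]
    return diff

development = {"users": ["id", "email"], "orders": ["id", "status"]}
dev_alice = {"users": ["id", "email", "last_login"],
             "orders": ["id", "status"],
             "user_preferences": ["id", "user_id", "theme"]}

d = schema_diff(development, dev_alice)
assert "user_preferences" in d["added"]
assert d["changed"]["users"]["added_cols"] == ["last_login"]
```

Running something like this in CI before promoting migrations is a cheap way to catch the "oops, I didn't mean to add that column" moments automatically.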
Branch reset instantly updates a child branch to match its parent's current state. Key word: one-way. Parent to child only.
When would you use this?
A few gotchas here:

- Root branches (like production) can't be reset since they have no parent
- It's destructive on the child side: anything you haven't promoted to the parent is gone after a reset

This is one of my favorite features. You can create a branch from any point within your restore window (configurable from 2 to 35 days).
Real scenario: Someone ran a `DELETE FROM orders WHERE status = 'pending'` without a `WHERE order_date < ...` clause. Poof—three days of orders gone.
With point-in-time branching:

1. Create a branch from a timestamp just before the bad DELETE ran
2. Query the branch to pull out the deleted rows
3. Copy them back into production

No calling support. No restoring from backups. Just branch, query, fix.
After a few weeks of using this, here's the pattern that's clicked for my team:
```
production (protected)
└── development
    ├── dev/alice
    ├── dev/bob
    └── dev/charlie
```
Each dev gets their own long-lived branch off development. We reset them weekly to stay reasonably current with shared changes.
The workflow:

1. Reset your personal branch from development to start fresh
2. Build and test your feature on your branch
3. Run Schema Diff against development to review changes
4. Promote your migrations to development using our standard Alembic workflow
5. Validate on development with the team
6. Promote to production

For CI/CD, we spin up ephemeral branches with short expiration:
```python
import uuid

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.postgres import Branch, BranchSpec, Duration

w = WorkspaceClient()
branch_id = f"ci-{uuid.uuid4().hex[:8]}"

# Create 2-hour ephemeral branch
branch_spec = BranchSpec(
    ttl=Duration(seconds=7200),  # 2 hours
    source_branch="projects/my-project/branches/development"
)
branch = Branch(spec=branch_spec)

result = w.postgres.create_branch(
    parent="projects/my-project",
    branch=branch,
    branch_id=branch_id
).wait()

# Run integration tests against branch endpoint
# Branch auto-deletes after 2 hours
```
No more fighting over shared test databases. No more "who left test data in staging?"
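One refinement worth considering: delete the branch eagerly when the pipeline finishes, and treat the TTL as a safety net for crashed jobs. A context-manager sketch of that pattern, using a stand-in client interface (the `create_branch`/`delete_branch` methods here are illustrative, not the real SDK surface):

```python
import uuid
from contextlib import contextmanager

@contextmanager
def ephemeral_branch(client, project, source, ttl_seconds=7200):
    """Create a CI branch, yield its id, and always delete it afterward."""
    branch_id = f"ci-{uuid.uuid4().hex[:8]}"
    client.create_branch(project, branch_id, source, ttl_seconds)
    try:
        yield branch_id
    finally:
        # Delete eagerly; the TTL only matters if the job dies mid-run.
        client.delete_branch(project, branch_id)


class FakeClient:
    """Records calls so the pattern can be exercised without a workspace."""
    def __init__(self):
        self.calls = []
    def create_branch(self, project, branch_id, source, ttl):
        self.calls.append(("create", branch_id))
    def delete_branch(self, project, branch_id):
        self.calls.append(("delete", branch_id))


fake = FakeClient()
with ephemeral_branch(fake, "projects/my-project",
                      "projects/my-project/branches/development") as bid:
    assert bid.startswith("ci-")
    # ... run integration tests against the branch endpoint here ...

# Branch was created and then deleted, even though the TTL never fired.
assert [c for c, _ in fake.calls] == ["create", "delete"]
```

Eager cleanup also keeps you comfortably under the unarchived-branch limit discussed below.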
I keep getting asked this, so let's clear it up:
| | Lakebase Branching | Delta Lake Time Travel |
|---|---|---|
| What it is | PostgreSQL with copy-on-write | Delta table version history |
| Workload | OLTP (transactional) | OLAP (analytical) |
| Scope | Entire database | Individual tables |
| Can you write? | Yes, full read/write | No, historical versions are read-only |
| Syntax | SDK/CLI/API for branches, SQL inside | `VERSION AS OF`, `TIMESTAMP AS OF` |
Delta time travel lets you read historical states. Lakebase branching gives you a writable, isolated copy that starts from a point in time. Very different use cases.
Before you go branch-crazy, here are the limits that matter:
| Resource | Limit |
|---|---|
| Branches per project | 500 |
| Unarchived branches | 10 (this one bites people) |
| Concurrent active computes | 20 (default branch exempt) |
| Databases per branch | 500 |
| Roles per branch | 500 |
| Max data size per branch | 8 TB |
| Protected branches | 1 per project |
| Root branches | 3 per project |
| History retention (restore window) | 2-35 days |
| Branch expiration max | 30 days |
The unarchived branches limit of 10 is the one that catches teams off guard. If you're spinning up lots of dev branches, inactive ones get archived automatically. Protected branches and default branches are exempt from archival.
You can register Lakebase databases in Unity Catalog for unified governance and cross-source queries. But here's the important part: Unity Catalog catalogs are read-only mirrors.
You can query your Lakebase data from Databricks SQL alongside your Lakehouse tables:
```sql
-- Join OLTP data with analytics
SELECT
    o.order_id,
    o.customer_id,
    c.lifetime_value,
    c.churn_risk_score
FROM lakebase_catalog.public.orders o
JOIN main.analytics.customer_360 c
    ON o.customer_id = c.customer_id
WHERE o.order_date > CURRENT_DATE - INTERVAL '7 days';
```
But if you want to write to Lakebase, you need to connect directly to your branch endpoint. Also note: each branch requires separate catalog registration, and metadata syncs have caching—new objects may need a manual refresh to appear.
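For those direct writes, you point a standard Postgres driver at the branch's own endpoint. A sketch of building the connection string (the host name below is made up; use the one your branch reports):

```python
# Build a libpq-style DSN for a branch endpoint. Host, database, and user
# here are hypothetical placeholders; each branch exposes its own endpoint.
def branch_dsn(host: str, dbname: str, user: str, port: int = 5432) -> str:
    # sslmode=require is typical for managed Postgres; adjust as needed.
    return f"host={host} port={port} dbname={dbname} user={user} sslmode=require"

dsn = branch_dsn("dev-alice.example-endpoint.host", "mydb", "alice")
assert "sslmode=require" in dsn

# Then connect with any Postgres driver, e.g.:
#   import psycopg2
#   conn = psycopg2.connect(dsn)
#   conn.cursor().execute("INSERT INTO user_preferences (user_id, theme) VALUES (1, 'dark')")
```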
Lakebase branching isn't trying to replace Git for your code—it's bringing the same mental model to your data layer. The ability to instantly create isolated, writable copies of your database changes the development calculus entirely.
No more waiting for database clones. No more "hope this migration doesn't break prod." No more shared test environments where everyone's stepping on each other.
The lack of native merge is a real limitation, but it's one that forces you to treat migrations as first-class citizens anyway—which is probably what you should've been doing all along.
Give it a shot on your next feature. Spin up a personal branch, break something on purpose, reset, and try again. Once you experience that workflow, going back to "let me clone the database real quick" feels like the stone age.
Got questions about Lakebase branching? Drop them in the comments. And if you've figured out a clever workflow I haven't thought of, I'm all ears.
02-04-2026 06:39 AM
This was a fun read — and a great way to spark discussion about what “Git inside my database” really means in practice.
From what I’m seeing in the product world, Databricks isn’t literally putting Git inside the storage engine of your tables — it’s bringing Git workflows directly into the workspace UX so your notebooks, SQL queries, dashboards and other artifacts live in Git folders/Repos and you can branch, commit, push and pull without context-switching out of Databricks.
That shift has a ton of practical value for teams that want classic software engineering best practices — feature branches, CI/CD, collaboration — but it’s also worth grounding expectations a bit: the Git integration is fundamentally a workspace-level source control layer, not a metadata/time-travel layer over the data in your tables. In other words, you’re not versioning your Delta lake like a Git object store here — you’re versioning the code and queries you write against it.
Curious to hear how folks are using this in real projects — especially around branching strategies and managing merge workflows across notebooks and SQL.
Cheers, Louis! 🚀