Lakebase Blogs
Discover curated blogs from the community and experts. Learn through practical guides, use cases, and thought pieces designed to help you get more out of Lakebase.
uday_satapathy
Databricks Employee

Summary

  • One API, every client: Instead of building separate endpoints for each app or partner, GraphQL lets every consumer ask for exactly the data it needs — but typically requires a lot of backend plumbing.
  • No backend required: A built-in Postgres extension on Lakebase reads your database structure and generates a working GraphQL API automatically.
  • Minimal setup: Define your data model once, add lightweight annotations, and expose it as an HTTP endpoint — no application server in between.   

 

GraphQL and traditional approaches

Modern applications serve many different clients – mobile apps, web dashboards, partner integrations – and each one needs different slices of the same data. A mobile app showing book listings might need just the title and author name. A web dashboard for publishers might need the full catalog with status and sales figures. A partner integration might need everything in a flat export.

The traditional approach is to build a dedicated REST endpoint for each use case — /mobile/books for the app, /publisher/catalog for the dashboard, /partner/export for the integration. This leads to an explosion of endpoints, duplicated business logic, and a maintenance burden: every time a client's needs change, someone has to build or modify an endpoint. Worse, REST endpoints are expected to return a fixed shape, so clients either get more data than they need (over-fetching, wasting bandwidth) or have to make multiple calls to assemble the full picture (under-fetching, adding latency).
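To make the over-fetching trade-off concrete, here is a small, purely illustrative sketch in plain Python (no real API; the record and field names are hypothetical): a fixed-shape REST payload returns every column of a book record, while a GraphQL-style field selection returns only what the mobile client asked for.

```python
# Hypothetical fixed-shape REST payload: every client gets every field.
rest_payload = {
    "id": 1,
    "title": "Pride and Prejudice",
    "author": "Jane Austen",
    "status": "published",
    "sales": 20000,
}

def select_fields(record, fields):
    """GraphQL-style selection: return only the fields the client asked for."""
    return {f: record[f] for f in fields}

# The mobile app only needs title and author.
mobile_view = select_fields(rest_payload, ["title", "author"])
print(mobile_view)  # {'title': 'Pride and Prejudice', 'author': 'Jane Austen'}

# Three of the five fields were over-fetched by the fixed REST shape.
print(len(rest_payload) - len(mobile_view))  # 3
```

The only way a fixed REST shape avoids this is by multiplying endpoints, which is exactly the maintenance burden described above.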

Figure: Naive approaches

GraphQL solves this problem. Instead of building one REST endpoint per use case, you expose a single endpoint and let each client describe exactly the shape of the data it needs. The server returns precisely that shape – no over-fetching, no under-fetching. A mobile app can ask for just titles and author names; the same endpoint gives a publisher dashboard the full catalog with status; and a partner integration can request a flat export – all from one API, zero backend changes.

Figure: Components of a GraphQL-centric architecture

In practice, however, GraphQL introduces its own complexity:

  • Resolver functions: Every type and relationship needs a resolver: a function that fetches the data. For a schema with authors, books, and messages, you might write dozens of resolvers.
  • Schema duplication: You define your data model twice: once in the database and again in a GraphQL schema definition language (SDL) file. The two must stay in sync.
  • N+1 query problems: A naive resolver for "authors with their books" fires one query for authors, then one per author for books. Fixing this requires data loaders, batching, or caching, basically more code.
  • A separate backend layer: All of this lives in an application server that sits between the database and the API consumer.
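The N+1 problem in particular is easy to reproduce. Below is a minimal sketch (plain Python, with an in-memory "database" and a query counter standing in for real resolvers and SQL round trips — not actual GraphQL server code):

```python
# In-memory stand-ins for the authors and books tables.
AUTHORS = [{"id": 1, "name": "Jane Austen"}, {"id": 2, "name": "George Orwell"}]
BOOKS = [
    {"id": 1, "title": "Pride and Prejudice", "author_id": 1},
    {"id": 3, "title": "1984", "author_id": 2},
    {"id": 4, "title": "Animal Farm", "author_id": 2},
]

query_count = 0

def run_query(predicate, table):
    """Stand-in for one SQL round trip."""
    global query_count
    query_count += 1
    return [row for row in table if predicate(row)]

# Naive resolvers: one query for authors, then one query PER author for books.
authors = run_query(lambda a: True, AUTHORS)
for author in authors:
    author["books"] = run_query(
        lambda b, aid=author["id"]: b["author_id"] == aid, BOOKS
    )

print(query_count)  # 3 round trips for 2 authors: 1 + N
```

With N authors this costs 1 + N queries, which is why real GraphQL backends reach for data loaders and batching — more code again.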

These complexities carry real business costs. More code means slower iteration. A separate backend means another service to deploy, monitor, and secure. Schema drift means bugs that surface only at runtime.

What if there were a way to skip all of that?

GraphQL with Lakebase

What if your database schema were your API? No backend code to write, no separate schema to maintain. You define your tables and views in Postgres, add a few comments describing how they relate, and pg_graphql turns that into a fully functional GraphQL API.

pg_graphql is an open-source Postgres extension that introspects your database schema and generates a full GraphQL API from it. On Databricks Lakebase – Databricks' managed Postgres service – pg_graphql is available out of the box.

The end-to-end value chain

Here is how the pieces fit together:

  1. Build the GraphQL metadata layer – tables, views, FKs, and @graphql comments. pg_graphql introspects your schema at query time and generates the GraphQL types, fields, and relationships.
  2. Create the GraphQL wrapper function – a thin Postgres function that calls graphql.resolve().
  3. Expose schemas via the Data API – configure the Lakebase Data API (PostgREST spec) to serve your API schema over HTTP. Once done, the wrapper function is exposed as an HTTP endpoint: POST /api_v1/rpc/graphql_api.


Figure: Journey of a GraphQL query on Lakebase

Building the GraphQL metadata layer

Let’s start with a hypothetical question against two tables, authors and books: give me a list of authors with their books – just the author names and their book titles. To answer this query, we need a foreign key relationship between authors.id and books.author_id so that we can join the two tables and get the rows we need. What if the FK relationship doesn’t exist? What if there’s not even a primary key on either table?

Without such relationships, GraphQL has no way to answer the question. It’s possible that only some columns in the raw tables have these PK/FK relationships explicitly established; for the rest, the relationships are only ‘implicit’ or ‘virtual’. To resolve them at run time, GraphQL needs a metadata layer that supplies the right information for introspection. A good way to solve this is to create a series of layers, each with a separate responsibility.

For demo purposes, let’s assume all the raw tables in the database belong to the raw_data schema. Only some columns (not all) in these tables have PK/FK constraints specified. These raw tables may have been created and maintained through processes internal to the database, and in some cases external. For example, Lakebase Synced Tables are postgres tables replicated into Lakebase from Unity Catalog (an external source).

You create a new schema, api_v1, dedicated to GraphQL metadata. The data objects within api_v1 are database views that reference the raw tables from raw_data, enriched with @graphql comments, plus a few pg_graphql utility functions.


 

| Layer | Schema (in demo) | Purpose | On GraphQL search_path? |
|---|---|---|---|
| External sources, e.g. Unity Catalog | Any | Supplier of Synced Tables | No |
| Raw data | raw_data | Base tables (native PG or synced from Unity Catalog) | No |
| GraphQL metadata and utilities | api_v1 | Thin single-table view wrappers with @graphql comments declaring virtual PKs and FKs. Also the graphql_api() resolver function. | Yes |

This separation of responsibility means pg_graphql sees only the api_v1 layer (because of the Postgres search_path); it does not need to see the raw_data layer for schema resolution. This means:

  • Every relationship is declared the same way – via @graphql virtual FKs on views
  • It does not matter whether the underlying table has a real FK or not
  • Tables synced from Unity Catalog work exactly like native PG tables
  • The pattern is uniform for every table

What metadata does pg_graphql need?

pg_graphql reads @graphql JSON directives from COMMENT ON statements on views:

 

| Directive | What it does |
|---|---|
| @graphql({"primary_key_columns": ["id"]}) | Declares which column uniquely identifies rows (like a PK for views) |
| @graphql({"foreign_keys": [...]}) | Declares virtual relationships between views |
| @graphql({"name": "fieldName"}) | Exposes a SQL function as a computed field on a type |

That is it. Three directives cover every pattern.

Note: pg_graphql can auto-detect a table’s primary keys from PRIMARY KEY constraints and foreign keys from REFERENCES constraints, without explicit @graphql directives. However, this only works when pg_graphql can see the tables directly on its search_path, and in practice it breaks down quickly: views have no physical constraints to detect, tables synced from Unity Catalog arrive without FK metadata, and ambiguous FKs (two columns referencing the same table; explained later) get auto-generated names that are meaningless to API consumers. By routing everything through view wrappers with explicit @graphql declarations, we get a uniform pattern that works whether or not the underlying table has real constraints. The GraphQL schema is defined once, in one place – the api_v1 views – rather than being split between database constraints and comment overrides.

What you'll build

By the end of this post you will have:

  1. A Postgres function api_v1.graphql_api(query, variables, operation_name) that calls graphql.resolve(...) – the single entry point for all GraphQL queries.
  2. An HTTP endpoint via the Lakebase Data API at POST /api_v1/rpc/graphql_api – any HTTP client can send GraphQL queries without a separate app server.
  3. A two-layer schema: raw_data for base tables and api_v1 for view wrappers with @graphql metadata.

The data model

We use four tables to demonstrate how different kinds of GraphQL queries can be answered through pg_graphql.

Authors

The central entity. Each author has many books, and can send / receive messages. Each author can perform actions too (login, edit_book).

| Column | Type | Notes |
|---|---|---|
| id | serial | Primary key |
| name | text | Author name |

Sample data: Jane Austen, George Orwell, Toni Morrison.

Books

Works written by an author. Each book belongs to exactly one author.

| Column | Type | Notes |
|---|---|---|
| id | serial | Primary key |
| title | text | Book title |
| author_id | int | FK to authors (real constraint in raw_data) |
| status | text | 'draft' or 'published' |

Sample data: Pride and Prejudice (Jane), 1984 (George), Beloved (Toni)

Messages

Communication between two authors. Each message has a sender and a receiver – both are authors. This creates two FKs to the same table.

| Column | Type | Notes |
|---|---|---|
| id | serial | Primary key |
| sender_id | int | FK to authors |
| receiver_id | int | FK to authors |
| body | text | Message content |

Sample data: Jane to George ("Have you read my latest draft?"), George to Jane ("Yes, it was excellent!").

Actions

Actions represent activity events – who did what and when (login, edit_book, etc.). This table is synced from Unity Catalog via Synced Tables. It has no FK to authors: the user_id column logically references an author, but there is no constraint. This is common with data from external systems.

| Column | Type | Notes |
|---|---|---|
| id | bigint | Primary key |
| user_id | bigint | Logically references authors (no FK) |
| action | text | Action name |
| at | timestamp | When it happened |

Sample data: Jane logged in, Jane edited a book, George logged in.


Figure: Data models and layers

 

Setting it up

Step 1: Create schemas and enable pg_graphql

```sql
CREATE SCHEMA IF NOT EXISTS raw_data;
CREATE SCHEMA IF NOT EXISTS api_v1;
CREATE EXTENSION IF NOT EXISTS pg_graphql;
```

Step 2: Create the GraphQL resolver function

This is the single entry point for all GraphQL queries. The critical detail: the search_path includes only api_v1 – not raw_data. This ensures pg_graphql only sees the view wrappers. Outside systems (a web app, a mobile client, a script) send a GraphQL query string to this function; it executes the query against your schema and returns the results as JSON. The Data API exposes it as an HTTP RPC endpoint so any HTTP client can call it:

```sql
CREATE OR REPLACE FUNCTION api_v1.graphql_api(
  query          text,
  variables      jsonb DEFAULT '{}',
  operation_name text DEFAULT NULL
)
RETURNS jsonb
LANGUAGE plpgsql
SECURITY INVOKER
STABLE
SET search_path = api_v1, graphql, public
AS $$
BEGIN
  RETURN graphql.resolve(
    query           := query,
    variables       := COALESCE(variables, '{}'::jsonb),
    "operationName" := operation_name
  );
END;
$$;
```
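Before wiring up HTTP, you can sanity-check the resolver directly from any SQL client. A minimal sketch (it assumes the api_v1.authors view created in a later step already exists):

```sql
-- Call the resolver from psql; variables and operation_name use their defaults.
-- The result is the full GraphQL response as jsonb.
SELECT api_v1.graphql_api(
  'query { authorsCollection(first: 1) { edges { node { id name } } } }'
);
```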

Step 3: Create base tables in raw_data

```sql
CREATE TABLE raw_data.authors (
  id   serial PRIMARY KEY,
  name text NOT NULL
);

CREATE TABLE raw_data.books (
  id        serial PRIMARY KEY,
  title     text NOT NULL,
  author_id int  NOT NULL REFERENCES raw_data.authors (id),
  status    text NOT NULL DEFAULT 'draft'
);

CREATE TABLE raw_data.messages (
  id          serial PRIMARY KEY,
  sender_id   int  NOT NULL REFERENCES raw_data.authors (id),
  receiver_id int  NOT NULL REFERENCES raw_data.authors (id),
  body        text
);

-- raw_data.actions is synced from Unity Catalog via Synced Tables.
-- It is NOT created here – it lands via sync.
```

Step 4: Create view wrappers in api_v1

Each view wraps a single raw_data table. The @graphql comments declare virtual PKs and virtual FKs:

```sql
-- authors
CREATE OR REPLACE VIEW api_v1.authors AS
SELECT id, name FROM raw_data.authors;

COMMENT ON VIEW api_v1.authors IS
  '@graphql({"primary_key_columns": ["id"]})';

-- books
CREATE OR REPLACE VIEW api_v1.books AS
SELECT id, title, author_id, status FROM raw_data.books;

COMMENT ON VIEW api_v1.books IS
  '@graphql({"primary_key_columns": ["id"], "foreign_keys": [{"local_name": "booksCollection", "local_columns": ["author_id"], "foreign_schema": "api_v1", "foreign_table": "authors", "foreign_columns": ["id"], "foreign_name": "author"}]})';

-- messages (two virtual FKs: sender and receiver)
CREATE OR REPLACE VIEW api_v1.messages AS
SELECT id, sender_id, receiver_id, body FROM raw_data.messages;

COMMENT ON VIEW api_v1.messages IS
  '@graphql({"primary_key_columns": ["id"], "foreign_keys": [
    {"local_name": "sentMessagesCollection", "local_columns": ["sender_id"],
     "foreign_schema": "api_v1", "foreign_table": "authors", "foreign_columns": ["id"],
     "foreign_name": "sender"},
    {"local_name": "receivedMessagesCollection", "local_columns": ["receiver_id"],
     "foreign_schema": "api_v1", "foreign_table": "authors", "foreign_columns": ["id"],
     "foreign_name": "receiver"}
  ]})';

-- actions (synced table, no real FK)
CREATE OR REPLACE VIEW api_v1.actions AS
SELECT id, user_id, action, at FROM raw_data.actions;

COMMENT ON VIEW api_v1.actions IS
  '@graphql({"primary_key_columns": ["id"], "foreign_keys": [{"local_name": "actionsCollection", "local_columns": ["user_id"], "foreign_schema": "api_v1", "foreign_table": "authors", "foreign_columns": ["id"], "foreign_name": "author"}]})';
```

 

Step 5: Add the computed field

A computed field lets the server calculate a value (like counting an author's books) and return it as part of the query result, so the client doesn't have to fetch the raw data and do the math itself. The scenario is explained later.

```sql
CREATE OR REPLACE FUNCTION api_v1.authors_summary(rec api_v1.authors)
RETURNS text
LANGUAGE sql
STABLE
AS $$
  SELECT rec.name || ' has written ' || COALESCE(count(b.id), 0)::text
         || CASE WHEN count(b.id) = 1 THEN ' book' ELSE ' books' END
  FROM api_v1.books b
  WHERE b.author_id = rec.id;
$$;

COMMENT ON FUNCTION api_v1.authors_summary(api_v1.authors) IS
  '@graphql({"name": "summary"})';
```

Note: the function takes api_v1.authors (the view type) and queries api_v1.books (the view). Everything stays within the api_v1 layer.

Step 6: Enable the Data API

In the Lakebase App UI, enable the Data API and add api_v1 to "Database schemas to expose via the API." This makes POST /api_v1/rpc/graphql_api available over HTTP.

Step 7: Permissions

The function uses SECURITY INVOKER, so queries run with the caller's permissions. Grant access to a service principal (or any role) that needs to use the API:

```sql
GRANT USAGE ON SCHEMA raw_data, api_v1, graphql TO my_api_role;
GRANT SELECT ON ALL TABLES IN SCHEMA raw_data, api_v1 TO my_api_role;
GRANT EXECUTE ON FUNCTION api_v1.graphql_api(text, jsonb, text) TO my_api_role;
```

 

Patterns

A. Using simple FK relationships: "Which books did this author write?"

A publishing platform needs to show each author alongside their books. The books table has a real FK to authors in raw_data, but pg_graphql does not see it – it sees the virtual FK declared on api_v1.books instead.

The GraphQL query:

"Give me the first 5 authors with their name, and for each author, their first 5 books with title and status."

```graphql
query {
  authorsCollection(first: 5) {
    edges {
      node {
        id
        name
        booksCollection(first: 5) {
          edges {
            node {
              id
              title
              status
            }
          }
        }
      }
    }
  }
}
```

A note on edges and nodes. pg_graphql follows the Relay Connection Specification. The path someCollection -> edges -> node is the standard way to reach row data. Think of edges as "results" and node as "one row." You will see this in every query.
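If the nesting feels heavy, flattening a connection in a client is a one-liner per level. A small sketch (plain Python; the hypothetical response fragment mimics the Relay shape pg_graphql returns):

```python
def nodes(connection):
    """Unwrap a Relay-style connection into a plain list of row dicts."""
    return [edge["node"] for edge in connection["edges"]]

# A hypothetical response fragment in the Relay connection shape.
response = {
    "data": {
        "authorsCollection": {
            "edges": [
                {"node": {"name": "Jane Austen",
                          "booksCollection": {"edges": [
                              {"node": {"title": "Pride and Prejudice"}}]}}},
            ]
        }
    }
}

for author in nodes(response["data"]["authorsCollection"]):
    titles = [b["title"] for b in nodes(author["booksCollection"])]
    print(author["name"], titles)  # Jane Austen ['Pride and Prejudice']
```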

What this returns. Each author appears with their books nested inside. Jane Austen would have "Pride and Prejudice," "Sense and Sensibility," and "WIP Novel." George Orwell would have "1984" and "Animal Farm." One request, one round trip.

Below is the raw response. Since it is quite verbose, in subsequent sections we will not display the raw response but only its interpretation.

```json
{
  "data": {
    "authorsCollection": {
      "edges": [
        {
          "node": {
            "booksCollection": {
              "edges": [
                { "node": { "id": 1, "status": "published", "title": "Pride and Prejudice" } },
                { "node": { "id": 2, "status": "published", "title": "Sense and Sensibility" } },
                { "node": { "id": 6, "status": "draft", "title": "WIP Novel" } }
              ]
            },
            "id": 1,
            "name": "Jane Austen"
          }
        },
        {
          "node": {
            "booksCollection": {
              "edges": [
                { "node": { "id": 3, "status": "published", "title": "1984" } },
                { "node": { "id": 4, "status": "published", "title": "Animal Farm" } }
              ]
            },
            "id": 2,
            "name": "George Orwell"
          }
        },
        {
          "node": {
            "booksCollection": {
              "edges": [
                { "node": { "id": 5, "status": "published", "title": "Beloved" } }
              ]
            },
            "id": 3,
            "name": "Toni Morrison"
          }
        }
      ]
    }
  }
}
```

B. Ambiguous FK: "Who sent and who received this message?"

The platform has a messaging feature where authors communicate with each other. Each message has a sender and a receiver — both are authors. This means the messages table has two columns (sender_id and receiver_id) that both point to the same authors table.

This creates an ambiguity problem. When pg_graphql sees a single FK to authors, it automatically names the relationship author on the child and authorsCollection on the parent — clear enough. But with two FKs to the same table, which one is the author? The sender or the receiver? pg_graphql cannot guess, so we need to tell it.


The solution is straightforward: the @graphql comment on the api_v1.messages view declares two virtual FKs, each with an explicit name. This is not a special pattern — it is the same foreign_keys array used by every other view. The only difference is that there are two entries instead of one.

The @graphql comment declares two virtual FKs:

  • sender_id -> authors.id with field name sender on Message and sentMessagesCollection on Author
  • receiver_id -> authors.id with field name receiver on Message and receivedMessagesCollection on Author

The query:

"Give me the first 5 messages, and for each message show the text, who sent it, and who received it."

```graphql
query {
  messagesCollection(first: 5) {
    edges {
      node {
        id
        body
        sender { id name }
        receiver { id name }
      }
    }
  }
}
```

What this returns. Each message has its body text and two clearly labeled author objects. "Have you read my latest draft?" shows sender: Jane Austen, receiver: George Orwell. No ambiguity.

C. Computed field: "How many books has this author written?"

A publisher's dashboard wants a quick summary for each author – "Jane Austen has written 3 books" – without the client counting books itself. The value is computed on the server at query time.

The function api_v1.authors_summary takes an api_v1.authors row, counts books from api_v1.books, and returns a summary string. The @graphql({"name": "summary"}) comment exposes it as the summary field on the Author type.

The query:

"Give me the first 5 authors with their name and a summary."

```graphql
query {
  authorsCollection(first: 5) {
    edges {
      node {
        id
        name
        summary
      }
    }
  }
}
```

What this returns. Jane Austen: "Jane Austen has written 3 books." George Orwell: "George Orwell has written 2 books." The count is always current – no client-side aggregation needed.

D. Synced table with no FK: "What actions did this author perform?"

The platform tracks user actions (login, edit_book, etc.) in Unity Catalog through the actions table. This table is synced into Lakebase via Synced Tables, landing as raw_data.actions. There is no FK to raw_data.authors – just a user_id column that logically references an author.

The api_v1.actions view wrapper declares a virtual FK: user_id -> authors.id. pg_graphql treats it exactly like any other relationship. The client does not know or care that the data came from Unity Catalog.

The Unity Catalog source:

```sql
USE CATALOG mycatalog;
CREATE SCHEMA IF NOT EXISTS raw_data;

CREATE TABLE IF NOT EXISTS raw_data.actions (
  id      BIGINT NOT NULL,
  user_id BIGINT NOT NULL,
  action  STRING,
  at      TIMESTAMP
) USING DELTA;
```

After syncing to Lakebase, the view wrapper in api_v1 makes it queryable via GraphQL.

The query:

"Give me the first 5 authors, and for each author, their 3 most recent actions (like login or edit_book)."

```graphql
query {
  authorsCollection(first: 5) {
    edges {
      node {
        id
        name
        actionsCollection(first: 3) {
          edges {
            node {
              id
              action
              author { id name }
            }
          }
        }
      }
    }
  }
}
```

What this returns. Each author appears with their recent actions. Jane Austen: login yesterday, edit_book 12 hours ago. Each action also links back to its author via the author field – the virtual FK works in both directions.

Calling the GraphQL API

Once the schema is in place, any application can query the data over HTTP. There is no custom server to deploy.

The Lakebase Data API exposes the graphql_api function as an RPC endpoint:

```http
POST $DATA_API_URL/api_v1/rpc/graphql_api
Content-Type: application/json
Authorization: Bearer $TOKEN

{
  "query": "...",
  "variables": {},
  "operation_name": null
}
```

curl example


```shell
curl -X POST "$DATA_API_URL/api_v1/rpc/graphql_api" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "query { authorsCollection(first: 5) { edges { node { id name summary } } } }",
    "variables": {},
    "operation_name": null
  }'
```

Python example

The snippet below uses the requests library; it assumes the Data API base URL and a bearer token are available as the DATA_API_URL and TOKEN environment variables (same as the curl example):

```python
import os
import requests

DATA_API_URL = os.environ["DATA_API_URL"]  # Lakebase Data API base URL
TOKEN = os.environ["TOKEN"]                # bearer token for the calling principal

GRAPHQL_QUERY = """query {
  authorsCollection(first: 5) {
    edges {
      node {
        id
        name
        summary
        actionsCollection(first: 3) {
          edges {
            node { id action }
          }
        }
      }
    }
  }
}"""

response = requests.post(
    f"{DATA_API_URL}/api_v1/rpc/graphql_api",
    headers={"Authorization": f"Bearer {TOKEN}",
             "Content-Type": "application/json"},
    json={"query": GRAPHQL_QUERY, "variables": {}, "operation_name": None},
)
result = response.json()

for edge in result["data"]["authorsCollection"]["edges"]:
    author = edge["node"]
    actions = [a["node"]["action"] for a in author["actionsCollection"]["edges"]]
    print(f"{author['name']}: {author['summary']} | actions: {actions}")
```

 

Conclusion

We started with four tables – authors, books, messages, and actions – spanning native Postgres and Unity Catalog. Without writing a single resolver or maintaining a separate backend, we built a GraphQL API that answers real business questions:

  • "Which books did this author write?" – via a virtual FK on the books view
  • "Who sent this message and who received it?" – via two named virtual FKs on the messages view
  • "How many books has this author written?" – via a computed field (SQL function)
  • "What actions did this author perform?" – via a virtual FK on a synced table with no real constraint 

The approach is simple: view wrappers with @graphql metadata. Every table – whether it has real FKs or none, whether it is native Postgres or synced from Unity Catalog – follows the same pattern. The raw_data layer stores data. The api_v1 layer defines the GraphQL schema. pg_graphql introspects the views and generates the API.

If your data lives in Postgres – or can be synced there from Unity Catalog – pg_graphql and Lakebase give you a managed, serverless GraphQL endpoint with minimal setup. Define your views, add a few comments, and your API is ready.