Summary
Modern applications serve many different clients – mobile apps, web dashboards, partner integrations – and each one needs different slices of the same data. A mobile app showing book listings might need just the title and author name. A web dashboard for publishers might need the full catalog with status and sales figures. A partner integration might need everything in a flat export.
The traditional approach is to build a dedicated REST endpoint for each use case — /mobile/books for the app, /publisher/catalog for the dashboard, /partner/export for the integration. This leads to an explosion of endpoints, duplicated business logic, and a maintenance burden: every time a client's needs change, someone has to build or modify an endpoint. Worse, REST endpoints are expected to return a fixed shape, so clients either get more data than they need (over-fetching, wasting bandwidth) or have to make multiple calls to assemble the full picture (under-fetching, adding latency).
Figure: Naive approaches
GraphQL solves this problem. Instead of building one REST endpoint per use case, you expose a single endpoint and let each client describe exactly the shape of the data it needs. The server returns precisely that shape – no over-fetching, no under-fetching. A mobile app can ask for just titles and author names; the same endpoint gives a publisher dashboard the full catalog with status; and a partner integration can request a flat export – all from one API, zero backend changes.
Figure: Components of GraphQL centric architecture
In practice, however, GraphQL introduces its own complexity:
These complexities bring in their business costs. More code means slower iterations. A separate backend means another service to deploy, monitor, and secure. Schema drift means bugs that only surface at runtime.
What if there were a way to skip all of that?
What if your database schema was your API? No backend code to write, no separate schema to maintain. You define your tables and views in Postgres, add a few comments describing how they relate, and pg_graphql turns that into a fully functional GraphQL API.
pg_graphql is an open-source Postgres extension that introspects your database schema and generates a full GraphQL API from it. On Databricks Lakebase – Databricks' managed Postgres service – pg_graphql is available out of the box.
Here is how the pieces fit together:
Figure: Journey of a GraphQL query on Lakebase
Let’s start with a hypothetical question against two tables authors and books: Give me a list of authors with their books. I just need the author names and their book titles against them. To answer this query, we need a Foreign Key relationship between author.id and books.author_id so that we can join these two tables and get the rows we need. What if the FK relationship doesn’t exist? What if there’s not even a Primary Key in any of these tables?
Without establishing such relationships, GraphQL has no way to figure out the answer to the question. It’s possible that only some columns in the raw tables have these PK / FK relationships explicitly established. For the rest of the columns, the relationships are only ‘implicit’ or ‘virtual’. To resolve these relationships at run time, GraphQL needs a metadata layer which supplies the right information for ‘introspection’. A good way of solving this problem is to create a slew of layers – each layer having a separate responsibility.
For demo purposes, let’s assume all the raw tables in the database belong to the raw_data schema. Only some columns (not all) in these tables have PK/FK constraints specified. These raw tables may have been created and maintained through processes internal to the database, and in some cases external. For example, Lakebase Synced Tables are postgres tables replicated into Lakebase from Unity Catalog (an external source).
You create a new schema api_v1 dedicated for GraphQL metadata. The data objects within api_v1 are database views referencing the raw tables from raw_data and then enriched with @graphql comments, and some pg_graphql utility functions.
|
Layer |
Schema (in demo) |
Purpose |
On GraphQL search_path? |
|
External sources e.g. Unity Catalog |
Any |
Supplier of Synced Tables |
No |
|
Raw Data |
raw_data |
Base tables (native PG or synced from Unity Catalog). |
No |
|
GraphQL Metadata and Utilities |
api_v1 |
Thin single-table view wrappers with @graphql comments declaring virtual PKs and FKs. Also the graphql_api() resolver function. |
Yes |
Separation of responsibility makes pg_graphql only see the api_v1 layer (because of the postgres search_path). For schema resolution, it doesn’t need to see the raw_data layer. This means:
pg_graphql reads @graphql JSON directives from COMMENT ON statements on views:
|
Directive |
What it does |
|
@graphql({"primary_key_columns": ["id"]}) |
Declares which column uniquely identifies rows (like a PK for views) |
|
@graphql({"foreign_keys": [...]}) |
Declares virtual relationships between views |
|
@graphql({"name": "fieldName"}) |
Exposes a SQL function as a computed field on a type |
That is it. Three directives cover every pattern.
Note: pg_graphql has the capability to auto-detect (without explicit @graphql directives) a table’s Primary keys from PRIMARY KEY constraints and Foreign keys from REFERENCES constraints. However, this only works when pg_graphql can see the tables directly on its search_path. In practice, this breaks down quickly: views have no physical constraints to detect, synced tables from UC arrive without FK metadata, and ambiguous FKs (two columns referencing the same table; explained later) get auto-generated names that are meaningless to API consumers. By routing everything through view wrappers with explicit @graphql declarations, we get a uniform pattern that works regardless of whether the underlying table has real constraints or not. The GraphQL schema is defined once, in one place – the api_v1 views – rather than being split between database constraints and comment overrides.
By the end of this post you will have:
We use four tables to demonstrate how different kinds of GraphQL queries can be answered through pg_graphql.
The central entity. Each author has many books, and can send / receive messages. Each author can perform actions too (login, edit_book).
|
Column |
Type |
Notes |
|
id |
serial |
Primary key |
|
name |
text |
Author name |
Sample data: Jane Austen, George Orwell, Toni Morrison.
Works written by an author. Each book belongs to exactly one author.
|
Column |
Type |
Notes |
|
id |
serial |
Primary key |
|
title |
text |
Book title |
|
author_id |
int |
FK to authors (real constraint in raw_data) |
|
status |
text |
'draft' or 'published' |
Sample data: Pride and Prejudice (Jane), 1984 (George), Beloved (Toni)
Communication between two authors. Each message has a sender and a receiver – both are authors. This creates two FKs to the same table.
|
Column |
Type |
Notes |
|
id |
serial |
Primary key |
|
sender_id |
int |
FK to authors |
|
receiver_id |
int |
FK to authors |
|
body |
text |
Message content |
Sample data: Jane to George ("Have you read my latest draft?"), George to Jane ("Yes, it was excellent!").
Actions represent Activity events – who did what and when (login, edit_book, etc.). This table is synced from Unity Catalog via Synced Tables. It has no FK to authors – user_id column logically references an author, but there is no constraint. This is common with data from external systems.
|
Column |
Type |
Notes |
|
id |
bigint |
Primary key |
|
user_id |
bigint |
Logically references authors (no FK) |
|
action |
text |
Action name |
|
at |
timestamp |
When it happened |
Sample data: Jane logged in, Jane edited a book, George logged in.
Figure: Data models and layers
CREATE SCHEMA IF NOT EXISTS raw_data;
CREATE SCHEMA IF NOT EXISTS api_v1;
CREATE EXTENSION IF NOT EXISTS pg_graphql;
This is the single entry point for all GraphQL queries. The critical detail: the search_path includes only api_v1 – not raw_data. This ensures pg_graphql only sees the view wrappers. Outside systems (a web app, a mobile client, a script) send a GraphQL query string to this function; it executes the query against your schema and returns the results as JSON. The Data API exposes it as an HTTP RPC endpoint so any HTTP client can call it:
CREATE OR REPLACE FUNCTION api_v1.graphql_api(
query text,
variables jsonb DEFAULT '{}',
operation_name text DEFAULT NULL
)
RETURNS jsonb
LANGUAGE plpgsql
SECURITY INVOKER
STABLE
SET search_path = api_v1, graphql, public
AS $$
BEGIN
RETURN graphql.resolve(
query := query,
variables := COALESCE(variables, '{}'::jsonb),
"operationName" := operation_name
);
END;
$$;
CREATE TABLE raw_data.authors (
id serial PRIMARY KEY,
name text NOT NULL
);
CREATE TABLE raw_data.books (
id serial PRIMARY KEY,
title text NOT NULL,
author_id int NOT NULL REFERENCES raw_data.authors (id),
status text NOT NULL DEFAULT 'draft'
);
CREATE TABLE raw_data.messages (
id serial PRIMARY KEY,
sender_id int NOT NULL REFERENCES raw_data.authors (id),
receiver_id int NOT NULL REFERENCES raw_data.authors (id),
body text
);
–- raw_data.actions is synced from Unity Catalog via Synced Tables.
–- It is NOT created here – it lands via sync.
Each view wraps a single raw_data table. The @graphql comments declare virtual PKs and virtual FKs:
–- authors
CREATE OR REPLACE VIEW api_v1.authors AS
SELECT id, name FROM raw_data.authors;
COMMENT ON VIEW api_v1.authors IS
'@graphql({"primary_key_columns": ["id"]})';
–- books
CREATE OR REPLACE VIEW api_v1.books AS
SELECT id, title, author_id, status FROM raw_data.books;
COMMENT ON VIEW api_v1.books IS
'@graphql({"primary_key_columns": ["id"], "foreign_keys": [{"local_name": "booksCollection", "local_columns": ["author_id"], "foreign_schema": "api_v1", "foreign_table": "authors", "foreign_columns": ["id"], "foreign_name": "author"}]})';
–- messages (two virtual FKs: sender and receiver)
CREATE OR REPLACE VIEW api_v1.messages AS
SELECT id, sender_id, receiver_id, body FROM raw_data.messages;
COMMENT ON VIEW api_v1.messages IS
'@graphql({"primary_key_columns": ["id"], "foreign_keys": [
{"local_name": "sentMessagesCollection", "local_columns": ["sender_id"],
"foreign_schema": "api_v1", "foreign_table": "authors", "foreign_columns": ["id"],
"foreign_name": "sender"},
{"local_name": "receivedMessagesCollection", "local_columns": ["receiver_id"],
"foreign_schema": "api_v1", "foreign_table": "authors", "foreign_columns": ["id"],
"foreign_name": "receiver"}
]})';
–- actions (synced table, no real FK)
CREATE OR REPLACE VIEW api_v1.actions AS
SELECT id, user_id, action, at FROM raw_data.actions;
COMMENT ON VIEW api_v1.actions IS
'@graphql({"primary_key_columns": ["id"], "foreign_keys": [{"local_name": "actionsCollection", "local_columns": ["user_id"], "foreign_schema": "api_v1", "foreign_table": "authors", "foreign_columns": ["id"], "foreign_name": "author"}]})';
A computed field lets the server calculate a value (like counting an author's books) and return it as part of the query result, so the client doesn't have to fetch the raw data and do the math itself. The scenario is explained later.
CREATE OR REPLACE FUNCTION api_v1.authors_summary(rec api_v1.authors)
RETURNS text
LANGUAGE sql
STABLE
AS $$
SELECT rec.name || ' has written ' || COALESCE(count(b.id), 0)::text
|| CASE WHEN count(b.id) = 1 THEN ' book' ELSE ' books' END
FROM api_v1.books b
WHERE b.author_id = rec.id;
$$;
COMMENT ON FUNCTION api_v1.authors_summary(api_v1.authors) IS
'@graphql({"name": "summary"})';
Note: the function takes api_v1.authors (the view type) and queries api_v1.books (the view). Everything stays within the api_v1 layer.
In the Lakebase App UI, enable the Data API and add api_v1 to "Database schemas to expose via the API." This makes POST /api_v1/rpc/graphql_api available over HTTP.
The function uses SECURITY INVOKER, so queries run with the caller's permissions. Grant access to a service principal (or any role) that needs to use the API:
GRANT USAGE ON SCHEMA raw_data, api_v1, graphql TO my_api_role;
GRANT SELECT ON ALL TABLES IN SCHEMA raw_data, api_v1 TO my_api_role;
GRANT EXECUTE ON FUNCTION api_v1.graphql_api(text, jsonb, text) TO my_api_role;
A publishing platform needs to show each author alongside their books. The books table has a real FK to authors in raw_data, but pg_graphql does not see it – it sees the virtual FK declared on api_v1.books instead.
The GraphQL query:
"Give me the first 5 authors with their name, and for each author, their first 5 books with title and status."
query {
authorsCollection(first: 5) {
edges {
node {
id
name
booksCollection(first: 5) {
edges {
node {
id
title
status
}
}
}
}
}
}
}
A note on edges and nodes. pg_graphql follows the Relay Connection Specification. The path someCollection -> edges -> node is the standard way to reach row data. Think of edges as "results" and node as "one row." You will see this in every query.
What this returns. Each author appears with their books nested inside. Jane Austen would have "Pride and Prejudice," "Sense and Sensibility," and "WIP Novel." George Orwell would have "1984" and "Animal Farm." One request, one round trip.
Below is the raw response. Since it is quite verbose, in subsequent sections we will not display the raw response but only its interpretation.
{
"data": {
"authorsCollection": {
"edges": [
{
"node": {
"booksCollection": {
"edges": [
{
"node": {
"id": 1,
"status": "published",
"title": "Pride and Prejudice"
}
},
{
"node": {
"id": 2,
"status": "published",
"title": "Sense and Sensibility"
}
},
{
"node": {
"id": 6,
"status": "draft",
"title": "WIP Novel"
}
}
]
},
"id": 1,
"name": "Jane Austen"
}
},
{
"node": {
"booksCollection": {
"edges": [
{
"node": {
"id": 3,
"status": "published",
"title": "1984"
}
},
{
"node": {
"id": 4,
"status": "published",
"title": "Animal Farm"
}
}
]
},
"id": 2,
"name": "George Orwell"
}
},
{
"node": {
"booksCollection": {
"edges": [
{
"node": {
"id": 5,
"status": "published",
"title": "Beloved"
}
}
]
},
"id": 3,
"name": "Toni Morrison"
}
}
]
}
}
}
The platform has a messaging feature where authors communicate with each other. Each message has a sender and a receiver — both are authors. This means the messages table has two columns (sender_id and receiver_id) that both point to the same authors table.
This creates an ambiguity problem. When pg_graphql sees a single FK to authors, it automatically names the relationship author on the child and authorsCollection on the parent — clear enough. But with two FKs to the same table, which one is the author? The sender or the receiver? pg_graphql cannot guess, so we need to tell it.
The solution is straightforward: the @graphql comment on the api_v1.messages view declares two virtual FKs, each with an explicit name. This is not a special pattern — it is the same foreign_keys array used by every other view. The only difference is that there are two entries instead of one.
The @graphql comment declares two virtual FKs:
The query:
"Give me the first 5 messages, and for each message show the text, who sent it, and who received it."
query {
messagesCollection(first: 5) {
edges {
node {
id
body
sender { id name }
receiver { id name }
}
}
}
}
What this returns. Each message has its body text and two clearly labeled author objects. "Have you read my latest draft?" shows sender: Jane Austen, receiver: George Orwell. No ambiguity.
A publisher's dashboard wants a quick summary for each author – "Jane Austen has written 3 books" – without the client counting books itself. The value is computed on the server at query time.
The function api_v1.authors_summary takes an api_v1.authors row, counts books from api_v1.books, and returns a summary string. The @graphql({"name": "summary"}) comment exposes it as the summary field on the Author type.
The query:
"Give me the first 5 authors with their name and a summary."
query {
authorsCollection(first: 5) {
edges {
node {
id
name
summary
}
}
}
}
What this returns. Jane Austen: "Jane Austen has written 3 books." George Orwell: "George Orwell has written 2 books." The count is always current – no client-side aggregation needed.
The platform tracks user actions (login, edit_book, etc.) in Unity Catalog through the actions table. This table is synced into Lakebase via Synced Tables, landing as raw_data.actions. There is no FK to raw_data.authors – just a user_id column that logically references an author.
The api_v1.actions view wrapper declares a virtual FK: user_id -> authors.id. pg_graphql treats it exactly like any other relationship. The client does not know or care that the data came from Unity Catalog.
The Unity Catalog source:
USE CATALOG mycatalog;
CREATE SCHEMA IF NOT EXISTS raw_data;
CREATE TABLE IF NOT EXISTS raw_data.actions (
id BIGINT NOT NULL,
user_id BIGINT NOT NULL,
action STRING,
at TIMESTAMP
) USING DELTA;
After syncing to Lakebase, the view wrapper in api_v1 makes it queryable via GraphQL.
The query:
"Give me the first 5 authors, and for each author, their 3 most recent actions (like login or edit_book)."
query {
authorsCollection(first: 5) {
edges {
node {
id
name
actionsCollection(first: 3) {
edges {
node {
id
action
author { id name }
}
}
}
}
}
}
}
What this returns. Each author appears with their recent actions. Jane Austen: login yesterday, edit_book 12 hours ago. Each action also links back to its author via the author field – the virtual FK works in both directions.
Once the schema is in place, any application can query the data over HTTP. There is no custom server to deploy.
The Lakebase Data API exposes the graphql_api function as an RPC endpoint:
POST $DATA_API_URL/api_v1/rpc/graphql_api
Content-Type: application/json
Authorization: Bearer $TOKEN
{
"query": "...",
"variables": {},
"operation_name": null
}
curl example
curl -X POST "$DATA_API_URL/api_v1/rpc/graphql_api" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"query": "query { authorsCollection(first: 5) { edges { node { id name summary } } } }",
"variables": {},
"operation_name": null
}'
GRAPHQL_QUERY = """query {
authorsCollection(first: 5) {
edges {
node {
id
name
summary
actionsCollection(first: 3) {
edges {
node { id action }
}
}
}
}
}
}"""
result = api.post("api_v1", "rpc/graphql_api", {
"query": GRAPHQL_QUERY,
"variables": {},
"operation_name": None,
})
for edge in result["data"]["authorsCollection"]["edges"]:
author = edge["node"]
actions = [a["node"]["action"] for a in author["actionsCollection"]["edges"]]
print(f"{author['name']}: {author['summary']} | actions: {actions}")
We started with four tables – authors, books, messages, and actions – spanning native Postgres and Unity Catalog. Without writing a single resolver or maintaining a separate backend, we built a GraphQL API that answers real business questions:
The approach is simple: view wrappers with @graphql metadata. Every table – whether it has real FKs or none, whether it is native Postgres or synced from Unity Catalog – follows the same pattern. The raw_data layer stores data. The api_v1 layer defines the GraphQL schema. pg_graphql introspects the views and generates the API.
If your data lives in Postgres – or can be synced there from Unity Catalog – pg_graphql and Lakebase give you a managed, serverless GraphQL endpoint with minimal setup. Define your views, add a few comments, and your API is ready.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.