Technical Blog
MohanaBasak
Databricks Employee

Your Own GeoGenie: Natural Language Geospatial Analytics on Databricks

Every enterprise has geospatial data. Telecom companies track tower locations. Retailers map store footprints. Logistics networks route around real-world geography. But when a business user asks a simple location-based question — which sites in this region generate the highest revenue? — the answer almost always requires an engineer, a handful of SQL queries, and a turnaround measured in hours.

The gap is not in storage or compute. Databricks handles both. The gap is in the last mile: turning a spatial question into a spatial query.

The problem with spatial SQL today

Consider a retail planner who wants to know which stores within a target region are underperforming. Today, that question requires:

  • Knowing the table schema, column names, and which columns hold coordinates
  • Writing SQL with spatial functions like ST_Intersects and ST_GeomFromWKT
  • Manually constructing WKT geometry strings to define the area of interest
  • Iterating through multiple queries to narrow down the right region

For engineers, this is tedious. For business users, it is a wall. The result is that most geospatial data sits underused — not because the platform cannot handle it, but because the interface demands too much expertise.

Summon GeoGenie: draw a region, ask a question, get an answer

GeoGenie is an open-source reference app that shows how Databricks Genie Space and Databricks Apps can be combined to reduce this friction. The workflow has three steps:

  1. Explore a 3D interactive globe with site locations plotted as beacons. Click any site to inspect details — name, tenant, revenue, city, image.
  2. Draw a polygon or rectangle directly on the map to define a region of interest.
  3. Ask a question in plain English. GeoGenie handles the rest.

Behind the scenes, when a user draws a shape, GeoGenie converts the polygon coordinates into WKT and appends that geometry to the prompt sent to Databricks Genie. Genie generates the SQL with the spatial filter already built in:

WITH ranked_sites AS (
  SELECT *,
    RANK() OVER (ORDER BY total_monthly_revenue DESC) AS revenue_rank
  FROM catalog.geogenie.site_locations
  WHERE ST_Intersects(
    ST_GeomFromWKT('POLYGON((-102.3 26.8, -95.2 26.8, -95.2 31.4, -102.3 31.4, -102.3 26.8))'),
    ST_Point(longitude, latitude)
  )
)
SELECT site_name, city, state, tenant_name, total_monthly_revenue
FROM ranked_sites
WHERE revenue_rank <= 5
ORDER BY total_monthly_revenue DESC;

The Genie Space is pre-configured with spatial-query instructions, so the generated SQL consistently uses ST_Intersects, constructs points with longitude first in ST_Point, and applies the drawn region as a spatial predicate automatically. The user focuses on the question. GeoGenie handles the spatial context and query generation.
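As a rough illustration of the conversion step described above, the sketch below turns a list of drawn vertices into a WKT polygon and appends it to the user's question. The function names `polygon_to_wkt` and `build_prompt`, and the wording of the appended instruction, are assumptions made for this example and are not taken from the GeoGenie repository. Vertices are given longitude first, matching the ST_Point(longitude, latitude) convention in the query above.

```python
from typing import List, Sequence, Tuple

LonLat = Tuple[float, float]


def polygon_to_wkt(vertices: Sequence[LonLat]) -> str:
    """Return a WKT POLYGON for (longitude, latitude) vertices.

    The ring is closed automatically if the first and last vertex differ.
    """
    ring: List[LonLat] = [(float(lon), float(lat)) for lon, lat in vertices]
    if ring and ring[0] != ring[-1]:
        ring.append(ring[0])  # WKT rings must end where they start
    if len(ring) < 4:
        raise ValueError("a polygon needs at least three distinct vertices")
    coords = ", ".join(f"{lon!r} {lat!r}" for lon, lat in ring)
    return f"POLYGON(({coords}))"


def build_prompt(question: str, wkt: str) -> str:
    """Append the drawn region to the question (hypothetical wording)."""
    return (
        f"{question.strip()}\n\n"
        f"Restrict results to sites inside this region (WKT): {wkt}"
    )


# Four corners of the region used in the SQL example, longitude first.
region = [(-102.3, 26.8), (-95.2, 26.8), (-95.2, 31.4), (-102.3, 31.4)]
wkt = polygon_to_wkt(region)
prompt = build_prompt("Which five sites generate the highest revenue?", wkt)
print(prompt)
```

Keeping the geometry in the prompt as plain WKT means the Genie Space instructions can refer to it directly when building the ST_GeomFromWKT call.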

How it works

[Image: MohanaBasak_0-1773851159799.png]

GeoGenie combines four capabilities in the Databricks platform into a single experience:

Layer             Technology
Frontend          CesiumJS 3D globe + Streamlit
AI / NL-to-SQL    Databricks Genie Space
Data governance   Unity Catalog + SQL Warehouse
App platform      Databricks Apps

The Cesium globe runs inside an iframe rendered through Streamlit's custom component API and communicates with the Python backend via window.postMessage. This lets GeoGenie pair a rich JavaScript geospatial UI with simple Streamlit application logic — all deployed and authenticated through Databricks Apps with a service principal.
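On the Python side, whatever the globe sends has to be validated before it is used to build a query prompt. The sketch below assumes the drawn shape arrives as a JSON string with one of two hypothetical layouts, a polygon with a list of [longitude, latitude] pairs or a rectangle with west, south, east, and north bounds in degrees. The actual message format used by GeoGenie is not shown in this post, so treat the schema and the name `parse_shape_message` as illustrative only.

```python
import json
from typing import List, Tuple


def parse_shape_message(raw: str) -> List[Tuple[float, float]]:
    """Validate a JSON shape message and return (longitude, latitude) vertices.

    Assumed (hypothetical) schema, all values in degrees:
      {"type": "polygon", "coordinates": [[lon, lat], ...]}
      {"type": "rectangle", "west": w, "south": s, "east": e, "north": n}
    """
    msg = json.loads(raw)
    kind = msg.get("type")
    if kind == "polygon":
        points = [(float(lon), float(lat)) for lon, lat in msg["coordinates"]]
    elif kind == "rectangle":
        w, s, e, n = (float(msg[k]) for k in ("west", "south", "east", "north"))
        # Expand the bounds into four corners, counter-clockwise from south-west.
        points = [(w, s), (e, s), (e, n), (w, n)]
    else:
        raise ValueError(f"unsupported shape type: {kind!r}")
    if len(points) < 3:
        raise ValueError("a region needs at least three vertices")
    for lon, lat in points:
        if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
            raise ValueError(f"coordinate out of range: ({lon}, {lat})")
    return points


example = '{"type": "rectangle", "west": -102.3, "south": 26.8, "east": -95.2, "north": 31.4}'
vertices = parse_shape_message(example)
print(vertices)
```

Rejecting out-of-range or malformed coordinates at this boundary keeps bad geometry from ever reaching the generated SQL.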

Setup is a single notebook. Clone the repo, run setup_and_deploy, and the notebook provisions everything: the Unity Catalog objects, synthetic sample data, a configured Genie Space with spatial instructions, and the deployed app with the right permissions. After setup, the app is ready to share with your team.

Three Wishes: What this unlocks

Letting users draw on the map and feeding that region into Genie's query generation matters for three reasons:

New personas get access to spatial analytics. A business analyst who cannot write ST_Intersects can now draw a region and ask a question. The barrier drops from "knows spatial SQL" to "can point at a map."

Governed by default. Every query runs through Unity Catalog. A business user gets spatial answers without being granted direct table access. The Genie Space controls which tables and columns are exposed. This is not a shortcut around governance — it is governance made usable.

A pattern, not just a demo. The architecture — visual interaction layer, natural language interface, governed data — is reusable. Telecom network planning, retail site intelligence, logistics coverage analysis, field asset management: any domain where location is a first-class dimension of the data can follow this pattern.

Try It

The repository is open source: https://github.com/databricks-solutions/genie-geo-chat

If you work with geospatial data on Databricks, use it as a starting point and adapt it to your own use case. Open an issue if you run into problems, and share what this inspires you to build. The combination of spatial SQL, natural language, and governed data access is still early — and there is a lot of room to push it further.

 
