When exploring a neighborhood, it’s easy to observe the businesses that already thrive there and imagine the potential of a piece of real estate. What if GenAI could take the wheel, analyzing the landscape and uncovering insights beyond what we can see? In this blog, we present how an LLM agent can analyze the commercial real estate landscape to help identify and prioritize top investment opportunities in a geographic area. We also explore how to integrate spatial insights into an agentic architecture and how to address the challenges of scaling spatial data. This approach showcases the power of Databricks' Vector Search Index, enhanced with spatial metadata, to deliver fast, efficient responses to a user-provided restaurant description. By leveraging Databricks’ spatial SQL functions, we can equip our LLM agents with geospatial intelligence, enabling smart real estate investment recommendations while minimizing compute load.
While this blog post focuses on real estate investment, the same approach applies broadly to other industries: selecting the ideal location for a brick-and-mortar retail store, opening a new healthcare facility, identifying home insurance risk, or deciding where to travel. A similar approach could also power an interactive experience for users looking for homes or rental units, letting them search for key features in both homes and neighborhoods seamlessly. And with access to property risk data, it could enable insurance agents to efficiently assess risk levels or identify safer zones for coverage, all through an easy-to-use interface.
A critical part of any spatial project is consolidating spatial insights in intelligent ways that conserve processing time and feed the right information to our tools. As part of this process, we need to understand which points of interest (POIs) exist in each area and what the surrounding marketplace looks like, from both a competitive and a demographic perspective. Traditionally, this has required computationally intensive spatial joins based on geometric calculations and map projections such as UTM, involving coordinate transformations, distance measurements, and overlap detection. As data volumes grow, these methods become increasingly expensive and difficult to scale.
To address this, we use the H3 spatial index, a hierarchical hexagonal grid system that encodes locations into unique hex cells. Instead of geometric joins, H3 enables fast, scalable spatial queries through simple cell ID comparisons, with only minor precision trade-offs at hex boundaries. With H3, we can quickly identify and consolidate market areas around real estate locations. Each hexagon is represented by a unique cell ID, and because the grid is hierarchical, smaller hexes roll up into larger ones, allowing us to analyze spatial patterns across varying distances without explicitly calculating them.
With Databricks’ native support for H3 in SQL, we can seamlessly integrate this spatial indexing into our analytics workflows. Functions like h3_longlatash3 make it easy to convert raw latitude and longitude into H3 cell IDs, organizing our data into a structure that's both performant and highly queryable. With Databricks’ scalability and built-in geospatial capabilities, we're able to process and analyze massive spatial datasets with the speed and precision modern use cases demand.
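For example, a minimal sketch of this conversion in a Databricks notebook might look like the following; the restaurants table and its latitude/longitude columns are illustrative names, not part of the actual pipeline:

```python
# Convert raw coordinates into H3 cell IDs with Databricks SQL.
# Resolution 8 hexagons span roughly 0.75 km, matching the analysis below.
hexed = spark.sql("""
    SELECT
        name,
        latitude,
        longitude,
        h3_longlatash3(longitude, latitude, 8) AS h3_cell
    FROM restaurants
""")
display(hexed)
```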
For our use case, we explore the challenge of opening a new restaurant in the city of Seattle. We conducted one hundred trial runs to evaluate the potential improvement from using H3 to identify neighboring restaurants within a geographic area versus calculating exact distances. In the traditional approach, the distance between all restaurant coordinates was calculated in a Universal Transverse Mercator (UTM) coordinate reference system, the results were converted back to the standard EPSG:4326 projection, and finally a spatial join was performed to find all restaurants within a 0.75-kilometer radius. In the hexagonal grid approach, the h3_longlatash3 Databricks SQL function was used to convert restaurant coordinates into H3 cell IDs at approximately a 0.75-kilometer resolution (resolution level 8).
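A hedged sketch of that hexagonal-grid approach, reusing the hexed DataFrame from the previous sketch: nearby restaurants fall out of a simple equality join on cell IDs, and the h3_kring function could widen the search to adjacent cells if boundary effects matter:

```python
# Restaurants that share a resolution-8 cell are treated as neighbors,
# so the spatial join reduces to a cell ID comparison.
hexed.createOrReplaceTempView("hexed")

neighbors = spark.sql("""
    SELECT a.name, b.name AS neighbor, a.h3_cell
    FROM hexed a
    JOIN hexed b
      ON a.h3_cell = b.h3_cell   -- cheap 64-bit equality, no geometry math
     AND a.name <> b.name
""")
```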
By leveraging hexagonal grids (H3) rather than calculating the distance between all restaurant coordinates in Seattle to find nearby restaurant neighbors, the run time of the neighbor lookup was reduced by ~95% on average, further demonstrating how well suited H3 is for real-time GenAI applications. See the table below for detailed summary statistics on the benchmark results.
Opening a new restaurant is an exciting and challenging endeavor. One of the most crucial decisions new restaurant owners make is selecting the ideal location: a spot that not only fits their vision but also attracts the right customers. Traditionally, the market research needed to determine the best location could take days, if not weeks, including analyzing competitor presence, understanding the population makeup, and predicting the local market's behavior. But what if you could streamline this process and generate a comprehensive market analysis in just minutes? Below, we explore how we leveraged an AI agent system and spatial data platforms to provide a comprehensive market analysis and recommend the most promising restaurant locations.
To leverage the power of H3, the first step was to map the restaurants from the Foursquare Places API to their respective hexagonal areas. To achieve this, we begin by collecting all restaurant data in Seattle. We use the census tract shapefile as the basis for our API calls, leveraging the polygon geometries that outline each neighborhood. These polygons are converted into H3 hexagons at resolution 8 using the h3_tessellateaswkb function in Databricks SQL. For each resulting hexagon, we retrieve its boundary as a polygon with the h3_boundaryaswkt Databricks SQL function, which provides the necessary geometry for our API search parameters. This process is repeated for all hexagons covering Seattle.
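A sketch of this tessellation step is below, assuming a hypothetical census_tracts table with a tract_id and a WKT geometry column; the struct fields returned by h3_tessellateaswkb (cellid, core, chip) follow the Databricks documentation:

```python
# Cover each census tract polygon with resolution-8 hexagons, then turn
# each cell back into a boundary polygon for the Foursquare API search.
tract_hexes = spark.sql("""
    SELECT
        tract_id,
        cell.cellid AS h3_cell,
        h3_boundaryaswkt(cell.cellid) AS hex_boundary_wkt
    FROM (
        SELECT tract_id, explode(h3_tessellateaswkb(geometry, 8)) AS cell
        FROM census_tracts
    ) exploded
""")
```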
The final dataset comprises the hexagon ID, corresponding census tract, and Foursquare data, and is written to a Delta table. Foursquare provides point of interest (POI) data representing restaurants in the Seattle area, received as JSON with attributes including a unique place ID, name, address, locality, latitude/longitude (geocode), popularity, ratings, hours, and more. This table was later used to build the Mosaic AI Vector Search Index. The underlying data and vector search index are refreshed monthly via a scheduled Databricks job to reflect any restaurant changes in the Puget Sound area.
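A minimal sketch of building such an index with the Vector Search Python SDK might look like this; the endpoint, catalog, table, and column names are illustrative, the embedding endpoint is one of the Databricks-hosted models, and the source Delta table would need Change Data Feed enabled:

```python
from databricks.vector_search.client import VectorSearchClient

vsc = VectorSearchClient()

# Assumes the vector search endpoint already exists; the TRIGGERED pipeline
# is what the scheduled monthly refresh job would kick off.
index = vsc.create_delta_sync_index(
    endpoint_name="restaurant-vs-endpoint",
    index_name="main.real_estate.restaurant_index",
    source_table_name="main.real_estate.restaurants_hexed",
    pipeline_type="TRIGGERED",
    primary_key="fsq_place_id",
    embedding_source_column="description",  # text column the model embeds
    embedding_model_endpoint_name="databricks-gte-large-en",
)
```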
Our agentic AI system architecture generates a comprehensive market analysis for new restaurant locations using a three-step process that mirrors how a human would approach conducting an analysis. The system combines restaurant-specific data with detailed demographic information to create a complete profile of each location under consideration.
When evaluating the best location for a new restaurant, the first step is often identifying a successful, comparable restaurant to base the analysis on. For instance, if a user asks, "I want to open an upscale Korean restaurant—what are the best areas for that?", a good starting point would be to look at where the top upscale Korean restaurants are located and analyze the competitive and demographic factors contributing to their success.
In our architecture, this is done by converting the user query into a vector using an embedding model from the Databricks Foundation Model API. The vector is then sent to a Mosaic AI Vector Search Index, which retrieves the hex containing the existing restaurant that best matches the user’s query; in the example, this would be the location of a top upscale Korean restaurant in the city. We look up the single restaurant that most closely matches the one the user wants to open, and its hex acts as a proxy for what an ideal area could look like. The source table for the vector search index consists of comprehensive restaurant data containing details like name, location, cuisine type, ratings, price range, and popularity.
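Sketched below with the same illustrative names as above, the retrieval step embeds the user's query server-side and returns the single best match along with its hex:

```python
# Reuse the vsc client from the index-creation sketch.
index = vsc.get_index(
    endpoint_name="restaurant-vs-endpoint",
    index_name="main.real_estate.restaurant_index",
)

results = index.similarity_search(
    query_text="upscale Korean restaurant",
    columns=["fsq_place_id", "name", "h3_cell"],
    num_results=1,  # one best match acts as the proxy location
)

# Rows come back in the order of the requested columns.
best_hex = results["result"]["data_array"][0][2]
```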
Once a comparable hex location is identified, the next step is to find other areas with similar competitive and demographic characteristics. To do this, we use a k-nearest neighbors (k-NN) model over a range of features, including restaurant statistics such as the total number of restaurants and types of cuisines, and demographic data such as population, income levels, and consumer preferences. Combining this data, the model identifies three high-potential location options that are most likely to support the success of the restaurant concept the user is interested in. This step ensures that restaurant owners are presented with options tailored to their specific needs, providing a strategic starting point for choosing the best area to open their new venture.
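One way to implement this step, shown as a sketch rather than the exact production model, is scikit-learn's NearestNeighbors over a hypothetical hex_features pandas DataFrame with one row per hex; the feature column names are illustrative:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

# Hypothetical per-hex feature table: competition plus demographics.
feature_cols = ["restaurant_count", "cuisine_diversity", "population", "median_income"]
X = StandardScaler().fit_transform(hex_features[feature_cols])  # one common scale

# n_neighbors=4: the comparable hex itself plus the three candidate options.
knn = NearestNeighbors(n_neighbors=4).fit(X)

# Row position of the comparable hex returned by the vector search step.
anchor = int(np.flatnonzero(hex_features["h3_cell"] == best_hex)[0])
_, idx = knn.kneighbors(X[[anchor]])
candidate_hexes = hex_features.iloc[idx[0][1:]]["h3_cell"].tolist()  # skip self
```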
The three potential hex location options are enhanced with additional insights about the competitive landscape and demographic details specific to the restaurant concept in the user query. To achieve this, two specialized tools were set up: the Competitor Analysis Tool and the Demographic Analysis Tool.
These tools work in parallel to provide detailed context for each location. Currently, our AI system calls these tools deterministically. However, our framework is designed to support the development of an agent-based system, where these tools can be part of a more dynamic function-call process, coordinated by an orchestrating AI agent.
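A minimal sketch of those deterministic, parallel calls; competitor_analysis and demographic_analysis stand in for the two tools and are hypothetical function names:

```python
from concurrent.futures import ThreadPoolExecutor

def enrich(hex_id):
    """Run both analysis tools for one candidate hex in parallel."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        competitors = pool.submit(competitor_analysis, hex_id)   # hypothetical tool
        demographics = pool.submit(demographic_analysis, hex_id)  # hypothetical tool
        return {
            "hex": hex_id,
            "competitors": competitors.result(),
            "demographics": demographics.result(),
        }

contexts = [enrich(h) for h in candidate_hexes]
```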
Once each of the three hex locations is enriched with specific context, the data is then passed to a Summarization Agent. This agent is tasked with generating a recommendation for each location, advising whether opening a restaurant is a viable option or not. The user is also provided with a detailed explanation for each recommendation, including a SWOT analysis (Strengths, Weaknesses, Opportunities, and Threats) for each hex location. This helps the user better understand the rationale behind the recommendations, offering deeper insights into why certain locations are more suitable than others.
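A hedged sketch of that final call using the MLflow deployments client against a Databricks Foundation Model endpoint; the prompt wording and endpoint name are illustrative:

```python
import mlflow.deployments

client = mlflow.deployments.get_deploy_client("databricks")

# Fold the enriched context for all three hexes into one summarization prompt.
prompt = (
    "For each candidate location below, recommend whether opening the proposed "
    "restaurant is viable, and include a SWOT analysis for each hex.\n\n"
    f"Context: {contexts}"
)

response = client.predict(
    endpoint="databricks-meta-llama-3-3-70b-instruct",  # illustrative endpoint
    inputs={"messages": [{"role": "user", "content": prompt}]},
)
recommendations = response["choices"][0]["message"]["content"]
```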
[Screenshot of the application]
When optimizing the architecture, it’s important to consider key performance indicators that you are targeting. For instance, if real-time latency is the priority, that may take precedence over accuracy. In our case, the architecture evolved through multiple iterations to strike the right balance between accuracy, performance, and user experience.
Initially, we used three summarizing agents (one for each tool and a final one to combine recommendations), assuming this would yield more thorough responses. However, simplifying to one final agent not only provided comparable results but also reduced latency by 30%. Moreover, the multi-agent approach introduced a higher risk of hallucinations, occasionally generating irrelevant hex locations. Simplifying the architecture thus improved both performance and overall reliability.
The current architecture utilizes an AI agent system that makes parallel calls to gather competitive and demographic data. A final call to a Large Language Model (LLM) then synthesizes these insights into a detailed recommendation. The sequence in which the components of our AI system are invoked is fixed, which we found to consistently produce reliable results. Alternatively, an AI system could dynamically decide which tools to call and in what order, optimizing the process by skipping unnecessary steps and improving latency.
Combining geospatial tools with large language models (LLMs) unlocks powerful, geographically aware solutions that can support your business use cases. By leveraging native spatial SQL functions, advanced indexing with Mosaic AI Vector Search, and seamless access to foundation models, Databricks empowers organizations to build agentic workflows that deliver precise, content-rich, and geographically targeted recommendations. This is particularly valuable for applications such as real estate investment, risk assessment, and asset management, where understanding the spatial context is crucial for informed decision-making.
By leveraging technologies like H3 indexing and Delta Lake, Databricks can efficiently process and organize large geospatial datasets. This enables scalable and high-performance solutions for complex spatial applications. When integrated with Retrieval-Augmented Generation (RAG) pipelines, LLMs can generate grounded, location-aware insights using up-to-date spatial data—while minimizing compute load and reducing operational costs. Proven across real-world use cases, this approach accelerates innovation and drives meaningful business outcomes through smarter, data-driven decision-making.
Aimpoint Digital is a market-leading analytics firm at the forefront of solving the most complex business and economic challenges through data and analytical technology. From integrating self-service analytics to implementing AI at scale and modernizing data infrastructure environments, Aimpoint Digital operates across transformative domains to improve the performance of organizations. Learn more by visiting: https://www.aimpointdigital.com/