DavidHurley
Databricks Employee

Turning GPS Pings into Insights: Mobile Fleet Analytics with Databricks Spatial SQL

Energy companies manage large and complex fleets of mobile equipment - from service trucks at a utility, to haul trucks at a mining site, to drill rigs in an oil patch. Managing equipment at this scale is challenging: if equipment becomes congested, operates too close together, or sits underutilized, inefficiencies ripple through the operation.

In this blog, we’ll walk through a low-effort way to analyze, visualize, and share trends about on-site mobile fleet assets using Databricks AI/BI Dashboards and Databricks Spatial SQL. With these insights, operations teams can begin to move from reactive decisions to proactive ones.

The Problem

Picture a day in the life of mobile fleet operations at a large mining site. The operations team’s job is to dispatch equipment where it’s needed, minimize delays, and avoid scenarios that could slow production or, worse, create safety risks.

To do this, they rely on low-latency, near real-time location data to make quick decisions. That might mean radioing a haul truck operator to take an alternate route, or asking light-duty trucks to pull over so a haul truck can pass safely. Because of these near real-time demands, GPS signals from on-site equipment are typically streamed directly to on-premises databases. From there, operations software can visualize both current and recent equipment locations.

The problem is that while visualizing real-time location data is essential for coordination, it’s fundamentally reactive. By the time operations realizes there is congestion, production may already be affected. And when operations is forced into constant reaction mode, safety incidents and near-misses become more likely.

The Databricks Solution

Instead of making reactive decisions, what if operations could act proactively?

  • If dispatchers knew equipment congestion usually builds between 7 and 9 AM in a certain area, they could proactively re-route trucks ahead of time and avoid the slowdown altogether.

  • If operations notices equipment consistently getting too close (most industrial sites require a minimum safe distance between equipment), they can investigate the root cause and address it proactively, preventing potential collisions before they occur.

Here, we focus on building a proactive approach to mobile fleet management using Databricks Spatial SQL and Databricks Dashboards (see image below).

  • Databricks AI/BI Dashboards serve as the visualization layer and can be shared with operations in multiple ways - daily email snapshots, embedded SharePoint pages, or directly as interactive dashboards.

  • Databricks Spatial SQL provides the geospatial functions needed to transform raw GPS pings into actionable insights.

image1.png

Demo: Databricks Dashboards and Spatial SQL

The Datasets

For this demo, we’re working with two tabular datasets stored as Delta tables and registered in Unity Catalog; a sketch of their assumed schemas follows the list below.

  • equipment_gps_data - high-frequency taxi location data from Porto, Portugal, with pings every 15 seconds. Since open-source GPS data for on-site equipment isn’t available, we’re using taxis as a stand-in. The analysis and visualizations are identical for any GPS dataset.
  • congestion_zone_polygon_data - a synthetic dataset that defines polygon boundaries for areas with known congestion issues.
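
For reference, here is a hypothetical sketch of the two schemas, inferred from the columns used later in this blog (the actual tables may differ):

-- Hypothetical schemas, inferred from the columns referenced later in this blog
CREATE TABLE <catalog>.<schema>.equipment_gps_data (
   route_id     STRING,
   equipment_id STRING,
   timestamp    TIMESTAMP,
   longitude    DOUBLE,  -- degrees, SRID 4326
   latitude     DOUBLE   -- degrees, SRID 4326
);

CREATE TABLE <catalog>.<schema>.congestion_zone_polygon_data (
   congestion_zone_id      STRING,
   congestion_zone_polygon STRING  -- polygon boundary as WKT, SRID 4326
);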

image2.png

image3.png

 

The Databricks Spatial SQL Functions

The rest of this blog references several spatial SQL functions that are used to turn GPS pings and polygon boundaries into Databricks spatial types and derive insights. 

A few of the key functions are highlighted below, with a short usage sketch after the list.

  • st_geomfromwkt(): Converts a string representation of a geometry into a valid Databricks GEOMETRY object
  • st_point(): Converts a numeric coordinate pair into a valid Databricks GEOMETRY point
  • st_transform(): Converts a GEOMETRY object from one coordinate system to another
  • st_buffer(): Creates a polygon buffer around a GEOMETRY object at a fixed distance
  • st_contains(): Checks if one GEOMETRY object completely contains another
  • st_intersects(): Checks if two GEOMETRY objects overlap or touch
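
As a quick illustration of how these compose (a hypothetical query with made-up Porto-area coordinates, not part of the demo datasets):

-- Hypothetical illustration with made-up Porto-area coordinates
SELECT
   st_point(-8.61, 41.15, 4326) AS pt,                                -- numeric pair to GEOMETRY point
   st_transform(st_point(-8.61, 41.15, 4326), 3763) AS pt_meters,     -- degrees to meters
   st_buffer(st_transform(st_point(-8.61, 41.15, 4326), 3763), 5) AS pt_buffer_5m,  -- 5 m buffer polygon
   st_contains(
       st_geomfromwkt('POLYGON((-8.62 41.14, -8.60 41.14, -8.60 41.16, -8.62 41.16, -8.62 41.14))', 4326),
       st_point(-8.61, 41.15, 4326)
   ) AS pt_in_poly                                                    -- TRUE: the point lies inside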

 

Step 1: Creating Databricks Native GEOMETRY Types

Both datasets contain longitude and latitude coordinates in Spatial Reference Identifier (SRID) 4326 (WGS 84), which represents positions on the Earth in degrees. Because a degree does not correspond to a fixed distance on the ground, this system isn’t ideal for spatial operations like measuring distances or checking proximity. For those tasks, it’s better to transform the data into a projected coordinate system measured in meters.
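
To make this concrete, consider two points 0.01 degrees of longitude apart. Assuming st_distance(), which returns planar distance in the units of the geometry’s CRS, the separation reads as a hard-to-interpret 0.01 in degrees but roughly 840 meters once projected (a hypothetical illustration, not part of the demo):

-- Hypothetical illustration: the same pair of points measured in each CRS's units
SELECT
   st_distance(st_point(-8.61, 41.15, 4326), st_point(-8.60, 41.15, 4326)) AS dist_degrees,  -- 0.01 degrees
   st_distance(
       st_transform(st_point(-8.61, 41.15, 4326), 3763),
       st_transform(st_point(-8.60, 41.15, 4326), 3763)
   ) AS dist_meters  -- roughly 840 meters at this latitude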

We can accomplish this using st_transform(), which converts the longitude and latitude data from the current coordinate reference system (CRS) into a CRS defined by a provided SRID and stores the result as a Databricks GEOMETRY type. For this demo, we create two new Delta tables in Unity Catalog, equipment_gps_geom_data and congestion_zone_polygon_geom_data. These tables contain the original location data transformed to SRID 3763 - the standard projection for Portugal, with coordinates in meters.

These two tables will be the source for spatial operations and visualization for the remainder of this blog.   

CREATE TABLE <catalog>.<schema>.equipment_gps_geom_data AS
SELECT
     *,
     st_transform(st_point(longitude, latitude, 4326), 3763) as point_geom
FROM <catalog>.<schema>.equipment_gps_data


CREATE TABLE <catalog>.<schema>.congestion_zone_polygon_geom_data AS
SELECT
   *,
   st_transform(st_geomfromwkt(congestion_zone_polygon, 4326), 3763) AS polygon_geom
FROM <catalog>.<schema>.congestion_zone_polygon_data

image4.png

image5.png

 

Step 2 (Optional): Visualizing Historical Equipment Locations

Although we don’t need to visualize historical equipment locations alongside congestion zone boundaries, doing so helps us understand the results of subsequent spatial operations. To do this, we use a Databricks Dashboard point map, which displays longitude and latitude coordinates in SRID 4326 (degrees).

Because point maps currently support only a single dataset, we need to combine the equipment location and congestion zone boundary datasets. And since point maps display numeric point data while our congestion zone polygons are sparse GEOMETRY objects, we need to convert the polygons to numeric points and increase the point density so the boundaries appear as solid lines.

To achieve this, we first create a User Defined Function (UDF) called densify_polygon_boundary(). This UDF takes the corner points of a polygon’s bounding box and generates additional points along its perimeter at a specified interval in meters.

We then define a new dataset in the dashboard called Equipment Location and Congestion Zone Perimeter Data. In this dataset:

  1. We get the corner coordinates, which define the bounding box perimeter, for each congestion zone polygon.
  2. We apply the densify_polygon_boundary() UDF to generate additional points along each bounding box perimeter.
  3. We reproject these points from meters (SRID 3763) to degrees (SRID 4326) using st_transform() so we can plot them on the point map, and parse the longitude and latitude from the result into numeric values using st_point(), st_x(), and st_y().
  4. Finally, we union the densified polygon points with the equipment location data, filling missing columns with NULL so the union works correctly.

The resulting dataset is added to the point map, with points colored by equipment_id and a tooltip for easy inspection. Congestion zones appear as grey rectangular boxes because they have NULL for equipment_id.

%sql
-- Densify points along a polygon's bounding box perimeter
CREATE OR REPLACE FUNCTION <catalog>.<schema>.densify_polygon_boundary(
   xmin DOUBLE, xmax DOUBLE, ymin DOUBLE, ymax DOUBLE, meter_spacing FLOAT
)
RETURNS ARRAY<STRUCT<x DOUBLE, y DOUBLE>>
LANGUAGE PYTHON
AS $$
def densify_boundary(xmin, xmax, ymin, ymax, meter_spacing):
   # Walk the four sides of the bounding box, emitting an (x, y) tuple every
   # meter_spacing meters; plain tuples map onto the STRUCT<x, y> return type.
   points = []

   # Bottom edge, left to right
   x = xmin
   while x <= xmax:
       points.append((x, ymin))
       x += meter_spacing

   # Right edge, bottom to top
   y = ymin + meter_spacing
   while y <= ymax:
       points.append((xmax, y))
       y += meter_spacing

   # Top edge, right to left
   x = xmax - meter_spacing
   while x >= xmin:
       points.append((x, ymax))
       x -= meter_spacing

   # Left edge, top to bottom (strict > avoids duplicating the first point)
   y = ymax - meter_spacing
   while y > ymin:
       points.append((xmin, y))
       y -= meter_spacing

   return points

return densify_boundary(xmin, xmax, ymin, ymax, meter_spacing)
$$;
-- Equipment Location and Congestion Zone Perimeter Data
WITH polygon_boundaries_min_max AS (
   SELECT
       congestion_zone_id,
       st_xmin(polygon_geom) AS xmin,
       st_xmax(polygon_geom) AS xmax,
       st_ymin(polygon_geom) AS ymin,
       st_ymax(polygon_geom) AS ymax
   FROM
       <catalog>.<schema>.congestion_zone_polygon_geom_data
)
SELECT
   t1.route_id,
   t1.equipment_id,
   t1.timestamp,
   t1.longitude,
   t1.latitude,
   NULL as congestion_zone_id
FROM
   <catalog>.<schema>.equipment_gps_geom_data as t1
UNION ALL
SELECT
   NULL AS route_id,
   NULL AS equipment_id,
   NULL AS timestamp,
   st_x(st_transform(st_point(p.x, p.y, 3763), 4326)) AS longitude,
   st_y(st_transform(st_point(p.x, p.y, 3763), 4326)) AS latitude,    
   congestion_zone_id
FROM polygon_boundaries_min_max as t2
LATERAL VIEW EXPLODE(
   <catalog>.<schema>.densify_polygon_boundary(t2.xmin, t2.xmax, t2.ymin, t2.ymax, 5)
) AS p;

image6.png

image7.png

 

Step 3: Turning GPS Pings into Insights

While our visual in Step 2 is useful, it doesn’t help site operations move from reactionary to proactive decision-making. To get there, we’ll create two new datasets in our Dashboard.

  • Equipment In Congested Zone - a binary flag indicating whether an equipment GPS ping falls within a congestion zone (i.e. if a point is within a bounding box in image 1).

  • Equipment Proximity Alert - a binary flag indicating when two pieces of equipment are too close at the same time and the IDs of the equipment involved.

Equipment In Congested Zone

For this dataset, we use st_contains() to check whether a piece of equipment is inside a congestion zone polygon. The function returns TRUE if the point is within the polygon; we cast the boolean to an integer (1 if inside, 0 otherwise) so we can later sum it as a proxy for the time each piece of equipment spends in a congestion zone. Lastly, since there are multiple congestion zone boundaries, we cross join the equipment GPS pings with the congestion zones so that every point is evaluated against every polygon.

-- Equipment In Congested Zone
SELECT
 t1.*,
 t2.congestion_zone_id,
 CAST(ST_Contains(polygon_geom, point_geom) AS INT) AS is_equipment_in_congestion_zone
FROM
 <catalog>.<schema>.equipment_gps_geom_data AS t1
CROSS JOIN
 <catalog>.<schema>.congestion_zone_polygon_geom_data AS t2

image8.png

Equipment Proximity Alert

For this dataset, we detect when two pieces of equipment are too close at the same timestamp.

First, we create a 5m buffer around each equipment GPS point using st_buffer(). Since our data is already stored as a GEOMETRY type in SRID 3763, we can directly work with distances in meters, so no reprojection is needed.

Next, we join all equipment points at the same timestamp and use st_intersects() to check whether the buffered areas overlap.

Lastly, we filter to keep only results where an intersection occurs and collect the IDs of all equipment that came too close, producing a clear proximity alert dataset.

-- Equipment Proximity Alert
SELECT
  c.timestamp,
  c.equipment_id,
  CASE WHEN COUNT_IF(ST_Intersects(c.geom, p.geom)) > 0 THEN 1 ELSE 0 END AS is_equipment_too_close,
  -- COLLECT_LIST ignores NULLs, so only intersecting equipment IDs are kept
  COLLECT_LIST(CASE WHEN ST_Intersects(c.geom, p.geom) THEN p.equipment_id END) AS close_equipment_ids
FROM (
  SELECT
      equipment_id,
      timestamp,
      ST_Buffer(point_geom, 5) AS geom
  FROM <catalog>.<schema>.equipment_gps_geom_data
) c
LEFT JOIN (
  SELECT
      equipment_id,
      timestamp,
      ST_Buffer(point_geom, 5) AS geom
  FROM <catalog>.<schema>.equipment_gps_geom_data
) p
ON c.timestamp = p.timestamp
AND c.equipment_id != p.equipment_id
GROUP BY c.timestamp, c.equipment_id
HAVING COUNT_IF(ST_Intersects(c.geom, p.geom)) > 0

image9.png

 

Step 4: Enhancing our Dashboard Visualizations

Now that we have datasets for equipment proximity and whether a piece of equipment was in a congestion area, we can visualize them in several ways.

For this demo, we’ll create two visualizations - a pivot table showing the total time each piece of equipment spent in congestion zones, and a table showing equipment that came too close to other equipment.

For the pivot, we choose the pivot table visualization type, set rows to equipment_id, and values to the sum of is_equipment_in_congestion_zone. This acts as a proxy for the total time each piece of equipment spends in congested areas. To make it easier to read, we order the table by time spent and apply a color scale.
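
If you’d rather compute the same numbers outside the dashboard, the pivot reduces to a simple aggregation. Here is a sketch against the Equipment In Congested Zone dataset from Step 3 (the dataset name and the * 15 conversion, which assumes the 15-second ping interval, are our own illustration):

-- Sketch: the pivot as a plain aggregation over the Step 3 dataset
SELECT
   equipment_id,
   SUM(is_equipment_in_congestion_zone) AS pings_in_congestion_zones,
   SUM(is_equipment_in_congestion_zone) * 15 AS approx_seconds_in_congestion_zones  -- assumes 15 s pings
FROM equipment_in_congested_zone  -- assumed name if the dashboard dataset is saved as a view
GROUP BY equipment_id
ORDER BY pings_in_congestion_zones DESC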

For the proximity table, we select the Table visualization and display only timestamp, equipment_id, is_equipment_too_close (whether a piece of equipment was too close to another), and close_equipment_ids (which equipment it was).
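
The same view can also be reproduced outside the dashboard with a short query; a sketch against the Equipment Proximity Alert dataset from Step 3 (assuming it is saved as a view):

-- Sketch: the proximity table's contents as a plain query
SELECT timestamp, equipment_id, is_equipment_too_close, close_equipment_ids
FROM equipment_proximity_alert  -- assumed name if the dashboard dataset is saved as a view
ORDER BY timestamp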

 

Step 5: Operationalize the Dashboard

Once the dashboard is published and a refresh schedule is set (users can also refresh it on demand), the next step is to share it with mobile fleet operations to support proactive decision-making. There are several options for sharing:

  • Share the dashboard link directly with operators. Users must be added to the Databricks workspace. The recommended approach is to grant “consumer access” entitlement, which allows them to view the shared dashboard through the new Databricks One UI.

  • Add user emails in the subscriber section. This sends an email with a snapshot and a link to the dashboard according to the refresh schedule. This method is ideal for automated morning updates to site operations.

  • Embed the dashboard in a SharePoint page. Many site operations teams use SharePoint, so embedding the dashboard in a “Mobile Fleet Analytics” page is an easy way to share it. The refresh schedule determines the data displayed on the SharePoint page.

 

Conclusion

We now have a dashboard that can support proactive decisions around managing equipment congestion and proximity on-site. For example, operations might investigate why a certain piece of equipment spends an unusually long time in a congestion zone or why some equipment is getting too close.