Energy companies manage large and complex fleets of mobile equipment - from service trucks in a utility company, to haul trucks at a mining site, to drill rigs in an oil patch. Managing equipment at this scale is challenging: if equipment becomes congested, operates too close together, or sits underutilized, inefficiencies ripple through the operation.
In this blog, we’ll walk through a low-effort way to analyze, visualize, and share trends about on-site mobile fleet assets using Databricks AI/BI Dashboards and Databricks Spatial SQL. With these insights, operations teams can begin to move from reactive decisions to proactive ones.
Picture a day in the life of mobile fleet operations at a large mining site. The operations team’s job is to dispatch equipment where it’s needed, minimize delays, and avoid scenarios that could slow production or, worse, create safety risks.
To do this, they rely on low-latency, near real-time location data to make quick decisions. That might mean radioing a haul truck operator to take an alternate route, or asking light-duty trucks to pull over so a haul truck can pass safely. Because of these near real-time demands, GPS signals from on-site equipment are typically streamed directly to on-premises databases. From there, operations software can visualize both current and recent equipment locations.
The problem is that while visualizing real-time location data is essential for coordination, it’s fundamentally reactive. By the time operations realizes there is congestion, production may already be affected. And when operations is forced into constant reaction mode, safety incidents and near-misses become more likely.
Instead of making reactive decisions, what if operations could act proactively?
If dispatchers knew equipment congestion usually builds between 7-9 AM in a certain area, they could proactively re-route trucks ahead of time and avoid the slowdown altogether.
If operations notices equipment consistently getting too close - since most industrial sites require a minimum safe distance - they can investigate the cause and address it proactively, preventing potential collisions before they occur.
Here, we focus on building a proactive approach to mobile fleet management using Databricks Spatial SQL and Databricks Dashboards (see image below).
Databricks AI/BI Dashboards serve as the visualization layer and can be shared with operations in multiple ways - daily email snapshots, embedded SharePoint pages, or directly as interactive dashboards.
Databricks Spatial SQL provides the geospatial functions needed to transform raw GPS pings into actionable insights.
For this demo, we’re working with two tabular datasets stored as Delta tables and registered in Unity Catalog.
The rest of this blog uses several Spatial SQL functions to turn GPS pings and polygon boundaries into Databricks spatial types and derive insights. We highlight the functionality of a few of them below.
Both datasets contain longitude and latitude coordinates in Spatial Reference Identifier (SRID) 4326, which represents positions on the Earth in degrees (WGS 84). Because degrees are not evenly spaced, this system isn’t ideal for spatial operations like measuring distances or checking proximity. For those tasks, it’s better to transform the data into a projected coordinate system in meters.
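To build intuition for why degrees are a poor unit for distance math, consider that a degree of longitude shrinks as you move away from the equator. A quick back-of-envelope sketch in Python (the 111,320 m equatorial figure is an approximation for illustration only; real work should use a proper projection like the one st_transform() applies):

```python
import math

# Rough length of one degree of longitude (WGS 84), in meters.
# 111_320 m is an approximate equatorial value, used here only to
# illustrate that degrees are not evenly spaced on the ground.
def meters_per_degree_longitude(latitude_deg: float) -> float:
    return 111_320 * math.cos(math.radians(latitude_deg))

at_equator = meters_per_degree_longitude(0.0)    # ~111 km
in_portugal = meters_per_degree_longitude(39.0)  # ~87 km
```

Because one "degree" covers a different ground distance depending on latitude, distance and proximity checks are far simpler after reprojecting into a metric coordinate system.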
We can accomplish this using st_transform(), which converts the longitude and latitude data from the current coordinate reference system (CRS) into a CRS defined by a provided SRID and stores the result as a Databricks GEOMETRY type. For this demo, we create two new Delta tables in Unity Catalog, equipment_gps_geom_data and congestion_zone_polygon_geom_data. These tables contain the original location data transformed to SRID 3763 - the standard projection for Portugal, with coordinates in meters.
These two tables will be the source for spatial operations and visualization for the remainder of this blog.
CREATE TABLE <catalog>.<schema>.equipment_gps_geom_data AS
SELECT
*,
st_transform(st_point(longitude, latitude, 4326), 3763) as point_geom
FROM <catalog>.<schema>.equipment_gps_data;
CREATE TABLE <catalog>.<schema>.congestion_zone_polygon_geom_data AS
SELECT
*,
st_transform(st_geomfromwkt(congestion_zone_polygon, 4326), 3763) AS polygon_geom
FROM <catalog>.<schema>.congestion_zone_polygon_data;
Although we don’t need to visualize historical equipment locations alongside congestion zone boundaries, doing so helps us understand the results of subsequent spatial operations. To do this, we use a Databricks Dashboard point map, which displays longitude and latitude coordinates in SRID 4326 (degrees).
Because point maps currently support only a single dataset, we need to combine the equipment location and congestion zone boundary datasets. Additionally, since point maps display numeric point data and our congestion zone polygons are sparse GEOMETRY objects, we need to convert the polygon objects into numeric points and increase the point density so the boundaries appear as solid lines.
To achieve this, we first create a User Defined Function (UDF) called densify_polygon_boundary(). This UDF takes the corner points of a polygon’s bounding box and generates additional points along its perimeter at a specified interval in meters.
We then define a new dashboard dataset called Equipment Location and Congestion Zone Perimeter Data. In this dataset, equipment GPS pings are unioned with the densified congestion zone perimeter points, which are transformed back to SRID 4326 so the point map can plot them.
The resulting dataset is added to the point map, with points colored by equipment_id and a tooltip for easy inspection. Congestion zones appear as grey rectangular boxes because they have NULL for equipment_id.
-- Densify Points Along a Polygon Bounds
%sql
CREATE OR REPLACE FUNCTION <catalog>.<schema>.densify_polygon_boundary(
xmin DOUBLE, xmax DOUBLE, ymin DOUBLE, ymax DOUBLE, meter_spacing FLOAT
)
RETURNS ARRAY<STRUCT<x DOUBLE, y DOUBLE>>
LANGUAGE PYTHON
AS $$
def densify_boundary(xmin, xmax, ymin, ymax, meter_spacing):
    # Walk the four edges of the bounding box, emitting a point every
    # meter_spacing meters. Plain tuples map positionally to the
    # ARRAY<STRUCT<x DOUBLE, y DOUBLE>> return type.
    points = []
    # Bottom edge, left to right
    x = xmin
    while x <= xmax:
        points.append((x, ymin))
        x += meter_spacing
    # Right edge, bottom to top (skip the corner already emitted)
    y = ymin + meter_spacing
    while y <= ymax:
        points.append((xmax, y))
        y += meter_spacing
    # Top edge, right to left
    x = xmax - meter_spacing
    while x >= xmin:
        points.append((x, ymax))
        x -= meter_spacing
    # Left edge, top to bottom (stop before repeating the start corner)
    y = ymax - meter_spacing
    while y > ymin:
        points.append((xmin, y))
        y -= meter_spacing
    return points

return densify_boundary(xmin, xmax, ymin, ymax, meter_spacing)
$$;
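Outside Databricks, the same boundary walk can be sanity-checked as plain Python - a sketch that mirrors the UDF body, returning (x, y) tuples instead of SQL structs:

```python
def densify_boundary(xmin, xmax, ymin, ymax, meter_spacing):
    # Walk the four bounding-box edges, emitting a point every
    # meter_spacing meters: bottom left-to-right, right bottom-to-top,
    # top right-to-left, left top-to-bottom.
    points = []
    x = xmin
    while x <= xmax:
        points.append((x, ymin))
        x += meter_spacing
    y = ymin + meter_spacing
    while y <= ymax:
        points.append((xmax, y))
        y += meter_spacing
    x = xmax - meter_spacing
    while x >= xmin:
        points.append((x, ymax))
        x -= meter_spacing
    y = ymax - meter_spacing
    while y > ymin:
        points.append((xmin, y))
        y -= meter_spacing
    return points

# A 10 x 10 box at 5 m spacing yields 8 perimeter points.
perimeter = densify_boundary(0, 10, 0, 10, 5)
```

Tightening meter_spacing produces denser perimeters, which is what makes the zone boundaries render as solid lines on the point map.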
-- Equipment Location and Congestion Zone Perimeter Data
WITH polygon_boundaries_min_max AS (
SELECT
congestion_zone_id,
st_xmin(congestion_zone_polygon_geom) AS xmin,
st_xmax(congestion_zone_polygon_geom) AS xmax,
st_ymin(congestion_zone_polygon_geom) AS ymin,
st_ymax(congestion_zone_polygon_geom) AS ymax
FROM
<catalog>.<schema>.congestion_zone_polygon_geom_data
)
SELECT
t1.route_id,
t1.equipment_id,
t1.timestamp,
t1.longitude,
t1.latitude,
NULL as congestion_zone_id
FROM
<catalog>.<schema>.equipment_gps_geom_data as t1
UNION ALL
SELECT
NULL AS route_id,
NULL AS equipment_id,
NULL AS timestamp,
st_x(st_transform(st_point(p.x, p.y, 3763), 4326)) AS longitude,
st_y(st_transform(st_point(p.x, p.y, 3763), 4326)) AS latitude,
congestion_zone_id
FROM polygon_boundaries_min_max as t2
LATERAL VIEW EXPLODE(
<catalog>.<schema>.densify_polygon_boundary(t2.xmin, t2.xmax, t2.ymin, t2.ymax, 5)
) AS p;
While our visual in Step 2 is useful, it doesn’t help site operations move from reactive to proactive decision-making. To get there, we’ll create two new datasets in our Dashboard.
Equipment In Congested Zone - a binary flag indicating whether an equipment GPS ping falls within a congestion zone (i.e. if a point is within a bounding box in image 1).
Equipment Proximity Alert - a binary flag indicating when two pieces of equipment are too close at the same time and the IDs of the equipment involved.
Equipment In Congested Zone
For this dataset, we use st_contains() to check whether a piece of equipment is inside a congestion zone polygon. The function returns TRUE when the point is within the polygon; we cast the result to 1 (and 0 otherwise) so that we can later sum it as a proxy for the time each piece of equipment spends in a congestion zone. Lastly, since there are multiple congestion zone boundaries, we cross join all equipment GPS pings with all polygons so every point is evaluated against every zone.
-- Equipment In Congested Zone
SELECT
t1.*,
t2.congestion_zone_id,
CAST(ST_Contains(polygon_geom, point_geom) AS INT) AS is_equipment_in_congestion_zone
FROM
<catalog>.<schema>.equipment_gps_geom_data AS t1
CROSS JOIN
<catalog>.<schema>.congestion_zone_polygon_geom_data AS t2
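Conceptually, for the axis-aligned rectangular zones in this demo, the containment check st_contains() performs reduces to a simple bounds test. A minimal Python sketch of that idea, using hypothetical coordinates (real polygons require a full point-in-polygon algorithm, which the SQL function handles for us):

```python
def in_rect_zone(px, py, xmin, ymin, xmax, ymax):
    # 1 if the point falls inside the zone's bounding box, else 0,
    # mirroring CAST(st_contains(...) AS INT) for rectangular zones.
    return int(xmin <= px <= xmax and ymin <= py <= ymax)

# Hypothetical pings checked against a 10 m x 10 m zone at the origin.
pings = [(3.0, 4.0), (12.0, 4.0)]
flags = [in_rect_zone(x, y, 0.0, 0.0, 10.0, 10.0) for x, y in pings]
```

Summing those 0/1 flags per equipment_id is what later gives us the time-in-zone proxy.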
Equipment Proximity Alert
For this dataset, we detect when two pieces of equipment are too close at the same timestamp.
First, we create a 5m buffer around each equipment GPS point using st_buffer(). Since our data is already stored as a GEOMETRY type in SRID 3763, we can directly work with distances in meters, so no reprojection is needed.
Next, we join all equipment points at the same timestamp and use st_intersects() to check whether the buffered areas overlap.
Lastly, we filter to keep only rows where an intersection occurs and collect the IDs of all equipment that came too close, producing a clear proximity alert dataset.
-- Equipment Proximity Alert
SELECT
c.timestamp,
c.equipment_id,
CASE WHEN COUNT_IF(ST_Intersects(c.geom, p.geom)) > 0 THEN 1 ELSE 0 END AS is_equipment_too_close,
FILTER(COLLECT_LIST(p.equipment_id), x -> x != c.equipment_id) AS close_equipment_ids
FROM (
SELECT
equipment_id,
timestamp,
ST_Buffer(point_geom, 5) AS geom
FROM <catalog>.<schema>.equipment_gps_geom_data
) c
LEFT JOIN (
SELECT
equipment_id,
timestamp,
ST_Buffer(point_geom, 5) AS geom
FROM <catalog>.<schema>.equipment_gps_geom_data
) p
ON c.timestamp = p.timestamp
AND c.equipment_id != p.equipment_id
GROUP BY c.timestamp, c.equipment_id
HAVING COUNT_IF(ST_Intersects(c.geom, p.geom)) > 0
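Two 5-meter buffers overlap exactly when the underlying points are within 10 meters of each other, so the buffered-intersection check is equivalent to a plain distance threshold. A minimal Python sketch of the same grouping logic, using hypothetical pings in a metric CRS:

```python
import math
from collections import defaultdict

def proximity_alerts(pings, threshold_m=10.0):
    """Flag equipment pairs closer than threshold_m at the same timestamp.

    pings: iterable of (timestamp, equipment_id, x, y) in a metric CRS.
    Returns {(timestamp, equipment_id): [ids of equipment too close]}.
    """
    by_ts = defaultdict(list)
    for ts, eid, x, y in pings:
        by_ts[ts].append((eid, x, y))
    alerts = defaultdict(list)
    for ts, items in by_ts.items():
        for eid, x, y in items:
            for other_id, ox, oy in items:
                # Two 5 m buffers intersect iff centers are <= 10 m apart.
                if other_id != eid and math.hypot(x - ox, y - oy) <= threshold_m:
                    alerts[(ts, eid)].append(other_id)
    return dict(alerts)

# Hypothetical pings: truck_a and truck_b are 6 m apart; truck_c is far away.
pings = [
    ("t1", "truck_a", 0.0, 0.0),
    ("t1", "truck_b", 6.0, 0.0),
    ("t1", "truck_c", 50.0, 0.0),
]
```

The SQL version scales the same pairwise comparison across the full history, with the self-join on timestamp playing the role of the per-timestamp grouping here.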
Now that we have datasets for equipment proximity and whether a piece of equipment was in a congestion area, we can visualize them in several ways.
For this demo, we’ll create two visualizations - a pivot visualization, showing total time spent in congestion zones, and a table, showing equipment that was too close.
For the pivot, we choose the pivot table visualization type, set rows to equipment_id, and values to the sum of is_equipment_in_congestion_zone. This acts as a proxy for the total time each piece of equipment spends in congested areas. To make it easier to read, we order the table by time spent and apply a color scale.
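As a side note, if GPS pings arrive at a roughly fixed interval (say every 30 seconds - an assumed value for illustration), the flag sum converts to wall-clock time directly:

```python
PING_INTERVAL_SECONDS = 30  # assumed GPS reporting interval (illustrative)

def congestion_minutes(flag_sum: int) -> float:
    # Each flagged ping stands in for one reporting interval inside a zone.
    return flag_sum * PING_INTERVAL_SECONDS / 60
```

So a piece of equipment with 120 flagged pings would have spent about an hour in congestion zones, under that assumed interval.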
For the proximity table, we select Table and display only the timestamp, equipment_id, whether a piece of equipment was too close to another, and which piece of equipment it was.
Once the dashboard is published and a refresh schedule is set, users can also refresh it on demand. The next step is to share it with mobile fleet operations to support proactive decision-making - for example, as daily email snapshots, embedded SharePoint pages, or directly as an interactive dashboard.
We now have a dashboard that can support proactive decisions around managing equipment congestion and proximity on-site. For example, operations might investigate why a certain piece of equipment spends an unusually long time in a congestion zone or why some equipment is getting too close.