cancel
Showing results for 
Search instead for 
Did you mean: 
Technical Blog
Explore in-depth articles, tutorials, and insights on data analytics and machine learning in the Databricks Technical Blog. Stay updated on industry trends, best practices, and advanced techniques.
cancel
Showing results for 
Search instead for 
Did you mean: 
josh_melton
Databricks Employee
Databricks Employee

Large logistics firms routinely make tens of millions of deliveries a day. In our demo, we model a modest subset of 40,000 packages, but even that pushes the limits of brute-force analysis. To see why, note that there are about 10²⁰ grains of sand on Earth and roughly 10⁸⁰ atoms in the observable universe, yet the number of distinct ways to sequence 40,000 deliveries is on the order of 10^166 000. Even a hypothetical computer evaluating a trillion delivery routes every nanosecond for the entire 13.8-billion-year age of the universe would examine barely 10^39 possibilities—leaving nearly our entire search space still unexplored. Problems like this one whose search space grows as n! explode so rapidly that even polynomial-time algorithms (n, n², n³, …) look trivial by comparison. 

josh_melton_0-1753227476251.png

Although a 40,000-stop problem sounds impossible, layered heuristics tame the blast radius. We first partition the stops into geographically cohesive clusters with k-means, so each cluster seeds a separate “mini-route.” That single step collapses the factorial search space by orders of magnitude. Inside each cluster, routing engines mimic a human’s instinct to link nearby points and skip zig-zags across the map: algorithms such as local search, savings, and tabu—all available in OR-Tools and Gurobi—prune unlikely tours without sacrificing quality. Even after this culling, millions of promising possibilities remain, and that’s where scale matters. Fanned out across a five-node Ray on Databricks cluster, we solved the 40,000-stop example in our demo in about 15 minutes for just a few dollars. Need it faster or for a larger fleet? Add nodes and watch the wall-clock time fall almost linearly.

Clustering by lat/long coordinates creates more tractable routing sub-problems:

josh_melton_1-1753227476318.png

 

We have one final hurdle: the cost of distance lookups can dwarf the cost of the routing logic itself. A hosted mapping service charges per request, so even a modest workload—100 routes with 100 stops each (≈ 10,000 distance queries)—can rack up thousands of dollars a day. For the scale of a global logistics fleet delivering millions of packages per day, that bill grows by orders of magnitude. In our demo we sidestep those costs by running an Open Source Routing Machine (OSRM) server inside the same Spark/Ray cluster that executes the optimization. Spinning up the container adds a one-time 20-minute load, but once it’s live every additional lookup is served locally at zero marginal cost. As query volume climbs, the in-cluster OSRM approach becomes both cheaper and faster than any pay-per-request API—and the economics improve with every package you add.

The final result of an optimized route displayed in a Databricks App:

josh_melton_2-1753227476359.png

In short, this approach to route optimization is a three-legged stool: heuristics to slash the search space, elastic compute to crunch what remains, and local distance services to kill per-request fees. Together they transform a problem that is theoretically intractable—40,000! possible sequences—into a practical, dollars-and-minutes workflow. With OR-Tools or Gurobi pruning unlikely tours, a five-node Spark-on-Ray cluster exploring the survivors, and an in-cluster OSRM server supplying lightning-fast distance lookups, we generated high-quality routes for 40,000 stops in a quarter of an hour for pocket change. Scale that recipe up or down—add nodes for bigger fleets, swap heuristics for tighter constraints, refresh the map data overnight—and you have a repeatable blueprint for pushing millions of daily deliveries through the narrow window of each driver’s shift while keeping both compute bills and fuel costs in check. The result isn’t just faster math; it’s improving margins with every package. Try it out here!

1 Comment