Nearest neighbour queries (or, for my international readers, nearest-neighbor queries), are a pretty common query pattern used in location-based applications and spatial analysis:- given a set of known locations of restaurants, customers, stations etc., “What/where is the nearest X to a given location?” I’ve already written about performing nearest-neighbour queries with SQL Server spatial datatypes here and here.
Occasionally, you’ll see this pattern reversed – “What is the X furthest away from my current location?” – which could be referred to as a farthest-neighbour query.
However, there’s another variant that I’ve not often seen discussed, which is: “Given a set of known locations, where is the point on Earth that maximises the distance to the closest X.” In other words, where would you have to go to get as far away as possible from any item in the given set of data? The key difference to appreciate about this query structure is that, unlike the first two examples, the answer is not selected from one of a set of known locations – it is location that must be determined programmatically to maximise the distance from X. This makes the query a lot more complicated, but also more interesting. (I don’t know if there is a generic name for this type of query or not… a “point-of-maximal distance” query?).
I can think of several potential practical applications for this type of query – a fire response service may want to identify the location at increased risk because it is furthest away from any of their response stations, for example. Or, a mobile telecommunications operator might want to identify the “darkest spot” in their network coverage – the point that lies furthest from any of their transmitters.
As ever, I think a practical example might help demonstrate. One of my recent posts described how to import a set of point data representing Tescos stores in the UK. People that know me will know that I’m a Green, tree-hugging, anti-capitalist hippie, so let’s suppose that I wanted to move house to get as far away from Tescos as possible (for my international readers, Tesco is a multi-billion pound supermarket chain in the UK often presented as one of the faces of modern global capitalism – I think you have similar companies like Wal-Mart or Subway).
I’m going to set an additional constraint that, while I want to live far from Tescos, I still want to live in mainland UK. So, my problem is to find the point that lies somewhere inside this Polygon that is furthest away from any of the dots (which are the locations of Tescos stores):
So…. how do that?
Method 1: Iterative T-SQL
To start, I’m going to create the shape of the UK as a geography MultiPolygon called @UK. I’ve already got a table containing various parts of the UK that I can aggregate, but I want to exclude the Scottish Islands and other small polygon geometries that will confuse matters (Scottish islands = good for single malt whiskey, bad for spatial data… they’re just too craggy!). So, I’ll create a simple numbers table in a CTE and join to that table in order to loop through each element in the UK table and only aggregate those polygons that are greater than 10,000km2 in area:
DECLARE @UK geography;
-- Create a MultiPolygon of the UK, omitting any islands
-- with area less than 10,000km2
WITH Numbers(n) AS (
SELECT ROW_NUMBER() OVER (ORDER BY number) FROM master..spt_values
SELECT @UK = geography::UnionAggregate(Shape.STGeometryN(n))
FROM UK JOIN Numbers ON Numbers.n 10000000000;
I’ll also create a geography MultiPoint of all Tescos stores called @Tescos. The only slight complication here is that the Garmin POI dataset I’m using seems to have some erroneous data, so I’ll put a WHERE condition to filter only those records that lie within the approximate geographic bounds of the UK:
DECLARE @Tescos geography;
-- Create a MultiPoint of Tescos stores
SELECT @Tescos = geography::UnionAggregate(Location)
WHERE Location.Long BETWEEN -10 AND 3
AND Location.Lat BETWEEN 50 AND 60;
Now that we’ve got the basics set up, how do we go about finding the part of @UK that lies furthest from @Tescos? To explain one approach, consider an analogy in which the shape of the UK is cut out of blotting paper. You then arrange a set of pipettes filled with ink, so that each is positioned above the location of a Tescos store. And then, all at the same time, you start dropping ink from the pipettes onto the location of the stores.
The ink blots centred on the location of each store will expand outwards at the same rate, gradually filling up the blotting paper shape of the UK. The last part of the paper to be covered by ink is that which lies furthest from any of the ink sources, and must be the solution.
We can recreate this approach using SQL Server’s spatial functions:
- To mimic the expanding ink blots we can call STBuffer() on the @Tescos MultiPoint with increasing buffer sizes to create expanding circular zones centred around each store.
- To determine the area of blotting paper that is still not covered by ink, we can use STDifference() to compare the inked area to the original shape.
- If the part of the UK yet to be covered by ink is still of a sufficiently large area, we simply add more ink by looping through again with an increased buffer size. If, however, the whole UK has been covered by ink then we’ve put too much ink on and we need to reduce the buffer size slightly.
- When the area yet to be covered reaches a sufficiently small size (this approach is unlikely to ever converge on a single Point), declare that to be the solution.
Here’s the query:
DECLARE @x float = 100000; -- Declare an initial buffer size. I'm guessing that the furthest point in the UK lies more than 100km from Tescos
WHILE @UK.STDifference(@Tescos.STBuffer(@x)).STArea() > 10000 -- Stop when we find an area less than 10km2 in area
-- Print a little debug message to see what's happening on this iteration
PRINT CAST(@UK.STDifference(@Tescos.STBuffer(@x)).STArea() AS varchar(32)) +
'm2 of the UK lies more than ' + CAST(@x AS varchar(32)) + 'm from the closest Tescos';
-- Amount by which to increase the buffer size
DECLARE @y float;
-- Fairly arbitrary, but enlarge the buffer by the square root of the remaining area
-- Intention here is to close in on the appropriate buffer value relatively quickly
SET @y = SQRT(@UK.STDifference(@Tescos.STBuffer(@x)).STArea());
-- Check whether increasing the buffer by this amount would be too great
-- and lead to the whole UK being covered in ink!
WHILE @UK.STDifference(@Tescos.STBuffer(@x + @y)).STArea() = 0 BEGIN
-- Reduce the amount by which the buffer is increased
SET @y = @y/2;
-- Apply the increase in buffer
SET @x = @x + @y;
Once this loop has finished, @x will contain a value, measured in metres, that represents the radius to which each of the ink blots must expand so that the area left uncovered has an area of less than 10km2 . Here’s what the map looks like when a circular buffer of radius @x is created centred on the location of each Tesco store:
UNION ALL SELECT @Tescos
UNION ALL SELECT @UK;
As you can tell, the whole of the UK is pretty much covered – the iteration only stops when the area remaining uncovered is less than 10km2 in in size (WHILE @UK.STDifference(@Tescos.STBuffer(@x)).STArea() > 10000), which means that, at this zoom level, it’s going to be pretty hard to perceive.
To find where the remaining un-inked area is, we can SELECT @UK.STDifference(@Tescos.STBuffer(@x)), and, to make it more obvious, I’ll also add a buffer to enlarge the solution area. And the answer? (as you might have been able to tell from the previous image) is the small blue polygon shown below:
Perhaps unsurprisingly, it’s an area at the tip of the highlands of Scotland, and it’s 119,591 metres from the closest Tesco store. Ok, so it’s not an exact point, but the way that this iteration works it would be very hard to reach such a precise location.
Method 2: SQLCLR Triangulation?
My gut instinct told me that there should be a way of coming up with the exact solution to the problem, in a much simpler manner, using triangulation. For example, suppose you were to create the Delaunay triangulation of the set of points such that the vertices of each triangle were formed from the location of the Tesco stores:
Since the triangles are formed by connecting vertices of Tesco stores, in those areas where stores are further apart, the circumcircle (the point of equal distance from all three vertices) of the triangle created between them will be larger. In fact, you’d think the point that lies at the centre of the triangle of largest circumradius would be the point that lies furthest from any Tesco store. And this would be true, were it not for (quite literally) “edge cases”…
… when, as in this case, the solution lies on the edge of the area in question, it is not covered by the extent of the triangulation (which only creates triangles that are interior to the set of data). Only by artificially extending the triangles out from the set of Tesco stores towards infinity could we ensure that the whole UK had been covered, and it turns out the SQL Server is not too happy about creating infinite-sized triangles (who’d have thought?).
So that’s a bit disappointing… the iterative T-SQL method still works and, with a bit of tweaking to choose how the buffer should be refined on each iteration, could probably be made to work more efficiently. However, I was hoping to be able to get something exact using SQLCLR.
I should just point out that the fact the triangulation method doesn’t extend beyond the bounds of the supplied dataset can still make it useful for lots of other purposes. For example, creating an aggregate union of the triangles creates a concave hull around the location of all Tescos stores in the UK (not a convex hull as could be created by STConvexHull()). The result looks scarily like the shape of the UK itself:
And, deriving a Voronoi tessellation from the set of triangles creates the “catchment area” around each store – the area formed from all those points closest to that store than any other. You can think of this as the materialised form of the results of a nearest-neighbour query:
(Note that, as expected, the point that lies furthest away from a Tescos store is contained within the largest Voronoi polygon – the pinkish one at the top of the map). At first, the Voronoi tessellation and the Delaunay triangulation may appear similar – indeed, they are “dual structures” – while Tesco stores form the vertices of each triangle in the Delaunay triangulation, they are the sites around which each polygon is created in the Voronoi tessellation.
If anybody can suggest any alternative approaches to identify the “point of maximal distance” from a set of data, I’d love to hear it…
Note: The SQLCLR code for creating the Delaunay triangulation, concave hull, and Voronoi tessellation demonstrated in this post is taken from “Pro Spatial with SQL Server 2012”, which also contains a chapter explaining these topics in more detail.