So I’ve been trying to think of some examples that make use of the Garmin POI data from my last post. Seeing as POI data naturally lends itself to nearest-neighbour queries, I thought I’d write a little about those.
The general form of the nearest neighbor query is: “Show me the closest x restaurants/pubs/train stations to this location”; this is the query pattern used by location-aware applications to find POIs close to the user’s current location, for example. However, another pattern of which I’ve seen much fewer examples is how to update an entire table of records in order to determine and populate the nearest neighbour for each row in the table (indeed, this very question came up on the MSDN spatial forum just last week).
With this in mind, I thought I’d demonstrate a practical scenario: let’s say you’re planning going for a swim at the local pool and, after you’ve swum 50 lengths, you’ll probably feel a bit peckish – perhaps in the need for some fish ‘n’ chips. So, in this example I’m firstly going to import the Garmin POI data for swimming pools and fish ‘n’ chip shops in the UK, each being represented by a geography Point instance. I’ll then add an extra column to the swimming pool table and populate it with the closest fish and chip shop to every swimming pool.
To start with, create tables containing the data for fish and chip shops and swimming pools (download the data from here). Notice that I’ve included an IDENTITY primary key field in each table, which I’ll need in order to add a spatial index later:
SELECT Id = IDENTITY(int,1,1), CAST(Name AS varchar(255)) AS Name, CAST(Details AS varchar(255)) AS Details, geography::Point(Latitude,Longitude,4326) AS Location INTO SwimmingPools FROM OPENROWSET( BULK 'C:\Users\Alastair\Downloads\GarminPOI\Swimming Pools.csv', FORMATFILE='C:\temp\garminpoi.xml', FIRSTROW=2 ) AS SwimmingPools; GO ALTER TABLE SwimmingPools ADD PRIMARY KEY(Id); GO SELECT Id = IDENTITY(int,1,1), CAST(Name AS varchar(255)) AS Name, CAST(Details AS varchar(255)) AS Details, geography::Point(Latitude,Longitude,4326) AS Location INTO FishAndChips FROM OPENROWSET( BULK 'C:\Users\Alastair\Downloads\GarminPOI\Fish and Chips.csv', FORMATFILE='C:\temp\garminpoi.xml', FIRSTROW=2 ) AS FishAndChips; GO ALTER TABLE FishAndChips ADD PRIMARY KEY(Id); GO
Now we’ll add some extra columns to the Swimming Pools table, which we’ll populate with information about the closest fish ‘n’ chip shop:
ALTER TABLE SwimmingPools ADD ClosestFishbar varchar(255), Distance decimal(18,2);
Now, you don’t need to add a spatial index to perform the update query, but it’ll be a lot faster with one than without. SQL Server can only make use of one spatial index in a join between tables, so there’s no need to add an index to both tables. We’ll just add one to the Fish And Chips table, as follows:
CREATE SPATIAL INDEX sidx_FishAndChips ON FishAndChips(Location) USING GEOGRAPHY_GRID WITH ( GRIDS =(LEVEL_1 = HIGH,LEVEL_2 = HIGH,LEVEL_3 = HIGH,LEVEL_4 = HIGH), CELLS_PER_OBJECT = 16 );
Then, run an UPDATE query to populate the ClosestFishBar and Distance columns, as follows:
UPDATE s SET ClosestFishbar = fName, Distance = Location.STDistance(fLocation) FROM ( SELECT SwimmingPools.*, fnc.Name AS fName, fnc.Location AS fLocation FROM SwimmingPools CROSS APPLY ( SELECT TOP 1 Name, Location FROM FishAndChips WITH(index(sidx_FishAndChips)) WHERE FishAndChips.Location.STDistance(SwimmingPools.Location) IS NOT NULL ORDER BY FishAndChips.Location.STDistance(SwimmingPools.Location) ASC ) fnc ) s;
This query probably needs some explanation:
- Generally, you might expect to use a correlated subquery to obtain the closest fish and chip shop to each swimming pool. However, correlated subqueries can only return a single value, and I want to retrieve both the name and the distance to the closest fish bar, so I’ve used a CROSS APPLY instead.
- In the function being applied, I’m sorting the table of fish and chip shops by ascending order of their distance from the current swimming pool (ORDER BY FishAndChips.Location.STDistance(SwimmingPools.Location) ASC), and then selecting the TOP 1 Name and Location (i.e. the closest).
- To make the query efficient, I’ve also added an index hint to use the spatial index on the Fish and Chips table – WITH(index(sidx_FishAndChips)), and I’ve included an extra predicate, WHERE FishAndChips.Location.STDistance(SwimmingPools.Location) IS NOT NULL. This extra predicate will help the query optimiser choose the dedicated nearest neighbour query plan introduced in SQL Server 2012 and SQL Azure.
- Note that the spatial index-optimised nearest neighbour query plan is only available in SQL Server 2012/SQL Azure. If you try to execute the UPDATE query as written above in SQL Server 2008/R2 then you’ll get an error:
“The query processor could not produce a query plan for a query with a spatial index hint. Reason: Spatial indexes do not support the comparator supplied in the predicate. Try removing the index hints or removing SET FORCEPLAN.”
As stated, to resolve this you’ll have to remove the WITH(index(sidx_FishAndChips)) index hint (and also expect your query to run a lot slower!). Instead, you might need to try one of the alternative approaches to efficiently finding nearest neighbours in SQL Server 2008/R2.
- Finally, in order to perform the UPDATE, I’ve wrapped the whole lot in a table alias, s, and then updated the two columns as required.
This query should take a few seconds to run (or, if you don’t create/use the spatial index, a few minutes), after which you can check the results as follows:
SELECT * FROM SwimmingPools;
And there you have the complete table of swimming pools, each listed with the name and distance to its closest fish and chip shop:
Seeing as it’s less than 100 metres from getting out of the water to a bag of chips, I’ll think I’ll be taking my next dip at the Farnworth swimming pool….