Updating a SQL Server Table with Nearest Neighbours (Optimised for SQL Server 2012 / SQL Azure)

So I’ve been trying to think of some examples that make use of the Garmin POI data from my last post. Seeing as POI data naturally lends itself to nearest-neighbour queries, I thought I’d write a little about those.

The general form of the nearest neighbor query is: “Show me the closest x restaurants/pubs/train stations to this location”; this is the query pattern used by location-aware applications to find POIs close to the user’s current location, for example. However, another pattern of which I’ve seen much fewer examples is how to update an entire table of records in order to determine and populate the nearest neighbour for each row in the table (indeed, this very question came up on the MSDN spatial forum just last week).

With this in mind, I thought I’d demonstrate a practical scenario: let’s say you’re planning going for a swim at the local pool and, after you’ve swum 50 lengths, you’ll probably feel a bit peckish – perhaps in the need for some fish ‘n’ chips. So, in this example I’m firstly going to import the Garmin POI data for swimming pools and fish ‘n’ chip shops in the UK, each being represented by a geography Point instance. I’ll then add an extra column to the swimming pool table and populate it with the closest fish and chip shop to every swimming pool.

To start with, create tables containing the data for fish and chip shops and swimming pools (download the data from here). Notice that I’ve included an IDENTITY primary key field in each table, which I’ll need in order to add a spatial index later:

  Id = IDENTITY(int,1,1),
  CAST(Name AS varchar(255)) AS Name,
  CAST(Details AS varchar(255)) AS Details,
  geography::Point(Latitude,Longitude,4326) AS Location
INTO SwimmingPools
 BULK 'C:\Users\Alastair\Downloads\GarminPOI\Swimming Pools.csv',
) AS SwimmingPools;


  Id = IDENTITY(int,1,1),
  CAST(Name AS varchar(255)) AS Name,
  CAST(Details AS varchar(255)) AS Details,
  geography::Point(Latitude,Longitude,4326) AS Location
INTO FishAndChips
 BULK 'C:\Users\Alastair\Downloads\GarminPOI\Fish and Chips.csv',
) AS FishAndChips;


Now we’ll add some extra columns to the Swimming Pools table, which we’ll populate with information about the closest fish ‘n’ chip shop:

ALTER TABLE SwimmingPools
ADD ClosestFishbar varchar(255),
    Distance decimal(18,2);

Now, you don’t need to add a spatial index to perform the update query, but it’ll be a lot faster with one than without. SQL Server can only make use of one spatial index in a join between tables, so there’s no need to add an index to both tables. We’ll just add one to the Fish And Chips table, as follows:

CREATE SPATIAL INDEX sidx_FishAndChips ON FishAndChips(Location)

Then, run an UPDATE query to populate the ClosestFishBar and Distance columns, as follows:

SET ClosestFishbar = fName,
    Distance = Location.STDistance(fLocation)  
    fnc.Name AS fName,
    fnc.Location AS fLocation
  FROM SwimmingPools 
    FROM FishAndChips WITH(index(sidx_FishAndChips))
    WHERE FishAndChips.Location.STDistance(SwimmingPools.Location) IS NOT NULL
    ORDER BY FishAndChips.Location.STDistance(SwimmingPools.Location) ASC
  ) fnc
) s;

This query probably needs some explanation:

  • Generally, you might expect to use a correlated subquery to obtain the closest fish and chip shop to each swimming pool. However, correlated subqueries can only return a single value, and I want to retrieve both the name and the distance to the closest fish bar, so I’ve used a CROSS APPLY instead.
  • In the function being applied, I’m sorting the table of fish and chip shops by ascending order of their distance from the current swimming pool (ORDER BY FishAndChips.Location.STDistance(SwimmingPools.Location) ASC), and then selecting the TOP 1 Name and Location (i.e. the closest).
  • To make the query efficient, I’ve also added an index hint to use the spatial index on the Fish and Chips table – WITH(index(sidx_FishAndChips)), and I’ve included an extra predicate, WHERE FishAndChips.Location.STDistance(SwimmingPools.Location) IS NOT NULL. This extra predicate will help the query optimiser choose the dedicated nearest neighbour query plan introduced in SQL Server 2012 and SQL Azure.
  • Note that the spatial index-optimised nearest neighbour query plan is only available in SQL Server 2012/SQL Azure. If you try to execute the UPDATE query as written above in SQL Server 2008/R2 then you’ll get an error:

    The query processor could not produce a query plan for a query with a spatial index hint.  Reason: Spatial indexes do not support the comparator supplied in the predicate.  Try removing the index hints or removing SET FORCEPLAN.

    As stated, to resolve this you’ll have to remove the WITH(index(sidx_FishAndChips)) index hint (and also expect your query to run a lot slower!). Instead, you might need to try one of the alternative approaches to efficiently finding nearest neighbours in SQL Server 2008/R2.

  • Finally, in order to perform the UPDATE, I’ve wrapped the whole lot in a table alias, s, and then updated the two columns as required.

This query should take a few seconds to run (or, if you don’t create/use the spatial index, a few minutes), after which you can check the results as follows:

SELECT * FROM SwimmingPools;

And there you have the complete table of swimming pools, each listed with the name and distance to its closest fish and chip shop:


Seeing as it’s less than 100 metres from getting out of the water to a bag of chips, I’ll think I’ll be taking my next dip at the Farnworth swimming pool….

This entry was posted in Spatial, SQL Server and tagged , , , . Bookmark the permalink.

6 Responses to Updating a SQL Server Table with Nearest Neighbours (Optimised for SQL Server 2012 / SQL Azure)

  1. Hi alastaira. Wow this is high level SQL Spatial… Really great stuff, congratulations !

    Can you explain furthermore how you choose to create the spatial index this way :

    Why did you choose high level grids and 16 cells per object ?
    Thank you

  2. alastaira says:

    Thanks for the kind comment Xavier! It might be a bit tricky to answer your questions in a comment, but I’ll have a go…
    CELLS_PER_OBJECT of 16 is the default – I could have omitted that line and it would have no effect on the code. It represents the maximum number of grid cells that will be added to the index to describe each geometry. Considering I’m only using Points in this example (rather than LineStrings, or Polygons, say), I could probably have safely reduced this to 1 anyway. The HIGH grid resolutions were chosen because I’m using the geography datatype, and all indexes on the geography datatype have an implicit bounding box the covers the entire globe. However, the data I’m using represents only a pretty small geographic area (the UK). So, I’m using a HIGH resolution grid so that my index is as selective as possible over the limited area in which my points are distributed, and is effective at filtering the records.
    Does that help at all?!

    • Thank you. And yes it is more clear to me. I didn’t remember that 16 was the default for CELLS_PER_OBJECT. I have also always being confused by how “high” was the high grid resolution, and thus if it was suitable for an area like UK or France.

  3. alex says:

    Hi alastaira. Your code is very good (but not so simple for beginers like me : )
    May I ask You haw to do it in VBA ACCESS 2000?
    Thank You for reply.
    Best Regards

  4. Pingback: Another Why Is This Nearest Neighbor Spatial Query So Slow?

  5. Erik Grob says:

    Thank so much for this example. I was about to resort to using a user defined scalar function on each row. For example, when buying house insurance, distance to the nearest fire station is asked, and not everyone is honest. So, given tblHomes and tblFireStations, I wanted to ADD COLUMN to tblHomes with NearestFireStationName, NearestFirestationDistance, and populate them with data from tblFireStations. Both tables have geolocation GEOGRAPHY, and both have spatial indexes. I considered adding a condition or JOIN to help reduce the number of fire stations by matching up , given the requirement that only fire stations in the same Province can respond.

    Given both tables have non clustered indexes on Province would something similar to this filtering logic within the sub query help reduce row operations, and speed up the query, or would it just break it? This also assumes ever province has a FishAndChips place in it.

    Inner join ON SwimmingPools.Province = FishandChips.Province

    Assuming every province does not have a FishandChips location, a better assumption, would a LEFT JOIN take care if that? Or would we get a NULL returned for no matching province?

    Thank you for the post, and considering answering my question.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s