Archive for February, 2012

February 22, 2012

Updating a SQL Server Table with Nearest Neighbours (Optimised for SQL Server 2012 / SQL Azure)

So I’ve been trying to think of some examples that make use of the Garmin POI data from my last post. Seeing as POI data naturally lends itself to nearest-neighbour queries, I thought I’d write a little about those.

The general form of the nearest neighbor query is: “Show me the closest x restaurants/pubs/train stations to this location”; this is the query pattern used by location-aware applications to find POIs close to the user’s current location, for example. However, another pattern of which I’ve seen much fewer examples is how to update an entire table of records in order to determine and populate the nearest neighbour for each row in the table (indeed, this very question came up on the MSDN spatial forum just last week).

With this in mind, I thought I’d demonstrate a practical scenario: let’s say you’re planning going for a swim at the local pool and, after you’ve swum 50 lengths, you’ll probably feel a bit peckish – perhaps in the need for some fish ‘n’ chips. So, in this example I’m firstly going to import the Garmin POI data for swimming pools and fish ‘n’ chip shops in the UK, each being represented by a geography Point instance. I’ll then add an extra column to the swimming pool table and populate it with the closest fish and chip shop to every swimming pool.

To start with, create tables containing the data for fish and chip shops and swimming pools (download the data from here). Notice that I’ve included an IDENTITY primary key field in each table, which I’ll need in order to add a spatial index later:

SELECT
  Id = IDENTITY(int,1,1),
  CAST(Name AS varchar(255)) AS Name,
  CAST(Details AS varchar(255)) AS Details,
  geography::Point(Latitude,Longitude,4326) AS Location
INTO SwimmingPools
FROM OPENROWSET(
 BULK 'C:\Users\Alastair\Downloads\GarminPOI\Swimming Pools.csv',
 FORMATFILE='C:\temp\garminpoi.xml',
 FIRSTROW=2
) AS SwimmingPools;
GO

ALTER TABLE SwimmingPools ADD PRIMARY KEY(Id);
GO

SELECT
  Id = IDENTITY(int,1,1),
  CAST(Name AS varchar(255)) AS Name,
  CAST(Details AS varchar(255)) AS Details,
  geography::Point(Latitude,Longitude,4326) AS Location
INTO FishAndChips
FROM OPENROWSET(
 BULK 'C:\Users\Alastair\Downloads\GarminPOI\Fish and Chips.csv',
 FORMATFILE='C:\temp\garminpoi.xml',
 FIRSTROW=2
) AS FishAndChips;
GO

ALTER TABLE FishAndChips ADD PRIMARY KEY(Id);
GO

Now we’ll add some extra columns to the Swimming Pools table, which we’ll populate with information about the closest fish ‘n’ chip shop:

ALTER TABLE SwimmingPools
ADD ClosestFishbar varchar(255),
    Distance decimal(18,2);

Now, you don’t need to add a spatial index to perform the update query, but it’ll be a lot faster with one than without. SQL Server can only make use of one spatial index in a join between tables, so there’s no need to add an index to both tables. We’ll just add one to the Fish And Chips table, as follows:

CREATE SPATIAL INDEX sidx_FishAndChips ON FishAndChips(Location)
USING  GEOGRAPHY_GRID 
WITH (
  GRIDS =(LEVEL_1 = HIGH,LEVEL_2 = HIGH,LEVEL_3 = HIGH,LEVEL_4 = HIGH), 
  CELLS_PER_OBJECT = 16
);

Then, run an UPDATE query to populate the ClosestFishBar and Distance columns, as follows:

UPDATE s
SET ClosestFishbar = fName,
    Distance = Location.STDistance(fLocation)  
FROM (
  SELECT
    SwimmingPools.*,
    fnc.Name AS fName,
    fnc.Location AS fLocation
  FROM SwimmingPools 
  CROSS APPLY (
    SELECT TOP 1
     Name,
     Location
    FROM FishAndChips WITH(index(sidx_FishAndChips))
    WHERE FishAndChips.Location.STDistance(SwimmingPools.Location) IS NOT NULL
    ORDER BY FishAndChips.Location.STDistance(SwimmingPools.Location) ASC
  ) fnc
) s;

This query probably needs some explanation:

  • Generally, you might expect to use a correlated subquery to obtain the closest fish and chip shop to each swimming pool. However, correlated subqueries can only return a single value, and I want to retrieve both the name and the distance to the closest fish bar, so I've used a CROSS APPLY instead.
  • In the function being applied, I’m sorting the table of fish and chip shops by ascending order of their distance from the current swimming pool (ORDER BY FishAndChips.Location.STDistance(SwimmingPools.Location) ASC), and then selecting the TOP 1 Name and Location (i.e. the closest).
  • To make the query efficient, I’ve also added an index hint to use the spatial index on the Fish and Chips table – WITH(index(sidx_FishAndChips)), and I’ve included an extra predicate, WHERE FishAndChips.Location.STDistance(SwimmingPools.Location) IS NOT NULL. This extra predicate will help the query optimiser choose the dedicated nearest neighbour query plan introduced in SQL Server 2012 and SQL Azure.
  • Note that the spatial index-optimised nearest neighbour query plan is only available in SQL Server 2012/SQL Azure. If you try to execute the UPDATE query as written above in SQL Server 2008/R2 then you’ll get an error:

    The query processor could not produce a query plan for a query with a spatial index hint.  Reason: Spatial indexes do not support the comparator supplied in the predicate.  Try removing the index hints or removing SET FORCEPLAN.

    As stated, to resolve this you’ll have to remove the WITH(index(sidx_FishAndChips)) index hint (and also expect your query to run a lot slower!). Instead, you might need to try one of the alternative approaches to efficiently finding nearest neighbours in SQL Server 2008/R2.

  • Finally, in order to perform the UPDATE, I’ve wrapped the whole lot in a table alias, s, and then updated the two columns as required.

This query should take a few seconds to run (or, if you don’t create/use the spatial index, a few minutes), after which you can check the results as follows:

SELECT * FROM SwimmingPools;

And there you have the complete table of swimming pools, each listed with the name and distance to its closest fish and chip shop:

image 

Seeing as it’s less than 100 metres from getting out of the water to a bag of chips, I’ll think I’ll be taking my next dip at the Farnworth swimming pool….

February 20, 2012

Load Garmin POI data to SQL Server

I’m always looking out for interesting new sources of spatial data to provide examples for use in my books and presentations. Microsoft’s own MSDN documentation demonstrates SQL Server spatial methods only using abstract geometries, such as

SET @h = geometry::STGeomFromText('POINT(10 10)', 0);

However, people often tell me that they really appreciate real-world examples instead – not only are they more interesting, but it’s easier to understand a practical use for the method, and they’re also more memorable afterwards. Some examples I’ve used in the past include:

  • Use STBuffer() to calculate the free delivery area around a Pizza restaurant
  • Use STDisjoint() to ensure that a planned road development does not intersect a protected area of natural beauty
  • Use STIntersection() to identify the section of the Via Appia Roman road that passed through the malaria-infested Pontine marshes.

Anyway, I thought I’d share a new source of data I came across recently – it’s a free dataset of (UK) points of interest designed for importing into Garmin sat-nav systems. As such, it contains practical POIs, categorised into things like museums, mountains, petrol stations, and restaurants. However, since it’s distributed in a simple CSV format, it’s super easy to import the data into SQL Server or other systems as well. Here’s how:

1. Download

Download the dataset of POIs from the garminpoi.co.uk website at http://garminpoi.co.uk/?page=download

When unzipped, you’ll find it contains a number of categorised CSV files (each with an accompanying BMP icon, which you could use if you want to display this data using pushpins on an SSRS report or Bing Maps, for example):

image

2. Create a Format File

All of the CSV files follow exactly the same structure – with columns for Longitude, Latitude, a Name, and Description. This can be described by the following bulk format file, which you should save as garminpoi.xml:

<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
 <FIELD ID="0" xsi:type="CharTerm" TERMINATOR=","/>
 <FIELD ID="1" xsi:type="CharTerm" TERMINATOR=","/>
 <FIELD ID="2" xsi:type="CharTerm" TERMINATOR=","/>
 <FIELD ID="3" xsi:type="CharTerm" TERMINATOR="\r\n"/>
</RECORD>
<ROW>
 <COLUMN SOURCE="0" NAME="Longitude" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="1" NAME="Latitude" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="2" NAME="Name" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="3" NAME="Details" xsi:type="SQLVARYCHAR"/>
</ROW>
</BCPFORMAT>

3. Open the CSV file in SQL Server

Now, you can access the data from the CSV file directly in SQL Server using OPENROWSET, supplying the name of the CSV file you want to load following the BULK keyword, and the formatfile created above following the FORMATFILE keyword. The CSV files contain column headers, which you can skip by specifying FIRSTROW=2. What’s more, you can pass the Latitude and Longitude coordinate values from the CSV straight to the Point() method of the geography datatype to create point instance at each location on-the-fly.

For example, try executing the following code listing:

SELECT *,
geography::Point(Latitude,Longitude,4326) AS TescoStore
FROM OPENROWSET(
 BULK 'C:\Users\Alastair\Downloads\GarminPOI\Tesco.csv',
 FORMATFILE='C:\temp\garminpoi.xml',
 FIRSTROW=2
) AS GarminPOI;

Result? An instant map of all Tesco branches in the UK!

image

February 6, 2012

Rendering SQL Server spatial data with Mapnik (Part 2 of n, where n is large)

Last year, I wrote what was intended to be the first of a series of posts about Mapnik – a software toolkit for generating beautiful maps from various sets of spatial data. My intention had been to describe how to render spatial data from SQL Server as styled raster tile layers on Bing Maps, but the series got cut short rather prematurely when I discovered that it didn’t appear to be possible to connect Mapnik to SQL Server (even though it was an OGR-supported format).

Unfortunately, as is often the case with OS software, it seems that almost all Mapnik developers use PostGIS as a backend spatial database, and I failed to find anybody with any experience of SQL Server on either the Mapnik developers mailing list or on StackExchange (rather amusingly, the top-rated answer given to me on StackExchange gives, as its reference source, my own blog post where I explicitly state that the suggested approach doesn’t work!).

image

Since, at the time, I needed a tool to render SQL geometry data there and then, I left Mapnik alone and developed a solution using MapServer instead.

Now, MapServer is a great product: It’s a venerable and highly-respected product in the world of open source spatial software, developed by the University of Minnesota on behalf of NASA, and first released over 16 years ago (to put that in context: MapServer was released at around the same time as HTML 3.2 – long before CSS, AJAX, or Google Maps were heard of). And it’s still going strong today. However, I find that the archaic MapServer architecture and idiosyncracies of its syntax give away its age slightly.

Mapnik, by contrast, is freshly-designed, has a modern XML-style syntax, supports groovy things like node.js and Ruby, and generally feels like it’s not carrying so much baggage. If only it supported SQL Server….

6 months later, and the Release Candidate for Mapnik 2.0 is  now available to download, which claims to support SQL Server, so I thought I’d give it another try. Having just set it up and rendered some quick OS OpenData, I can confirm that it works. Hurray!

image

SSMS Spatial Results Tab Rendering of SQL Server data

image

Mapnik Rendering of the same SQL Server data

Since it seems that I’m one of only few people trying to use Mapnik and SQL Server together (and one of even less writing about it), I’ve started to document some tips in a page here – I’ll add to this page as I go along.

(On a separate note, I now have a whole different problem, because a recent change in the Bing Maps Terms of Use prevents you from adding your own custom road tile layers onto the Bing Map control. So, any tiles of OpenStreetMap or Ordnance Survey road data you create with Mapnik can’t be used with Bing Maps. Fortunately, there’s a workaround for that, but that’s for another post….)

Follow

Get every new post delivered to your Inbox.

Join 53 other followers