March 6, 2012

## Drawing Fractals with SQL Server Spatial

I can’t think of any practical purpose for this code, but here’s a recursive SQLCLR procedure to draw a Sierpinski triangle fractal (or, at least, an approximation of one to a given level of detail) in SQL Server just because you can.

```using System;
using System.Collections.Generic;
using System.Text;
using Microsoft.SqlServer.Types;
using System.Data.SqlTypes;
using System.Data.SqlClient;
using Microsoft.SqlServer.Server;
using System.Data;

namespace ProSpatial
{
public partial class StoredProcedures
{
[Microsoft.SqlServer.Server.SqlProcedure]
public static void SierpinskiTriangle(int size)
{

// Set properties of exterior equilateral triangle
int w = size;  // Width. e.g. 512
int h = (int)(w * Math.Sqrt(3) / 2);  // Height
int[] x = { 0, w, w/2 };  // x vertices
int[] y = { h, h, 0 };  // y vertices

// Create a new SqlGeometry Instance to hold the output
SqlGeometry Triangles = new SqlGeometry();
Triangles.STSrid = 0;

// Start recursion
Triangles = drawSierpinskiTriangle(x, y, w/2, 2, Triangles);

// Send the results back to SQL Server
SendResults(Triangles);
}

private static SqlGeometry drawSierpinskiTriangle(int[] x, int[] y, int d, int dMin, SqlGeometry Triangles)
{

// If triangles are too small to render then make this the last recursion
if (d <= dMin)
{
// Create a new triangle and add it to the collection
SqlGeometry Polygon = TriangleFromPoints(x[0], y[0], x[1], y[1],  x[2], y[2]);
Triangles = Triangles.STUnion(Polygon);
}
else
{
// Calculate centre of each side
int xMc = (x[0] + x[1]) / 2, yMc = (y[0] + y[1]) / 2;
int xMb = (x[0] + x[2]) / 2, yMb = (y[0] + y[2]) / 2;
int xMa = (x[1] + x[2]) / 2, yMa = (y[1] + y[2]) / 2;

// Subdivide into three new triangles
int[] xNew1 = { x[0], xMc, xMb };
int[] yNew1 = { y[0], yMc, yMb };
Triangles = drawSierpinskiTriangle(xNew1, yNew1, d / 2, dMin, Triangles);

int[] xNew2 = { x[1], xMc, xMa };
int[] yNew2 = { y[1], yMc, yMa };
Triangles = drawSierpinskiTriangle(xNew2, yNew2, d / 2, dMin, Triangles);

int[] xNew3 = { x[2], xMb, xMa };
int[] yNew3 = { y[2], yMb, yMa };
Triangles = drawSierpinskiTriangle(xNew3, yNew3, d / 2, dMin, Triangles);
}

// Recursion finished - return the result
return Triangles;
}

// Send the results back to the client
private static void SendResults(SqlGeometry Triangles)
{
// Define the metadata of the results column

// Create a record based on this metadata
record.SetValue(0, Triangles);

// Send the results back to the client
SqlContext.Pipe.Send(record);
}

// Construct a triangle from 3 vertices
private static SqlGeometry TriangleFromPoints(double x0, double y0, double x1, double y1, double x2, double y2)
{
SqlGeometryBuilder TriangleBuilder = new SqlGeometryBuilder();
TriangleBuilder.SetSrid(0);
TriangleBuilder.BeginGeometry(OpenGisGeometryType.Polygon);
TriangleBuilder.BeginFigure(x0, y0);
TriangleBuilder.EndFigure();
TriangleBuilder.EndGeometry();
return TriangleBuilder.ConstructedGeometry;
}

}
}
```

Import the assembly, register the procedure, and then execute in SQL Server. It requires a single parameter, size, which determines the overall scale of the result.

```/* Import Assembly */
CREATE ASSEMBLY SierpinskiTriangle
FROM 'c:\users\alastair\documents\SierpinskiTriangle.dll'
WITH PERMISSION_SET = SAFE;
GO

/* Register function */
CREATE PROCEDURE dbo.SierpinskiTriangle(@size int)
AS EXTERNAL NAME SierpinskiTriangle.[ProSpatial.StoredProcedures].SierpinskiTriangle;
GO

/* Execute Procedure */
EXEC dbo.SierpinskiTriangle 512;

```

When executed with a size of 512, the result is a single MultiPolygon instance containing a set of Sierpinski triangles as follows:

March 2, 2012

## Cleaning up artefacts created by Reduce()

The Reduce() method simplifies geography or geometry instances by removing vertices. Depending on the situation, this can sometimes cause a Polygon instance to collapse in on itself, degenerating into a LineString, or even a single Point.

For example, in the following code listing I’ve simplified a MultiPolygon instance of the United Kingdom using the Reduce() methods with a tolerance of 2km:

```SELECT
@UK.Reduce(2000) AS Shape;
```

This causes many of the Scottish islands and some other parts of coastline to degenerate into singular Point instances, which are shown as black specks in the SSMS spatial results tab, as follows:

Instead of being a MultiPolygon instance, the shape is now a GeometryCollection containing a variety of geometry types, but this is often not what you really want.

Assuming that what you really wanted was only those simplified Polygons returned by Reduce(), you can use the following T-SQL function to loop through a geometry and removes any Point or LineString elements, returning only the union of all polygonal elements (i.e. those elements where STDimension() = 2):

```CREATE FUNCTION RemoveArtefacts(@g geography) RETURNS geography AS BEGIN
DECLARE @h geography = geography::STGeomFromText('POINT EMPTY', @g.STSrid);
DECLARE @i int = 1;
WHILE @i <= @g.STNumGeometries() BEGIN
IF(@g.STGeometryN(@i).STDimension() = 2) BEGIN
SELECT @h = @h.STUnion(@g.STGeometryN(@i));
END
SET @i = @i + 1;
END
RETURN @h;
END;

```

Usage as follows:

```SELECT
dbo.RemoveArtefacts(@UK.Reduce(2000)) AS Shape;

```

The map is now simplified as before, but does not contain the unwanted degenerate artifacts created as a result of vertex removal:

February 22, 2012

## Updating a SQL Server Table with Nearest Neighbours (Optimised for SQL Server 2012 / SQL Azure)

So I’ve been trying to think of some examples that make use of the Garmin POI data from my last post. Seeing as POI data naturally lends itself to nearest-neighbour queries, I thought I’d write a little about those.

The general form of the nearest neighbor query is: “Show me the closest x restaurants/pubs/train stations to this location”; this is the query pattern used by location-aware applications to find POIs close to the user’s current location, for example. However, another pattern of which I’ve seen much fewer examples is how to update an entire table of records in order to determine and populate the nearest neighbour for each row in the table (indeed, this very question came up on the MSDN spatial forum just last week).

With this in mind, I thought I’d demonstrate a practical scenario: let’s say you’re planning going for a swim at the local pool and, after you’ve swum 50 lengths, you’ll probably feel a bit peckish – perhaps in the need for some fish ‘n’ chips. So, in this example I’m firstly going to import the Garmin POI data for swimming pools and fish ‘n’ chip shops in the UK, each being represented by a geography Point instance. I’ll then add an extra column to the swimming pool table and populate it with the closest fish and chip shop to every swimming pool.

To start with, create tables containing the data for fish and chip shops and swimming pools (download the data from here). Notice that I’ve included an IDENTITY primary key field in each table, which I’ll need in order to add a spatial index later:

```SELECT
Id = IDENTITY(int,1,1),
CAST(Name AS varchar(255)) AS Name,
CAST(Details AS varchar(255)) AS Details,
geography::Point(Latitude,Longitude,4326) AS Location
INTO SwimmingPools
FROM OPENROWSET(
FORMATFILE='C:\temp\garminpoi.xml',
FIRSTROW=2
) AS SwimmingPools;
GO

ALTER TABLE SwimmingPools ADD PRIMARY KEY(Id);
GO

SELECT
Id = IDENTITY(int,1,1),
CAST(Name AS varchar(255)) AS Name,
CAST(Details AS varchar(255)) AS Details,
geography::Point(Latitude,Longitude,4326) AS Location
INTO FishAndChips
FROM OPENROWSET(
FORMATFILE='C:\temp\garminpoi.xml',
FIRSTROW=2
) AS FishAndChips;
GO

ALTER TABLE FishAndChips ADD PRIMARY KEY(Id);
GO
```

Now we’ll add some extra columns to the Swimming Pools table, which we’ll populate with information about the closest fish ‘n’ chip shop:

```ALTER TABLE SwimmingPools
Distance decimal(18,2);
```

Now, you don’t need to add a spatial index to perform the update query, but it’ll be a lot faster with one than without. SQL Server can only make use of one spatial index in a join between tables, so there’s no need to add an index to both tables. We’ll just add one to the Fish And Chips table, as follows:

```CREATE SPATIAL INDEX sidx_FishAndChips ON FishAndChips(Location)
USING  GEOGRAPHY_GRID
WITH (
GRIDS =(LEVEL_1 = HIGH,LEVEL_2 = HIGH,LEVEL_3 = HIGH,LEVEL_4 = HIGH),
CELLS_PER_OBJECT = 16
);

```

Then, run an UPDATE query to populate the ClosestFishBar and Distance columns, as follows:

```UPDATE s
SET ClosestFishbar = fName,
Distance = Location.STDistance(fLocation)
FROM (
SELECT
SwimmingPools.*,
fnc.Name AS fName,
fnc.Location AS fLocation
FROM SwimmingPools
CROSS APPLY (
SELECT TOP 1
Name,
Location
FROM FishAndChips WITH(index(sidx_FishAndChips))
WHERE FishAndChips.Location.STDistance(SwimmingPools.Location) IS NOT NULL
ORDER BY FishAndChips.Location.STDistance(SwimmingPools.Location) ASC
) fnc
) s;
```

This query probably needs some explanation:

• Generally, you might expect to use a correlated subquery to obtain the closest fish and chip shop to each swimming pool. However, correlated subqueries can only return a single value, and I want to retrieve both the name and the distance to the closest fish bar, so I've used a CROSS APPLY instead.
• In the function being applied, I’m sorting the table of fish and chip shops by ascending order of their distance from the current swimming pool (ORDER BY FishAndChips.Location.STDistance(SwimmingPools.Location) ASC), and then selecting the TOP 1 Name and Location (i.e. the closest).
• To make the query efficient, I’ve also added an index hint to use the spatial index on the Fish and Chips table – WITH(index(sidx_FishAndChips)), and I’ve included an extra predicate, WHERE FishAndChips.Location.STDistance(SwimmingPools.Location) IS NOT NULL. This extra predicate will help the query optimiser choose the dedicated nearest neighbour query plan introduced in SQL Server 2012 and SQL Azure.
• Note that the spatial index-optimised nearest neighbour query plan is only available in SQL Server 2012/SQL Azure. If you try to execute the UPDATE query as written above in SQL Server 2008/R2 then you’ll get an error:

The query processor could not produce a query plan for a query with a spatial index hint.  Reason: Spatial indexes do not support the comparator supplied in the predicate.  Try removing the index hints or removing SET FORCEPLAN.

As stated, to resolve this you’ll have to remove the WITH(index(sidx_FishAndChips)) index hint (and also expect your query to run a lot slower!). Instead, you might need to try one of the alternative approaches to efficiently finding nearest neighbours in SQL Server 2008/R2.

• Finally, in order to perform the UPDATE, I’ve wrapped the whole lot in a table alias, s, and then updated the two columns as required.

This query should take a few seconds to run (or, if you don’t create/use the spatial index, a few minutes), after which you can check the results as follows:

```SELECT * FROM SwimmingPools;

```

And there you have the complete table of swimming pools, each listed with the name and distance to its closest fish and chip shop:

Seeing as it’s less than 100 metres from getting out of the water to a bag of chips, I’ll think I’ll be taking my next dip at the Farnworth swimming pool….

February 20, 2012

## Load Garmin POI data to SQL Server

I’m always looking out for interesting new sources of spatial data to provide examples for use in my books and presentations. Microsoft’s own MSDN documentation demonstrates SQL Server spatial methods only using abstract geometries, such as

`SET @h = geometry::STGeomFromText('POINT(10 10)', 0);`

However, people often tell me that they really appreciate real-world examples instead – not only are they more interesting, but it’s easier to understand a practical use for the method, and they’re also more memorable afterwards. Some examples I’ve used in the past include:

• Use STBuffer() to calculate the free delivery area around a Pizza restaurant
• Use STDisjoint() to ensure that a planned road development does not intersect a protected area of natural beauty
• Use STIntersection() to identify the section of the Via Appia Roman road that passed through the malaria-infested Pontine marshes.

Anyway, I thought I’d share a new source of data I came across recently – it’s a free dataset of (UK) points of interest designed for importing into Garmin sat-nav systems. As such, it contains practical POIs, categorised into things like museums, mountains, petrol stations, and restaurants. However, since it’s distributed in a simple CSV format, it’s super easy to import the data into SQL Server or other systems as well. Here’s how:

When unzipped, you’ll find it contains a number of categorised CSV files (each with an accompanying BMP icon, which you could use if you want to display this data using pushpins on an SSRS report or Bing Maps, for example):

## 2. Create a Format File

All of the CSV files follow exactly the same structure – with columns for Longitude, Latitude, a Name, and Description. This can be described by the following bulk format file, which you should save as garminpoi.xml:

```<?xml version="1.0"?>
<RECORD>
<FIELD ID="0" xsi:type="CharTerm" TERMINATOR=","/>
<FIELD ID="1" xsi:type="CharTerm" TERMINATOR=","/>
<FIELD ID="2" xsi:type="CharTerm" TERMINATOR=","/>
<FIELD ID="3" xsi:type="CharTerm" TERMINATOR="\r\n"/>
</RECORD>
<ROW>
<COLUMN SOURCE="0" NAME="Longitude" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="1" NAME="Latitude" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="2" NAME="Name" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="3" NAME="Details" xsi:type="SQLVARYCHAR"/>
</ROW>
</BCPFORMAT>
```

## 3. Open the CSV file in SQL Server

Now, you can access the data from the CSV file directly in SQL Server using OPENROWSET, supplying the name of the CSV file you want to load following the BULK keyword, and the formatfile created above following the FORMATFILE keyword. The CSV files contain column headers, which you can skip by specifying FIRSTROW=2. What’s more, you can pass the Latitude and Longitude coordinate values from the CSV straight to the Point() method of the geography datatype to create point instance at each location on-the-fly.

For example, try executing the following code listing:

```SELECT *,
geography::Point(Latitude,Longitude,4326) AS TescoStore
FROM OPENROWSET(
FORMATFILE='C:\temp\garminpoi.xml',
FIRSTROW=2
) AS GarminPOI;
```

Result? An instant map of all Tesco branches in the UK!