March 6, 2012

## Drawing Fractals with SQL Server Spatial

I can’t think of any practical purpose for this code, but here’s a recursive SQLCLR procedure to draw a Sierpinski triangle fractal (or, at least, an approximation of one to a given level of detail) in SQL Server just because you can.

```using System;
using System.Collections.Generic;
using System.Text;
using Microsoft.SqlServer.Types;
using System.Data.SqlTypes;
using System.Data.SqlClient;
using Microsoft.SqlServer.Server;
using System.Data;

namespace ProSpatial
{
public partial class StoredProcedures
{
[Microsoft.SqlServer.Server.SqlProcedure]
public static void SierpinskiTriangle(int size)
{

// Set properties of exterior equilateral triangle
int w = size;  // Width. e.g. 512
int h = (int)(w * Math.Sqrt(3) / 2);  // Height
int[] x = { 0, w, w/2 };  // x vertices
int[] y = { h, h, 0 };  // y vertices

// Create a new SqlGeometry Instance to hold the output
SqlGeometry Triangles = new SqlGeometry();
Triangles.STSrid = 0;

// Start recursion
Triangles = drawSierpinskiTriangle(x, y, w/2, 2, Triangles);

// Send the results back to SQL Server
SendResults(Triangles);
}

private static SqlGeometry drawSierpinskiTriangle(int[] x, int[] y, int d, int dMin, SqlGeometry Triangles)
{

// If triangles are too small to render then make this the last recursion
if (d <= dMin)
{
// Create a new triangle and add it to the collection
SqlGeometry Polygon = TriangleFromPoints(x[0], y[0], x[1], y[1],  x[2], y[2]);
Triangles = Triangles.STUnion(Polygon);
}
else
{
// Calculate centre of each side
int xMc = (x[0] + x[1]) / 2, yMc = (y[0] + y[1]) / 2;
int xMb = (x[0] + x[2]) / 2, yMb = (y[0] + y[2]) / 2;
int xMa = (x[1] + x[2]) / 2, yMa = (y[1] + y[2]) / 2;

// Subdivide into three new triangles
int[] xNew1 = { x[0], xMc, xMb };
int[] yNew1 = { y[0], yMc, yMb };
Triangles = drawSierpinskiTriangle(xNew1, yNew1, d / 2, dMin, Triangles);

int[] xNew2 = { x[1], xMc, xMa };
int[] yNew2 = { y[1], yMc, yMa };
Triangles = drawSierpinskiTriangle(xNew2, yNew2, d / 2, dMin, Triangles);

int[] xNew3 = { x[2], xMb, xMa };
int[] yNew3 = { y[2], yMb, yMa };
Triangles = drawSierpinskiTriangle(xNew3, yNew3, d / 2, dMin, Triangles);
}

// Recursion finished - return the result
return Triangles;
}

// Send the results back to the client
private static void SendResults(SqlGeometry Triangles)
{
// Define the metadata of the results column

// Create a record based on this metadata
SqlDataRecord record = new SqlDataRecord(metadata);
record.SetValue(0, Triangles);

// Send the results back to the client
SqlContext.Pipe.Send(record);
}

// Construct a triangle from 3 vertices
private static SqlGeometry TriangleFromPoints(double x0, double y0, double x1, double y1, double x2, double y2)
{
SqlGeometryBuilder TriangleBuilder = new SqlGeometryBuilder();
TriangleBuilder.SetSrid(0);
TriangleBuilder.BeginGeometry(OpenGisGeometryType.Polygon);
TriangleBuilder.BeginFigure(x0, y0);
TriangleBuilder.EndFigure();
TriangleBuilder.EndGeometry();
return TriangleBuilder.ConstructedGeometry;
}

}
}
```

Import the assembly, register the procedure, and then execute in SQL Server. It requires a single parameter, size, which determines the overall scale of the result.

```/* Import Assembly */
CREATE ASSEMBLY SierpinskiTriangle
FROM 'c:\users\alastair\documents\SierpinskiTriangle.dll'
WITH PERMISSION_SET = SAFE;
GO

/* Register function */
CREATE PROCEDURE dbo.SierpinskiTriangle(@size int)
AS EXTERNAL NAME SierpinskiTriangle.[ProSpatial.StoredProcedures].SierpinskiTriangle;
GO

/* Execute Procedure */
EXEC dbo.SierpinskiTriangle 512;

```

When executed with a size of 512, the result is a single MultiPolygon instance containing a set of Sierpinski triangles as follows:

April 25, 2011

## Nearest-Neighbours, Voronoi Diagrams, and Finding your nearest SQL Server Usergroup

As somebody interested in all things spatial, I was delighted that the community corner at SQLBits8 featured an interactive mapping application. Many people commented on how much work had obviously gone into the user interface: the ability to add “pushpins” was particularly realistic, and the base dataset, despite being centred on the UK, was designed to be modular and extendable – with users adding in further layers for Italy, Germany, Holland, Sweden, Norway, Denmark, Austria and Australia.

If you’re wondering, the application front-end interface in question looked like this:  (red pins are usergroup locations, blue pins are attendees)

The purpose of the map was obviously both to visualise the geographic distribution of attendees coming to SQLBits, as well as to show people the location of their closest usergroup. Seeing as this was a SQL Server conference, it gave me an idea on how you could analyse this information with SQL Server spatial…

## Displaying Locations of UK SQL Server Usergroups

To start with, we can create a table with geography points representing regional SQL Server usergroups in the UK (information kindly provided by tsqltidy).

To show the locations where each usergroup is held, we can then simply select each point, buffer them by a reasonable amount (10,000m, in this case), and display them against a background map of Great Britain.

SELECT name, location.STBuffer(10000) FROM SQLServerUGs;

Here’s what the results of that query look like in the SSMS Spatial Results tab:

That’s the objective of visualising the locations of all UK usergroups completed…. now, what about the other purpose of the map – to let users located their closest usergroup. This is commonly known as a “nearest-neighbour” query. So how would we go about doing this?

## Nearest-Neighbour Queries

I don’t know the actual locations of all the attendees of SQLBits, and I can’t be bothered to trace their locations from the little blue pins in the first photo of this post, so let’s create some dummy data instead. The following code will create a table containing the locations of 800 fictional attendees, all situated somewhere on GB mainland (maybe there are some SQL Server DBAs out in the North Sea on oil rigs or pirate ships, but let’s just assume not for the sake of simplicity):

CREATE TABLE #SQLBitsAttendees (
id int identity(1,1),
location geography
);
SET NOCOUNT ON;
DECLARE @i int = 0;
WHILE @i < 800 BEGIN
DECLARE @p geography;
SET @p = geography::Point(RAND()*10+50, RAND()*8-6, 4326);
IF (@p.STIntersects(@GB)=1)
BEGIN
INSERT INTO #SQLBitsAttendees(location) VALUES (@p)
SET @i = @i + 1;
END
END

And here they all are  (including a fictional attendee from the Shetland Islands, from the looks of it):

Now, to find out the location of the closest usergroup for each attendee, you could use the STDistance() method to measure and sort the distance from each usergroup, then select the TOP 1 closest in a subselect, as follows:

SELECT
id,
location,
(SELECT TOP 1 name
FROM #SQLServerUGs ug
ORDER BY ug.location.STDistance(a.location)
) AS nearestUG
FROM #SQLBitsAttendees a

The problem with this approach is that, in order to determine the closest usergroup, the subselect statement has to evaluate the distance from every usergroup, only to select the top one. What’s more, in SQL Server 2008/R2 this query cannot take advantage of an index (spatial or otherwise), so it involves a full scan of the usergroups table for every attendee. Not good.

In SQL Server Denali, there’s a new nearest-neighbour query plan that does allow the query optimiser to use a spatial index for these types of queries, so long as they are expressed using a particular pattern. To use the nearest-neighbour plan in SQL Server Denali for this example, we have to add a IS NOT NULL condition to the STDistance() method, as follows:

SELECT
id,
location,
(SELECT TOP 1 name
FROM #SQLServerUGs ug
WHERE ug.location.STDistance(a.location) IS NOT NULL
ORDER BY ug.location.STDistance(a.location)
) AS nearestUG
FROM #SQLBitsAttendees a

This improves the performance significantly, but it’s still not an ideal query design.

One alternative approach could be to place a constraint on the subselect query, by assuming that nobody wants to travel more than 100km to attend a usergroup:

SELECT
id,
location.STAsText(),
(SELECT TOP 1 name
FROM #SQLServerUGs ug
WHERE ug.location.STDistance(a.location) < 100000
ORDER BY ug.location.STDistance(a.location)
) AS nearestUG
FROM #SQLBitsAttendees a

The advantage of this approach is that a spatial index can be used to fulfil the STDistance()query predicate (in both SQL Server 2008 and Denali), so this can efficiently reduce the number of rows that need to be evaluated and sorted. An added benefit is that it also allows us to identify all those users who are further than 100km from their closest usergroup, since the subselect will return NULL for those records:

(Perhaps users with a NULL nearestUG should consider starting their own local group?)

This approach is still not perfect though, since it relies on a fairly arbitrary limit of 100km for the limit within which to search. Setting this limit too high and we leave ourselves with a lot of data still to sort. But set it too low and some keen DBAs and Devs might be willing to travel further than the specified limit, and would miss out on knowing what their closest UG would be.

Another alternative solution, first proposed by Isaac Kunen, is to make use of a numbers table to create a query that looks for nearest neighbours in an expanding series of search ranges. The initial search area is set small, but then expands exponentially outwards until a neighbour is found. A query using this logic looks something like this:

DECLARE @start FLOAT = 1000;
WITH NearestUGs AS
(
SELECT TOP(1) WITH TIES *,  T.g.STDistance(@x) AS dist
FROM Numbers JOIN T WITH(INDEX(spatial_index))
ON T.g.STDistance(@x) < @start*POWER(2,Numbers.n)
ORDER BY n
)
SELECT TOP(1) * FROM NearestUGs
ORDER BY n, dist

(for explanation of what’s going on here, I suggest you refer to Isaac’s post).

Empirically, this query works pretty well – but it’s not exactly pretty to look at and it’s pretty daunting unless you’re proficient with fairly advanced T-SQL as well as with the nuisances of dealing with spatial data. So, there’s still scope for a better solution.

Enter Voronoi diagrams…

## Voronoi Diagrams

A Voronoi diagram is a formed from a set of underlying points (called Voronoi sites). Around each site is a Voronoi cell, which is the area made up of all those points that lie closest to that site than to any other site. The edges at which two Voronoi cells meet is therefore constructed from all those points that lie equidistant between the two sites.

The complete set of Voronoi cells form a tessellating, non-overlapping set of of Polygons that cover the full extent of a set of data, known as a Voronoi tessellation.

Voronoi diagrams are closely related to Delauney triangulations (which I also used in my post about alpha shapes). In fact, if you connect the circumcenters of all the triangles in a Delauney triangulation, you’ll create a Voronoi tessellation of the original set of points, as shown below:

Delauney triangulation (black lines) and Voronoi tessellation (red lines) of a set of points. Image from http://en.wikipedia.org/wiki/File:Delaunay_Voronoi.png

The simplest (although not necessarily the most efficient) way to construct the Voronoi tessellation of a set of points is to take advantage of this duality. First, create the Delauney triangulation of the points, and then connect the circumcenters of all those triangles that have a common vertex to create the Voronoi cell around the site at that vertex. You can find more details on the algorithms required to create Delauney Triangulations and Voronoi tessellations at http://www.cs.cmu.edu/~quake/triangle.html and http://paulbourke.net/papers/triangulate/, as well as plenty of other resources on the internet. You know where to look.

What’s all this got to do with nearest neighbour queries? Well, suppose we create a Voronoi tessellation based on the set of points representing the location of SQL Server UGs. The result will be a set of (convex) polygons, in which each polygon contains exactly one usergroup, and all those points closest to that usergroup than to any other. If we store each Voronoi cell as a geography Polygon and clip them to the outline map of Great Britain, we get something a bit like this:

You can think of the coloured polygons as representing the “catchment area” for each of the usergroups. In order to work out the closest usergroup for any attendee, all we therefore have to do is to work out which polygon their location lies in, which can be done with a simple STIntersects() query:

SELECT
a.id,
a.location,
ug.name AS nearestUG
FROM #SQLBitsAttendees a
JOIN SQLServerUGs ug ON a.location.STIntersects(ug.area) = 1

And here they all are, assigned to the correct UG, and much quicker than any of the preceding queries (as the STIntersects() predicate is fully capable of taking advantage of any spatial index):

The only thing to remember is that you will have to recreate the underlying Voronoi tessellation every time the distribution of usergroups changes (i.e. every time a usergroup is added/removed/or starts meeting at a new location). However, we can probably safely assume that new usergroups don’t pop up every day, and it doesn’t take that long to recreate the tessellation anyway. For nearest-neighbour queries where the “neighbours” are relatively static, Voronoi tessellations are therefore a very good choice.

You can download the code used in the examples above from here.

Tags: ,
March 7, 2011

## Modelling Road Networks in SQL Server 2008

One of the most common example uses stated for a LineString geometry (or geography) in SQL Server is to model a road. In fact, a LineString geometry is ideally suited to almost any transit route – it is a simple, one-dimensional geometry, which is directed (i.e. it has a defined start and end point, which is useful for representing one-way routes), and it can be used with a range of useful methods such as STCrosses(), STLength(), STIntersects().

However, whilst it’s well known that a (single) LineString geometry can represent a (single) road, it is pretty unlikely that you’ll want to ever consider one road in isolation. I haven’t seen many discussions about different approaches to storing a connected network of roads in SQL Server, which is what I hope to briefly introduce in this post.

All of the following examples show different ways of modelling a few roads in Norwich, as shown on the following tile from Bing Maps:

## Approach #1 Single Table – One LineString per Road.

If your application needs only to consider each road as a separate entity (say, for plotting roads on a map, or for identifying those roads that lie within a certain distance of a point), then network topology can be ignored. Each road can be stored as a separate LineString in a straightforward “Roads” table, as follows:

```DECLARE @Roads TABLE (
);

INSERT INTO @Roads VALUES
(1, 'Britannia Road', 'LINESTRING(1.313772 52.636871,1.316401 52.633518,1.316497 52.632869,1.316642 52.632542)'),
(2, 'Belsize Road', 'LINESTRING(1.317538 52.632697,1.317307 52.633448,1.317098 52.633749)'),
(3, 'Vincent Road', 'LINESTRING(1.31734 52.633818, 1.315982 52.635498,1.315038 52.635229)'),
(4, 'Plumstead Road', 'LINESTRING(1.314546 52.633479,1.31529 52.633298,1.315902 52.633363,1.318332 52.634119)');```

And each row in the table accurately represents the shape of one individual road, shown as follows on the SSMS Spatial Results tab:

`SELECT * FROM @Roads;`

The problem is that, using this structure, there is no explicit relationship defined between any of the roads – each is treated as a distinct, separate entity, with no connection to each other. In other words, there is no network topology. The only way of defining a logical relationship between any of these roads is based on examining the spatial relationships between the LineStrings representing those roads.

We can see visually that Britannia Road crosses Plumstead Road, and that Vincent Road and Belsize Road are turnings off Plumstead Road. We can also use the inbuilt geometry methods STIntersects(), STTouches(), STCrosses() etc. to test these relationships programmatically. For example, the following code identifies all those roads that intersect Plumstead Road:

```DECLARE @g geography;

WHERE RoadGeometry.STIntersects(@g) = 1

(Note that, by definition, every geometry intersects itself. Therefore we need to add the additional condition to the SELECT statement to avoid including Plumstead Road in the results).

However, there’s a problem with this approach. The results only give two rows: Britannia Road and Belsize Road. Why isn’t Vincent Road being returned? The answer may not be apparent from the Spatial Results tab in SQL Server Management Studio as shown above, but the LineStrings representing Vincent Road and Plumstead Road don’t quite touch. In fact, if you run the following query:

```DECLARE @g geometry;

DECLARE @h geometry;

SELECT @g.STDistance(@h);```

You’ll see that the result is 7.27813053755075E-06 – in other words there is a gap of 7 micrometres between them. While this is unlikely to matter if all you want to do is display the roads (unless you’re using an insane map zoom level!), it clearly makes a great deal of difference if you’re using intersection to infer connection between these roads. Also note that although there are two turnings from Plumstead Road onto Britannia Road (depending on whether you turn left or right onto Britannia Road), this only counts as one intersection, so it is only returned once in the results.

To model the topological structure of a road network for routing or analysis purposes, we clearly need an alternative model.

## Approach #2 Single Table – Multiple Segmented LineString Edges

The next approach might be, rather than describe each road as a single LineString, to split each road up into a number of connected LineString segments. The end point of each LineString segment implicitly defines an intersection at which that road connects with another road, or to the next segment of the current road. This should avoid the problems with the previous model, since any two roads that intersect will both share exactly the same start/end point – there should be no ambiguity caused by one road running close to (but not quite intersecting) another, and there can be an infinite number of LineStrings that share a common start/end point. In fact, this model provides far more flexibility than the previous model, because it is also possible to define roads that intersect but are not connected (that is, they intersect somewhere mid-LineString and not at the start/end point of a segment).

To keep track of which segment(s) are part of which road, we can relate each SegmentId to it’s corresponding RoadId and, as each Segment belongs to one and only one Road, we can add the RoadId directly to the Segments table without affecting normalisation.

The following code demonstrates this approach to define the same example roads as used previously:

```DECLARE @Roads TABLE (
);

INSERT INTO @Roads VALUES

DECLARE @RoadSegments TABLE (
SegmentId int,
SegmentGeometry geography
);

INSERT INTO @RoadSegments VALUES
(1, 1, 'LINESTRING(1.313772 52.636871, 1.315038 52.635229)'),
(2, 1, 'LINESTRING(1.315038 52.635229,1.316052 52.63399,1.316401 52.633518)'),
(3, 1, 'LINESTRING(1.316401 52.633518,1.316497 52.632869,1.316642 52.632542)'),
(4, 2, 'LINESTRING(1.317538 52.632697,1.317307 52.633448,1.317098 52.633749)'),
(5, 3, 'LINESTRING(1.31734 52.633818,1.315982 52.635498,1.315038 52.635229)'),
(6, 4, 'LINESTRING(1.314546 52.633479,1.31529 52.633298,1.315902 52.633363,1.316401 52.633518)'),
(7, 4, 'LINESTRING(1.316401 52.633518,1.317097 52.633749)'),
(8, 4, 'LINESTRING(1.317098 52.633749,1.31734 52.633818)'),
(9, 4, 'LINESTRING(1.31734 52.633818,1.318332 52.634119)');```

To view the complete road network, we can now run the following query:

```SELECT *

(Note that, in order to buffer the roads a bit and make both the roads names and segment IDs visible, the actual query used to produce the above image was:

```SELECT RoadName, SegmentGeometry
UNION ALL
SELECT '    ' + CAST(SegmentID AS varchar(32)) AS RoadName, SegmentGeometry.STBuffer(2) AS SegmentGeometry

We can now find all those roads that are accessible from Plumstead Road using the following query:

```DECLARE @RoadID int;

JOIN (SELECT SegmentGeometry.STEndPoint() AS Node
ON rs.SegmentGeometry.STEndPoint().STEquals(RoadIntersections.Node) = 1

The subselect statement in this query first selects the endpoints of each LineString segment that are part of ‘Plumstead Road’. Then, the outer SELECT statement finds all those segments of other roads that start at any of these endpoints.

This approach assumes that all the LineStrings are directed – it only finds those that roads that start at the specified endpoint(s). To find all LineStrings that either start or end at any of the supplied nodes, we can modify the query to become:

```DECLARE @RoadID int;

JOIN (SELECT SegmentGeometry.STStartPoint() AS Node
ON rs.SegmentGeometry.STEndPoint().STEquals(RoadIntersections.Node) = 1
OR rs.SegmentGeometry.STStartPoint().STEquals(RoadIntersections.Node) = 1

This query now assumes all roads are two-way. Note that I’ve included the SegmentGeometry column in the results to demonstrate that the results now include the two different segments of Britannia Road that intersect Plumstead Road:

Britannia Road    LINESTRING (1.315038 52.635229, 1.316052 52.63399, 1.316401 52.633518)
Britannia Road    LINESTRING (1.316401 52.633518, 1.316497 52.632869, 1.316642 52.632542)
Belsize Road    LINESTRING (1.317538 52.632697, 1.317307 52.633448, 1.317098 52.633749)
Vincent Road    LINESTRING (1.31734 52.633818, 1.315982 52.635498, 1.315038 52.635229)

## Approach #3 Two Tables – Segmented LineString Edges and Intersection Point Nodes

There are still problems with the previous model. From a theoretical point-of-view, it is not necessarily a good idea to tie the spatial relationship between two features to their logical relationship so closely. Just because two roads “touch” doesn’t necessarily mean that one is accessible from the other. Also, it cannot model more complex connections and restrictions between objects that are geographically superimposed but not logically connected (think, for example, of roads that are “no right turn” when approached from a certain direction, but accessible when approached from the other).

Another fundamental problem with the previous approach comes from its practical implementation – since it relies on using the STEquals() method to compare and join the endpoints of road segments together, it is likely to be slow performing when compared to, say a direct join between integer ID fields representing those endpoints in two tables.

Suppose instead, that we added a new table, RoadIntersections, to explicitly define all those points at which two or more roads connected. The RoadIntersections table will contain one row for every road segment that joins a particular intersection. A T-Junction will therefore have three rows inserted into the RoadIntersections table, representing the three road segments that meet at the intersection. The example road network can be modelled using this approach as follows:

```DECLARE @Roads TABLE (
);

INSERT INTO @Roads VALUES

DECLARE @RoadSegments TABLE (
SegmentId int,
SegmentGeometry geography
);

INSERT INTO @RoadSegments VALUES
(1, 1, 'LINESTRING(1.313772 52.636871, 1.315038 52.635229)'),
(2, 1, 'LINESTRING(1.315038 52.635229,1.316052 52.63399,1.316401 52.633518)'),
(3, 1, 'LINESTRING(1.316401 52.633518,1.316497 52.632869,1.316642 52.632542)'),
(4, 2, 'LINESTRING(1.317538 52.632697,1.317307 52.633448,1.317098 52.633749)'),
(5, 3, 'LINESTRING(1.31734 52.633818,1.315982 52.635498,1.315038 52.635229)'),
(6, 4, 'LINESTRING(1.314546 52.633479,1.31529 52.633298,1.315902 52.633363,1.316401 52.633518)'),
(7, 4, 'LINESTRING(1.316401 52.633518,1.317097 52.633749)'),
(8, 4, 'LINESTRING(1.317098 52.633749,1.31734 52.633818)'),
(9, 4, 'LINESTRING(1.31734 52.633818,1.318332 52.634119)');

DECLARE @RoadIntersections TABLE (
IntersectionId varchar(32),
IntersectionLocation geography
);

INSERT INTO @RoadIntersections VALUES
('A', 'POINT(1.315038 52.635229)'),
('B', 'POINT(1.316401 52.633518)'),
('C', 'POINT(1.317097 52.633749)'),
('D', 'POINT(1.31734 52.633818)');

DECLARE @RoadIntersection_Segments TABLE (
IntersectionId varchar(32),
SegmentId int
);

INSERT INTO @RoadIntersection_Segments VALUES
('A',1),
('A',2),
('A',5),
('B',2),
('B',6),
('B',3),
('B',7),
('C',7),
('C',4),
('C',8),
('D',5),
('D',8),
('D',9);```

To view all the road segments and intersections in the model, we can now run the following query:

```SELECT IntersectionId, IntersectionLocation.STBuffer(7.5) FROM @RoadIntersections
UNION ALL
SELECT CAST(SegmentId AS varchar(32)), segmentgeometry.STBuffer(2)

The database structure may seem a lot more complicated than the original single-table approach, but it’s actually more efficient, and a lot more useful. This approach models the road network as a graph (as in graph theory, not as in Excel…), in which each road intersection is a node, and each road segment segment is a distinct edge, connecting exactly two nodes. The spatial relationship between two LineStrings has been separated from their logical relationship – even if two LineStrings touch, the roads represented by those LineStrings are only defined to be topologically connected if they share a common intersection as defined in the RoadIntersections table.

Why does this matter? Well, if your road network conforms to a graph model as above, not only can you perform spatial functions such as STDistance() or STLength() on the LineString geometries that represent individual roads, but you can also apply graph algorithms such as Djikstra or A* to perform routefinding across an entire table of spatial data, traversing across the edges (road segments) of the graph from a chosen start node to another end node, according to the defined network structure. And you can do this right from within SQL Server using a SQLCLR function – but that’s for another post….

As a side-note, you may think that this subject is only relevant if you’re collecting your own spatial data – surely if you’ve imported your road geometry data from an existing source, it will come supplied with the associated links between roads, right? However, this is generally not the case – the most commonly used format for spatial data interchange is the ESRI Shapefile, which represents a single layer of data, containing geometric “features” with certain “attributes”. However, it it does not explicitly define structured relations between those features – these must be created yourself, inferred from the associated attribute data.

January 21, 2011

## Splitting Multi Geometries into single geometries

Unlike some spatial formats/systems (such as the ESRI shapefile), SQL Server lets you store different types of geometry within a single column. That is to say that a single geometry or geography column within a SQL Server table may contain Points, LineStrings, Polygons, or Multi-elements of those types. Since every individual item of geometry data defines its own SRID, it is even possible to define items of different SRIDs within the same column.

Whilst in theory I approve of this flexibility to define the structure of your spatial data, in practice I rarely see situations in which it is suitable to store different types of geometries within the same column, i.e. Points and LineStrings. And I have never heard a good argument why you would want to store items with different SRIDs in the same column. In fact, I normally place a constraint on my columns to ensure that I only insert geometries of the same SRID – anything else is just asking for trouble.

However, there are many situations where it makes sense to mix both single items and collections of the same type of geometry. i.e. a table of rivers or roads would frequently need to store both LineStrings and MultiLineStrings. A table of countries would probably contain both Polygons (i.e. single land masses – Poland, Algeria, Mongolia etc.) and MultiPolygons (any country with separate islands – UK, Japan, New Zealand etc.). In other words, each item of data would have the same number of dimensions – 0, 1, or 2.

Consider the following table, which contains four rows of data – two MultiPolygons (a set of squares and a set of triangles), and two Polygons (both rectangles).

```
CREATE TABLE #MixedData (id int, geom geometry);
INSERT INTO #MixedData VALUES
(1, 'MULTIPOLYGON(((0 0, 2 0, 2 2, 0 2, 0 0)), ((6 6, 10 6, 10 10, 6 10, 6 6)), ((18 18, 48 18, 48 48, 18 48, 18 18)), ((60 60, 100 60, 100 100, 60 100, 60 60)))'),
(2, 'POLYGON((30 6, 100 6, 100 10, 30 10, 30 6))'),
(3, 'POLYGON((36 60, 42 60, 42 90, 36 90, 36 60))'),
(4, 'MULTIPOLYGON(((5 30, 10 35, 5 40, 5 30)), ((18 60, 18 80, 10 70, 18 60)), ((80 20, 90 20, 85 25, 80 20)))');

SELECT id, geom FROM #MixedData

```

The table looks like this:

And as shown on the spatial results tab below – each element from the same multi-geometry is shown in the same colour:

Now suppose you wanted to “break apart” the MultiPolygons in this table into single elements, so that we just had a table of 9 separate polygons. What would you do?

## Method 1: Using a User-Defined Function

You can create a user-defined function that splits a provided multigeometry into its component elements as follows:

```CREATE FUNCTION dbo.SplitMultiGeometry (@MultiGeom geometry)
RETURNS @Table TABLE( Geom geometry )
BEGIN
DECLARE @n int = 1;
WHILE (@n <= @MultiGeom.STNumGeometries())
BEGIN
INSERT INTO @Table VALUES(@MultiGeom.STGeometryN(@n));
SET @n = @n + 1;
END
RETURN
END
GO```

This function operates on an individual supplied geometry and returns a table containing each element it contains (for single element Points, Linestrings, and Polygons, it simply returns the original supplied value). You can use it as follows:

```DECLARE @g geometry = 'MULTIPOINT(0 0, 2 4, 10 7)';

SELECT geom.STAsText() FROM SplitMultiGeometry(@g);```

Which returns a table with three rows as follows:

POINT (0 0)
POINT (2 4)
POINT (10 7)

To use this function to split an entire column of values, you can then use the CROSS APPLY operator, as follows:

```SELECT
id,
split.geom
FROM
#MixedData
CROSS APPLY (
SELECT geom FROM SplitMultiGeometry(geom)
) split```

Note that, even though I’ve split each individual geometry out,  I’ve retained the id column from the original table so that you can still attribute which “parent” element each child was derived from, as shown below:

And, on the spatial results tab:

If you’re using the spatial types in a .NET application, you can also easily adapt the logic used in this function to operate on (and return) a List<> of SqlGeometry or SqlGeography values instead of a table.

```public List<SqlGeometry> SplitMultiGeometry(SqlGeometry MultiGeom) {
List<SqlGeometry> splitGeometries = new List<SqlGeometry>();
int numGeometries = (int)MultiGeom.STNumGeometries();

for (int n = 1; n <= numGeometries; n++)
{
}
return splitGeometries;
}```

## Method 2: Using a Numbers Table

Shortly after I first published this article, Simon Sabin wrote to me suggesting that an alternative approach would be to use a numbers table and, of course, he’s absolutely correct

To do so, first create a table containing the numbers from 1 to 100,000 (which should be plenty to accommodate the number of individual elements in a multielement geometry), as follows:

```SELECT TOP 100000 IDENTITY(int,1,1) AS n
INTO Numbers
FROM MASTER..spt_values a, MASTER..spt_values b;```

You can then join from your spatial table to the numbers table to retrieve each individual geometry using a query as follows:

```SELECT
id,
geom.STGeometryN(Numbers.n)
FROM
#MixedData JOIN Numbers
ON Numbers.n <= geom.STNumGeometries()```

The results obtained will be identical to those from the first method, although since it’s a purely set-based solution, you may well get better performance over large datasets than using the CROSS APPLY method. Thankyou to Simon for the suggestion.