Archive for January, 2012

January 27, 2012

Ring Orientation, Bigger-than-a-Hemisphere Polygons, and the ReorientObject method in SQL Server 2012

If you’ve ever tried to import spatial data (from an ESRI shapefile, say) into SQL Server, you’ve almost certainly encountered the dreaded “bigger than a hemisphere” error, as follows:

Msg 6522, Level 16, State 1, Line 1 

A .NET Framework error occurred during execution of
user-defined routine or aggregate "geography":

Microsoft.SqlServer.Types.GLArgumentException: 24205:
The specified input does not represent a valid geography instance
because it exceeds a single hemisphere. Each geography instance
must fit inside a single hemisphere. A common reason for this
error is that a polygon has the wrong ring orientation.

This error arises out of a widely documented limitation in SQL Server 2008/R2 that geography instances cannot span more than a single hemisphere (i.e. they cannot extend over more than half the surface of the earth).

While, strictly speaking, this is a technical limitation, the “smaller-than-a-hemisphere” requirement actually has a beneficial side-effect of providing a useful sanity check of your data: in practice, it is very unusual to ever need a geography instance that spans more than a single hemisphere. Unless you are modelling the Pacific Ocean, or the entire continent of Asia as a single instance, the chances are that you didn’t really want a geography bigger than a hemisphere anyway. Instead, as the error message above states, what probably happened is that the coordinate data of your imported shape had the incorrect ring orientation – the coordinate points were listed in reverse order, which has the effect of inverting the polygon. Therefore, you ended up including everything that was meant to be excluded, and excluding the area that you meant to include. Instead of defining the county of Norfolk, for example, your polygon actually represents the entire surface of the earth except for Norfolk…

Assuming that what you really wanted was to define a relatively small polygon area – a country, state, or sales region, say, then the “within a hemisphere rule” ensures the quality of your data by rejecting any Polygon instances with the incorrect ring orientation.

Along Comes SQL Server 2012

Well, SQL Server 2012 now allows geography Polygons to be bigger than a hemisphere, and that brings a whole new interesting problem… because now, you no longer get an error when you try to import data with incorrect ring orientation – it loads just fine. And you might not even realise that there’s anything wrong with your data until you try to perform some calculations on it.

For example, I just downloaded some Australian river basins data from the Geoscience Australia website, and imported it into SQL Server using OGR2OGR. Everything seemed to go successfully, and here’s a glance at what the table looks like with the geography data inserted into the geog4202 column:

27-01-2012 12-05-32

Seems ok, right? But if you calculate the area of any of the river basins in this table using STArea(), you’ll get a result like this:

SELECT TOP 1 geog4202.STArea();

-- 510,069,272,680,889 (m2)

I mean, I know Australia is a big place, but… a river basin with an area of 510,069,272 km2Really? That seems awfully close to the entire surface area of the earth. And, if you look on the spatial results tab, you’ll see this:

27-01-2012 12-07-18

Sure enough – it looks pretty much like those polygons do cover the entire surface area of the earth.

The problem here is that the source data does not follow the same “left-hand rule” convention as SQL Server, and some (the majority, in fact) of the imported river basin polygons have been defined inside-out. But, because SQL Server now allows large geographies, no error or warning has been created.

The Solution

Fortunately, SQL Server 2012, whilst inadvertently causing this problem, also provides the solution, in the form of the ReorientObject() method, which reverses the ring orientation of any polygon to flip its interior and exterior.

We only want to use ReorientObject() to flip those polygons that have been defined inside-out. Unfortunately, there’s no fool-proof way to check for this, but it’s reasonable to assume that, in a dataset of Australia’s river basins, any polygon that covers more than half the earth’s surface must be incorrect. So, we can identify those polygons that need flipping because their EnvelopeAngle will be equal to or more than 90°, and put this in a CASE statement as follows:

SELECT
  CASE
    WHEN geog4202.EnvelopeAngle() >= 90 THEN geog4202.ReorientObject()
    ELSE geog4202
  END AS geog4202
FROM rbasin_polygon;

And now, the results look much more like those expected:

27-01-2012 12-13-22

(If you really want, you could even put a CHECK constraint on a geography column that prevented any polygons greater than a hemisphere being inserted into the table in the first place. Effectively, you’d be manually reintroducing a technical limitation from previous versions of SQL Server – ironic, huh?)

January 23, 2012

SQL Spatial Puzzle #3: The Seven Bridges of Königsberg

The old city of Königsberg, capital of East Prussia (now Kaliningrad), was built on either side of the river Pregel, with seven bridges across the river between four separate landmasses. A problem made famous by Swiss mathematician Leonard Euler is to try to find a route around the city that crosses every bridge once and only once.

image

To investigate this puzzle using SQL Server, we’ll model a portion of the city  using two tables – one containing Polygons representing the landmasses separated by rivers, and another containing LineStrings representing the bridges across those rivers:

CREATE TABLE #islands 
(
islandid char(1),
islandgeom geometry
);

INSERT INTO #islands VALUES
('A', 'POLYGON((20.505 54.706, 20.506 54.708, 20.5077 54.7086, 20.513 54.7074, 20.514 54.7075, 20.52 54.7077, 20.52 54.71, 20.503 54.71, 20.503 54.706, 20.505 54.706))'),
('B', 'POLYGON((20.506 54.706, 20.5073 54.7083, 20.5135 54.7068, 20.5134 54.7053, 20.506 54.706))'),
('C', 'POLYGON((20.52 54.698, 20.5149 54.6999, 20.5137 54.7009, 20.51435 54.704, 20.5135 54.7046, 20.5051 54.7054, 20.503 54.7056, 20.503 54.698, 20.52 54.698))'),
('D', 'POLYGON((20.52 54.707, 20.5162 54.707, 20.514 54.707, 20.5142 54.7051, 20.515 54.704, 20.515 54.703, 20.5149 54.7008, 20.5159 54.7, 20.52 54.699, 20.52 54.707))');

CREATE TABLE #bridges(
bridgeid char(1),
bridgegeom geometry
);
INSERT INTO #bridges VALUES
(1, 'LINESTRING(20.5082 54.7092, 20.5075 54.7076)'),
(2, 'LINESTRING(20.5109 54.7087, 20.5101 54.707)'),
(3, 'LINESTRING(20.5148 54.708, 20.5147 54.70634)'),
(4, 'LINESTRING(20.5128 54.706, 20.5147 54.7057)'),
(5, 'LINESTRING(20.5069 54.70645, 20.5061 54.7048)'),
(6, 'LINESTRING(20.5103 54.706, 20.5099 54.70416)'),
(7, 'LINESTRING(20.5174 54.6985, 20.5187 54.7003)');

Each of the islands is lettered from A – D, and each bridge is numbered from 1 – 7, as shown in the results of the following query (I’ve applied a small buffer to the bridges only to make them easier to see in the spatial results tab):

SELECT islandid, islandgeom FROM #islands
UNION ALL
SELECT bridgeid, bridgegeom.STBuffer(0.0001) FROM #bridges

image

How, then, can we try to devise a route that crosses each bridge only once? When investigating this problem, the first conclusion reached by Euler was that the shape of the islands, and the path taken within any island, was irrelevant. All that matters is the possible connections that can be made between islands. (This conclusion, and Euler’s later work on the puzzle, led to the foundation of graph theory, in which the entities in a network can be modelled as nodes, and the relationships between those nodes defined by edges).

So, let’s create a table that maps every possible crossing between islands, the edges of the network, by using STIntersects() to determine those islands that are connected to each bridge:

CREATE TABLE #crossings(
  fromid char(1),
  viabridge char(1),
  toid char(1)
);

INSERT INTO #crossings
SELECT 
  f.islandid AS fromid,
  b.bridgeid AS viabridge,
  t.islandid AS toid
FROM #islands f
  JOIN #bridges b ON f.islandgeom.STIntersects(b.bridgegeom) = 1
  JOIN #islands t ON b.bridgegeom.STIntersects(t.islandgeom) = 1
WHERE f.islandid != t.islandid
ORDER BY f.islandid;

Having created a table with every individual bridge crossing, we can now attempt to create a continuous route across the city by navigating through that edges table using a recursive CTE.  The anchor member of this query starts by looking at each individual bridge crossing from the #crossings table. Having crossed a bridge to a new island,  the recursive member then joins back to the #crossings table to search for any bridges that begin on the island where the last crossing ended, in an attempt to continue the route (note the join condition – #crossings JOIN cte ON cte.toid = #crossings.fromid).

Along the way, the query concatenates the ids of all the bridges crossed into the bridgescrossed field. This field is used in a condition of the recursive member to ensure we don’t cross any bridge that’s already been included at some point in the route (WHERE bridgescrossed NOT LIKE '%' + #crossings.viabridge + '%').

Finally, we also concatenate both the bridge identifier and the islands visited in turn to create a nice formatted route of the journey for later examination. e.g. a journey that started at island A and went to island D via bridge 3 would be shown as A – 3 – D.

Here’s the full code listing:

WITH cte as (
  SELECT
    #crossings.fromid,
    CAST(#crossings.viabridge AS varchar(32)) AS bridgescrossed,
    CAST(#crossings.fromid + '-' + #crossings.viabridge + '-' + #crossings.toid AS varchar(32)) as formattedroute,
    #crossings.toid
    FROM #crossings
  UNION ALL SELECT
    #crossings.fromid,
    CAST(cte.bridgescrossed + #crossings.viabridge AS varchar(32)) AS bridgescrossed,
    CAST(cte.formattedroute + '-' + #crossings.viabridge + '-' + #crossings.toid AS varchar(32)) as formattedroute,
    #crossings.toid
    FROM #crossings JOIN cte
    ON cte.toid = #crossings.fromid
    WHERE bridgescrossed NOT LIKE '%' + #crossings.viabridge + '%'
)
SELECT
  formattedroute,
  len(bridgescrossed) AS numbridgescrossed
FROM cte
ORDER BY len(bridgescrossed) DESC;

The results show every possible route across the city that never cross the same bridge more than once, ordered to show longest routes (i.e. those that cross the most bridges) first. If any route existed that crossed every bridge, we’d expect it to be listed at the top of the results, and for the numbridgescrossed value to be 7.

image

However, we find that no such route exists. The longest route(s) in the resultset crosses only 6 of the 7 bridges. One such route, starting at island D and ending at island A, is as follows:

D – 7 – C – 6 – B – 4 – D – 3 – A – 2 – B – 1 - A

And that is empirical proof, using SQL Server, that no solution to this problem exists.

As for understanding why no solution exists? Consider the number of bridges connected to each island. Bear in mind that every time you arrive on an island via a bridge, you must leave that island via a different bridge. It therefore follows that, in order to follow a continuous route that crosses every bridge,  every island (with the exception of the start and end island), must have an even number of bridges connected to it – a matched set of entry/exit bridge pairs. In the Konigsberg puzzle, however, Island B has 5 bridges connected to it, and Islands A, C, and D each have 3 bridges connected to them. Since, at most, only two of these islands can be the start/end points of the route, no solution is possible.

January 18, 2012

SQL Spatial Puzzle #2: The Hunter

(Click here for Puzzle #1 – the disappearing square)

There seem to be several variations on this puzzle, but the version I know is as follows:

“A hunter starts at a location. He walks one mile south, one mile east, and then one mile north. He ends up back where he started, and sees a bear. What colour is the bear?”

The Solution

Before worrying about the colour of the bear, we need to resolve the apparent paradox in the positioning of the hunter. He has moved one mile south, one mile east, and one mile north, and ended up back where he started. Where could he have been in order for this to be true?

The most commonly-known answer is that the hunter started at the North Pole, the bear is therefore a polar bear and is white (don’t let it worry you that there are no polar bears at the north pole!). Let’s test whether this answer fits the criteria using SQL Server’s geography datatype.

Firsty, we define the hunter’s start point, @p0, at the North Pole. The latitude of the North Pole is 90 degrees. All lines of longitude converge at the North Pole, so you can use any longitude value. I’ll use an arbitrary longitude value of 0:

DECLARE @p0 geography = geography::Point(90, 0, 4326);

We now need to declare the first point the hunter travels to, @p1, which is one mile due South of the start.

  • Travelling south reduces latitude, so @p1.Lat < @p0.Lat.
  • Travelling due south involves maintaining a constant longitude, so @p1.Long = @p0.Long.
  • The coordinates in this example use SRID 4326, so the results of any linear calculations are stated in metres. 1 mile = 1,609 metres (approx.), so we need @p0.STDistance(@p1) = 1609.
  • Solving these criteria leads to the following point:
DECLARE @p1 geography = geography::Point(89.98559, 0, 4326);

The next point that the hunter travels to lies one mile due east of @p1. Therefore, it must lie on the same latitude as @p1, but have a greater longitude (assuming that the antimeridian is not crossed). Using the same logic for the previous point, we can deduce that this point lies at:

DECLARE @p2 geography = geography::Point(89.98559, 60, 4326);

Finally, we need to establish the final point, @pf, which must lie one mile due north of the last point, as follows:

DECLARE @pf geography = geography::Point(90, 60, 4326);

To meet the final criteria of the puzzle, the final point must be the same as the point at which the hunter stated, so @pf = @p0.

We can prove that the points described above meet the criteria for the puzzle as follows:

DECLARE @p0 geography = geography::Point(90, 0, 4326);
DECLARE @p1 geography = geography::Point(89.98559, 0, 4326);
DECLARE @p2 geography = geography::Point(89.98559, 60, 4326);
DECLARE @pf geography = geography::Point(90, 60, 4326);

IF @p1.Lat  @p1.Long
AND CAST(@p2.STDistance(@p1) AS INT) = 1609
SELECT '@p2 lies one mile due east of @p1';

IF @pf.Lat > @p2.Lat
AND @pf.Long = @p2.Long
AND CAST(@pf.STDistance(@p2) AS INT) = 1609
SELECT '@pf lies one mile due north of @p2';

IF @pf.STEquals(@p0) = 1
SELECT '@pf is the same as @p0';

This proves that the route p0, p1, p2, pf meets all the criteria described in the riddle and, using the methods of the geography datatype, we have proven that the North Pole is a valid solution to this puzzle. However, it is not the only solution - there are, in fact, an infinite number of locations on the earth's surface that would meet the criteria of the puzzle...

...consider the second step of the hunter's journey - walking due east along a parallel of the earth - a line of constant latitude. As you approach the poles, the circumference of the parallels of the earth become increasingly smaller. Starting at a point just north of the south pole, it is possible to walk east right around the earth along a circle that is one mile in circumference. This occurs at a latitude of about -89.9977, as can be shown by the following:

DECLARE @linestring geography = geography::STLineFromText('
LINESTRING(0 -89.9977, 15 -89.9977, 30 -89.9977, 45 -89.9977, 60 -89.9977, 75 -89.9977, 90 -89.9977, 105 -89.9977, 115 -89.9977, 120 -89.9977, 135 -89.9977, 150 -89.9977, 165 -89.9977, 180 -89.9977, 195 -89.9977, 200 -89.9977, 210 -89.9977, 225 -89.9977, 240 -89.9977, 255 -89.9977, 270 -89.9977, 285 -89.9977, 300 -89.9977, 315 -89.9977, 330 -89.9977, 345 -89.9977, 360 -89.9977)
', 4326);

SELECT 
@linestring,
@linestring.STLength();

This linestring is 1,609 metres long - one mile, but follows a line of constant latitude right around the earth. So, if the hunter starts at any point that is one mile north of this line, they would first walk one mile south onto the line, then one mile east along the line (taking them once around the earth), and finally one mile north again to return to their original starting point.

The following code illustrates an example of just one such point that lies one mile north of the line:

DECLARE @q0 geography = geography::Point(-89.98329, 0, 4326);
SELECT @q0.STDistance(@linestring);

Therefore, an alternative solution to the puzzle could be as follows:

DECLARE @q0 geography = geography::Point(-89.98329, 0, 4326);
DECLARE @q1 geography = geography::Point(-89.9977, 0, 4326);
DECLARE @q2 geography = geography::Point(-89.9977, 360, 4326);
DECLARE @qf geography = geography::Point(-89.98329, 360, 4326);

It’s harder to prove this solution with SQL Server because, since walking east along the line between @q1 and @q2 takes you on an entire revolution of the earth, the result reported by STDistance() for the distance between  the points will always be zero (rather than the 1,609 metres it actually took you to walk there).

In fact, not only are there an infinite number of points that lie one mile north of the line at constant latitude –89.9977 that satisfy the puzzle, but there are also an infinite number of lines along which it can be done: Consider the line that goes around the South pole that is 1/2 mile in length, for example, or 1/3 mile. Walking due east for one mile along any of these lines will take you on some number of rotations of the earth but always return you to the starting point. In all these cases, the middle step of "walking due east for one mile" effectively becomes nullified, leaving the puzzle as walking due south for one mile and then due north for one mile, which clearly returns you to the start location.

January 16, 2012

SQL Spatial Puzzle #1: The Disappearing Square

This is the first in what will (hopefully) be a series of posts demonstrating a few light-hearted uses of SQL Server 2008 spatial functions to solve some common mathematical/logical puzzles. To demonstrate the disappearing square puzzle, first of all create a table and insert four simple geometry shapes into it, as follows:

DECLARE @table1 TABLE( id CHAR(1), shape geometry );
INSERT INTO @table1 VALUES
('A','POLYGON((0 0, 8 0, 8 3, 0 0))'),
('B','POLYGON((8 0, 13 0, 13 2, 10 2, 10 1, 8 1, 8 0))'),
('C','POLYGON((8 3, 8 1, 10 1, 10 2, 13 2, 13 3, 8 3))'),
('D','POLYGON((8 3, 13 3, 13 5, 8 3))');

If you now select all of the shapes from the table and switch to the spatial results tab in SQL Server Management Studio, you can see that the individual shapes fit together to form a triangle:

SELECT id, shape
FROM @table1;

image

Now suppose that we insert these same shapes into another table, but move them around slightly:

DECLARE @table2 TABLE ( id CHAR(1), shape geometry );

INSERT INTO @table2 VALUES
('A','POLYGON((5 2, 13 2, 13 5, 5 2))'),
('B','POLYGON((8 0, 13 0, 13 2, 10 2, 10 1, 8 1, 8 0))'),
('C','POLYGON((5 2, 5 0, 7 0, 7 1, 10 1, 10 2, 5 2))'),
('D','POLYGON((0 0, 5 0, 5 2, 0 0))');

SELECT id, shape
FROM @table2;

Note that all of these shapes are identical to those used in the first example - they've just been rearranged. However, the triangle formed when the pieces are fitted together now contains a "hole" - 1 unit wide by 1 unit high.

 image

So, where has the hole appeared from, and what is missing?

The Solution

The area of a right-angled triangle can be calculated as half of the base multiplied by the height. Both 'triangles' appear to be 13 units wide x 5 units high, so we would expect them to have an area of 32.5 units. Let's test this:

SELECT SUM(shape.STArea()) FROM @table1;
SELECT SUM(shape.STArea()) FROM @table2;

The result of the above query shows that the total area of the individual shapes contained in both triangles is the same - 32 units, not 32.5 as expected. The area of the bottom triangle, when including the 'missing' one unit square, is therefore 33 units.

In other words, although the two triangles appear similar, they are not the same, and neither one is the 13x5 right-angled triangle they appear to be. The first triangle is slightly smaller, and the second triangle is slightly larger.

To investigate further, we can find the area of difference between these triangles and the "true" 13 x 5 right-angled triangle defined as follows:

DECLARE @truetriangle geometry = geometry::STPolyFromText('POLYGON((0 0, 13 0, 13 5, 0 0))', 0);

First, we need to create a union aggregate of all the component shapes that form each triangle, and then compare this to the true triangle using the STSymDifference() method (note I also use the STConvexHull() method on the second triangle in order to fill in the missing 'hole'). If you’re using SQL 2012, you can use the UnionAggregate() method here but, if not, you can create a simple union aggregate using STUnion() as shown below:

-- Create union of all individual shapes in first triangle
DECLARE @triangle1 geometry = 'GEOMETRYCOLLECTION EMPTY';
SELECT @triangle1 = @triangle1.STUnion(shape) FROM @table1; 
 
-- Create union of all individual shapes in second triangle (filling in the gap)
DECLARE @triangle2 geometry = 'GEOMETRYCOLLECTION EMPTY';
SELECT @triangle2 = @triangle2.STUnion(shape).STConvexHull() FROM @table2; 
 
-- Work out the difference between each complete triangle and the true 13 x 5 triangle

SELECT
  @triangle1.STDifference(@truetriangle),
  @triangle1.STDifference(@truetriangle).STArea()
UNION ALL SELECT
  @triangle2.STDifference(@truetriangle),
  @triangle2.STDifference(@truetriangle).STArea();


The results illustrate two thin slivers of area where the two shapes differ from the true triangle, lying along the hypotenuse – each 0.5 units in area.

image

While both shapes formed from the component elements A, B, C, and D appear to be right-angled triangles, they are in fact polygons, since the hypotenuse of each is not a straight line. (You can check this by calling STNumPoints() on both @triangle1 and @triangle2 – you’ll see that each contains four distinct points). The hypotenuse of the @triangle1 is slightly concave, hence why the area is only 32 units and not 32.5 units as expected. The hypotenuse of the @triangle2 is slightly convex, leading to a total area of 33 units. When you add in the missing sliver to @triangle1, or take it away from @triangle2, you’ll end up with the true 13x5 triangle as expected.

The illusion is caused by the fact that the gradients of the component triangles A and D are not the same, therefore swapping them over affects the overall area of the shape. However, they are similar enough that your brain doesn't perceive the difference between them, and assumes that you're looking at a right-angled triangle in both cases.

Tags: , ,
Follow

Get every new post delivered to your Inbox.

Join 53 other followers