Posts tagged ‘SQL’

March 11, 2012

“Point-of-Maximal Distance” queries (AKA Variations on a Theme of Nearest-Neighbours)

Nearest neighbour queries (or, for my international readers, nearest-neighbor queries), are a pretty common query pattern used in location-based applications and spatial analysis:- given a set of known locations of restaurants, customers, stations etc.,  “What/where is the nearest X to a given location?” I’ve already written about performing nearest-neighbour queries with SQL Server spatial datatypes here and here.

Occasionally, you’ll see this pattern reversed – “What is the X furthest away from my current location?” – which could be referred to as a farthest-neighbour query.

However, there’s another variant that I’ve not often seen discussed, which is: “Given a set of known locations, where is the point on Earth that maximises the distance to the closest X.” In other words, where would you have to go to get as far away as possible from any item in the given set of data? The key difference to appreciate about this query structure is that, unlike the first two examples, the answer is not selected from one of a set of known locations – it is location that must be determined programmatically to maximise the distance from X. This makes the query a lot more complicated, but also more interesting. (I don’t know if there is a generic name for this type of query or not…  a “point-of-maximal distance” query?).

I can think of several potential practical applications for this type of query – a fire response service may want to identify the location at increased risk because it is furthest away from any of their response stations, for example. Or, a mobile telecommunications operator might want to identify the “darkest spot” in their network coverage – the point that lies furthest from any of their transmitters.

As ever, I think a practical example might help demonstrate. One of my recent posts described how to import a set of point data representing Tescos stores in the UK. People that know me will know that I’m a Green, tree-hugging, anti-capitalist hippie, so let’s suppose that I wanted to move house to get as far away from Tescos as possible (for my international readers, Tesco is a multi-billion pound supermarket chain in the UK often presented as one of the faces of modern global capitalism – I think you have similar companies like Wal-Mart or Subway).

I’m going to set an additional constraint that, while I want to live far from Tescos, I still want to live in mainland UK. So, my problem is to find the point that lies somewhere inside this Polygon that is furthest away from any of the dots (which are the locations of Tescos stores):

image

So…. how do that?

Method 1: Iterative T-SQL

To start, I’m going to create the shape of the UK as a geography MultiPolygon called @UK. I’ve already got a table containing various parts of the UK that I can aggregate, but I want to exclude the Scottish Islands and other small polygon geometries that will confuse matters (Scottish islands = good for single malt whiskey, bad for spatial data… they’re just too craggy!). So, I’ll create a simple numbers table in a CTE and join to that table in order to loop through each element in the UK table and only aggregate those polygons that are greater than 10,000km2 in area:

DECLARE @UK geography;

-- Create a MultiPolygon of the UK, omitting any islands
-- with area less than 10,000km2
WITH Numbers(n) AS ( 
  SELECT ROW_NUMBER() OVER (ORDER BY number) FROM master..spt_values
)
SELECT @UK = geography::UnionAggregate(Shape.STGeometryN(n))
FROM UK JOIN Numbers ON Numbers.n  10000000000;

I’ll also create a geography MultiPoint of all Tescos stores called @Tescos. The only slight complication here is that the Garmin POI dataset I’m using seems to have some erroneous data, so I’ll put a WHERE condition to filter only those records that lie within the approximate geographic bounds of the UK:

DECLARE @Tescos geography;

-- Create a MultiPoint of Tescos stores
SELECT @Tescos = geography::UnionAggregate(Location)
FROM TescoStores
WHERE Location.Long BETWEEN -10 AND 3
AND Location.Lat BETWEEN 50 AND 60;

Now that we’ve got the basics set up, how do we go about finding the part of @UK that lies furthest from @Tescos? To explain one approach, consider an analogy in which the shape of the UK is cut out of blotting paper. You then arrange a set of pipettes filled with ink, so that each is positioned above the location of a Tescos store. And then, all at the same time, you start dropping ink from the pipettes onto the location of the stores.

The ink blots centred on the location of each store will expand outwards at the same rate, gradually filling up the blotting paper shape of the UK. The last part of the paper to be covered by ink is that which lies furthest from any of the ink sources, and must be the solution.

We can recreate this approach using SQL Server’s spatial functions:

  • To mimic the expanding ink blots we can call STBuffer() on the @Tescos MultiPoint with increasing buffer sizes to create expanding circular zones centred around each store.
  • To determine the area of blotting paper that is still not covered by ink, we can use STDifference() to compare the inked area to the original shape.
  • If the part of the UK yet to be covered by ink is still of a sufficiently large area, we simply add more ink by looping through again with an increased buffer size. If, however, the whole UK has been covered by ink then we’ve put too much ink on and we need to reduce the buffer size slightly.
  • When the area yet to be covered reaches a sufficiently small size (this approach is unlikely to ever converge on a single Point), declare that to be the solution.

Here’s the query:

DECLARE @x float = 100000; -- Declare an initial buffer size. I'm guessing that the furthest point in the UK lies more than 100km from Tescos
WHILE @UK.STDifference(@Tescos.STBuffer(@x)).STArea() > 10000 -- Stop when we find an area less than 10km2 in area
BEGIN

  -- Print a little debug message to see what's happening on this iteration
  PRINT CAST(@UK.STDifference(@Tescos.STBuffer(@x)).STArea() AS varchar(32)) + 
        'm2 of the UK lies more than ' + CAST(@x AS varchar(32)) + 'm from the closest Tescos';

  -- Amount by which to increase the buffer size
  DECLARE @y float;

  -- Fairly arbitrary, but enlarge the buffer by the square root of the remaining area
  -- Intention here is to close in on the appropriate buffer value relatively quickly
  SET @y = SQRT(@UK.STDifference(@Tescos.STBuffer(@x)).STArea());

  -- Check whether increasing the buffer by this amount would be too great
  -- and lead to the whole UK being covered in ink!
  WHILE @UK.STDifference(@Tescos.STBuffer(@x + @y)).STArea() = 0 BEGIN

    -- Reduce the amount by which the buffer is increased
    SET @y = @y/2;
  END

  -- Apply the increase in buffer
  SET @x = @x + @y;

END

Once this loop has finished, @x will contain a value, measured in metres, that represents the radius to which each of the ink blots must expand so that the area left uncovered has an area of less than 10km2 . Here’s what the map looks like when a circular buffer of radius @x is created centred on the location of each Tesco store:

SELECT @Tescos.STBuffer(@x)
UNION ALL SELECT @Tescos
UNION ALL SELECT @UK;

image

As you can tell, the whole of the UK is pretty much covered – the iteration only stops when the area remaining uncovered is less than 10km2 in in size (WHILE @UK.STDifference(@Tescos.STBuffer(@x)).STArea() > 10000), which means that, at this zoom level, it’s going to be pretty hard to perceive.

To find where the remaining un-inked area is, we can SELECT @UK.STDifference(@Tescos.STBuffer(@x)), and, to make it more obvious, I’ll also add a buffer to enlarge the solution area. And the answer? (as you might have been able to tell from the previous image) is the small blue polygon shown below:

image

Perhaps unsurprisingly, it’s an area at the tip of the highlands of Scotland, and it’s 119,591 metres from the closest Tesco store. Ok, so it’s not an exact point, but the way that this iteration works it would be very hard to reach such a precise location.

Method 2: SQLCLR Triangulation?

My gut instinct told me that there should be a way of coming up with the exact solution to the problem, in a much simpler manner, using triangulation. For example, suppose you were to create the Delaunay triangulation of the set of points such that the vertices of each triangle were formed from the location of the Tesco stores:

image

Since the triangles are formed by connecting vertices of Tesco stores, in those areas where stores are further apart, the circumcircle (the point of equal distance from all three vertices) of the triangle created between them will be larger. In fact, you’d think the point that lies at the centre of the triangle of largest circumradius would be the point that lies furthest from any Tesco store. And this would be true, were it not for (quite literally) “edge cases”…

… when, as in this case, the solution lies on the edge of the area in question, it is not covered by the extent of the triangulation (which only creates triangles that are interior to the set of data). Only by artificially extending the triangles out from the set of Tesco stores towards infinity could we ensure that the whole UK had been covered, and it turns out the SQL Server is not too happy about creating infinite-sized triangles (who’d have thought?).

So that’s a bit disappointing… the iterative T-SQL method still works and, with a bit of tweaking to choose how the buffer should be refined on each iteration, could probably be made to work more efficiently. However, I was hoping to be able to get something exact using SQLCLR.

I should just point out that the fact the triangulation method doesn’t extend beyond the bounds of the supplied dataset can still make it useful for lots of other purposes. For example, creating an aggregate union of the triangles creates a concave hull around the location of all Tescos stores in the UK (not a convex hull as could be created by STConvexHull()). The result looks scarily like the shape of the UK itself:

image

And, deriving a Voronoi tessellation from the set of triangles creates the “catchment area” around each store – the area formed from all those points closest to that store than any other. You can think of this as the materialised form of the results of a nearest-neighbour query:

image

(Note that, as expected, the point that lies furthest away from a Tescos store is contained within the largest Voronoi polygon – the pinkish one at the top of the map). At first, the Voronoi tessellation and the Delaunay triangulation may appear similar – indeed, they are “dual structures” – while Tesco stores form the vertices of each triangle in the Delaunay triangulation, they are the sites around which each polygon is created in the Voronoi tessellation.

If anybody can suggest any alternative approaches to identify the “point of maximal distance” from a set of data, I’d love to hear it…

Note: The SQLCLR code for creating the Delaunay triangulation, concave hull, and Voronoi tessellation demonstrated in this post is taken from “Pro Spatial with SQL Server 2012”, which also contains a chapter explaining these topics in more detail.

March 7, 2012

SQL Server 2012 – What’s New for Spatial?

So Microsoft launched SQL Server 2012 today, and you can download the SQL 2012 Express edition right now from
http://www.microsoft.com/sqlserver/en/us/get-sql-server/try-it.aspx

No doubt there will be lots of people writing blog articles about the various new features – AlwaysOn, Column Store , Power View etc. but I thought I’d stick to what I know and write about what’s new for spatial.

In fact, I don’t really even need to do that, because there’s always a pretty good whitepaper about that from the SQLCAT website, here. This whitepaper is written for Denali CTP3 but, to my knowledge (and from my limited playing around with RTM this morning), there don’t appear to be any new features introduced between then and now.

There’s also already a very useful table here that compares the spatial features currently available in SQL Azure to those in SQL Server 2012.

SQL Server 2012 Spatial Was My Idea

So, what’s left to do? Well – I might start by mentioning the fact that I’m quite chuffed about some of the new features that have been included. Do you remember the Windows 7 advertising slogan with people saying things they did with their PC and that “Windows 7 was my idea”? Well, that’s how a little bit like how I feel about SQL 2012:

I’m not suggesting that these features were introduced solely to appease my personal whims, but don’t let it ever be said that Microsoft don’t listen to customer feedback!

Upgrading a Spatial Application from SQL Server 2008/R2?

Finally, some words of warning for anyone upgrading existing spatial databases from SQL Server 2008/R2 to SQL Server 2012:

Remember to account for Curved Geometries!

The introduction of curved geometry datatypes mean that the range of possible geometries that can be returned by existing methods is increased, and you may need to write additional code paths to deal with that. Take the STConvexHull() method, for example, which returns the convex hull around a geometry. In SQL Server 2008/R2, that method would always return a Polygon (other than in the special case in which all the input points lay on a straight line, in which case it returned a LineString). In SQL 2012, however, if any of the inputs are themselves curved geometries, the convex hull returned by STConvexHull() can be a CurvePolygon, e.g.:

DECLARE @g geometry = 'GEOMETRYCOLLECTION(CIRCULARSTRING(0 0, -5 5, 0 10), CIRCULARSTRING(10 10, 15 10, 10 0))';
SELECT
  @g.STConvexHull(),
  @g.STConvexHull().STAsText();

The result of STConvexHull() in this example is:

CURVEPOLYGON (COMPOUNDCURVE (CIRCULARSTRING (18.090169943749487 5.0000000000000213, 16.351988430497684 9.051195518771241, 12.240554713995163 10.584146142748239), (12.240554713995163 10.584146142748239, -0.23001591065449833 9.9947064659342839), CIRCULARSTRING (-0.23001591065449833 9.9947064659342839, -4.99999999999994 5.0000000000000213, -0.23001591065449833 0.00529353406575872), (-0.23001591065449833 0.00529353406575872, 12.240554713995163 -0.5841461427481951), CIRCULARSTRING (12.240554713995163 -0.5841461427481951, 16.351988430497684 0.94880448122880079, 18.090125812525294 4.9777873878621213), (18.090125812525294 4.9777873878621213, 18.090169943749487 5.0000000000000213)))

image

Calculated Results may differ due to Increased Precision

SQL Server 2012 now uses 48 bit precision for spatial calculations rather than 27 bit as under SQL Server 2008/R2. This means that results of certain spatial queries will differ between those obtained under SQL Server 2008/R2. This has been mentioned before, but it really is worth iterating because it’s both unusual and potentially application-breaking behaviour.

The following very simple example demonstrates the issue:

DECLARE @line1 geometry = 'LINESTRING(0 11, 430 310)';
DECLARE @line2 geometry = 'LINESTRING(0 500, 650 0)';

SELECT @line1.STIntersection(@line2).STIntersects(@line1);
  • When executed under SQL Server 2008/R2, you’ll get the result 0.
  • Upgrade to SQL Server 2012, and you’ll suddenly get the result 1.

For a method that returns a bit value, that’s pretty much the greatest difference it’s possible to get…. so be sure to check your code (especially if you ever rely on exact equality testing between instances).

March 6, 2012

Drawing Fractals with SQL Server Spatial

I can’t think of any practical purpose for this code, but here’s a recursive SQLCLR procedure to draw a Sierpinski triangle fractal (or, at least, an approximation of one to a given level of detail) in SQL Server just because you can.

Adapted from
http://www.codeproject.com/KB/silverlight/SierpinskiTriangle.aspx

using System;
using System.Collections.Generic;
using System.Text;
using Microsoft.SqlServer.Types;
using System.Data.SqlTypes;
using System.Data.SqlClient;
using Microsoft.SqlServer.Server;
using System.Data;

namespace ProSpatial
{
  public partial class StoredProcedures
  {
    [Microsoft.SqlServer.Server.SqlProcedure]
    public static void SierpinskiTriangle(int size)
    {

      // Set properties of exterior equilateral triangle
      int w = size;  // Width. e.g. 512
      int h = (int)(w * Math.Sqrt(3) / 2);  // Height
      int[] x = { 0, w, w/2 };  // x vertices
      int[] y = { h, h, 0 };  // y vertices

      // Create a new SqlGeometry Instance to hold the output
      SqlGeometry Triangles = new SqlGeometry();
      Triangles.STSrid = 0;

      // Start recursion
      Triangles = drawSierpinskiTriangle(x, y, w/2, 2, Triangles);

      // Send the results back to SQL Server
      SendResults(Triangles);
    }


    private static SqlGeometry drawSierpinskiTriangle(int[] x, int[] y, int d, int dMin, SqlGeometry Triangles)
    {

      // If triangles are too small to render then make this the last recursion
      if (d <= dMin)
      {
        // Create a new triangle and add it to the collection
        SqlGeometry Polygon = TriangleFromPoints(x[0], y[0], x[1], y[1],  x[2], y[2]);
        Triangles = Triangles.STUnion(Polygon);
      }
      else
      {
        // Calculate centre of each side
        int xMc = (x[0] + x[1]) / 2, yMc = (y[0] + y[1]) / 2;
        int xMb = (x[0] + x[2]) / 2, yMb = (y[0] + y[2]) / 2;
        int xMa = (x[1] + x[2]) / 2, yMa = (y[1] + y[2]) / 2;

        // Subdivide into three new triangles
        int[] xNew1 = { x[0], xMc, xMb };
        int[] yNew1 = { y[0], yMc, yMb };
        Triangles = drawSierpinskiTriangle(xNew1, yNew1, d / 2, dMin, Triangles);

        int[] xNew2 = { x[1], xMc, xMa };
        int[] yNew2 = { y[1], yMc, yMa };
        Triangles = drawSierpinskiTriangle(xNew2, yNew2, d / 2, dMin, Triangles);

        int[] xNew3 = { x[2], xMb, xMa };
        int[] yNew3 = { y[2], yMb, yMa };
        Triangles = drawSierpinskiTriangle(xNew3, yNew3, d / 2, dMin, Triangles);
      }

      // Recursion finished - return the result
      return Triangles;
    }

    // Send the results back to the client
    private static void SendResults(SqlGeometry Triangles)
    {
      // Define the metadata of the results column
      SqlMetaData metadata = new SqlMetaData("Triangle", SqlDbType.Udt, typeof(SqlGeometry));

      // Create a record based on this metadata
      SqlDataRecord record = new SqlDataRecord(metadata);
      record.SetValue(0, Triangles);

      // Send the results back to the client
      SqlContext.Pipe.Send(record);
    }

    // Construct a triangle from 3 vertices
    private static SqlGeometry TriangleFromPoints(double x0, double y0, double x1, double y1, double x2, double y2)
    {
      SqlGeometryBuilder TriangleBuilder = new SqlGeometryBuilder();
      TriangleBuilder.SetSrid(0);
      TriangleBuilder.BeginGeometry(OpenGisGeometryType.Polygon);
      TriangleBuilder.BeginFigure(x0, y0);
      TriangleBuilder.AddLine(x1, y1);
      TriangleBuilder.AddLine(x2, y2);
      TriangleBuilder.AddLine(x0, y0);
      TriangleBuilder.EndFigure();
      TriangleBuilder.EndGeometry();
      return TriangleBuilder.ConstructedGeometry;
    }

  }
}

Import the assembly, register the procedure, and then execute in SQL Server. It requires a single parameter, size, which determines the overall scale of the result.

/* Import Assembly */
CREATE ASSEMBLY SierpinskiTriangle
FROM 'c:\users\alastair\documents\SierpinskiTriangle.dll'
WITH PERMISSION_SET = SAFE;
GO

/* Register function */
CREATE PROCEDURE dbo.SierpinskiTriangle(@size int)
AS EXTERNAL NAME SierpinskiTriangle.[ProSpatial.StoredProcedures].SierpinskiTriangle;
GO

/* Execute Procedure */
EXEC dbo.SierpinskiTriangle 512;

When executed with a size of 512, the result is a single MultiPolygon instance containing a set of Sierpinski triangles as follows:

image

January 16, 2012

SQL Spatial Puzzle #1: The Disappearing Square

This is the first in what will (hopefully) be a series of posts demonstrating a few light-hearted uses of SQL Server 2008 spatial functions to solve some common mathematical/logical puzzles. To demonstrate the disappearing square puzzle, first of all create a table and insert four simple geometry shapes into it, as follows:

DECLARE @table1 TABLE( id CHAR(1), shape geometry );
INSERT INTO @table1 VALUES
('A','POLYGON((0 0, 8 0, 8 3, 0 0))'),
('B','POLYGON((8 0, 13 0, 13 2, 10 2, 10 1, 8 1, 8 0))'),
('C','POLYGON((8 3, 8 1, 10 1, 10 2, 13 2, 13 3, 8 3))'),
('D','POLYGON((8 3, 13 3, 13 5, 8 3))');

If you now select all of the shapes from the table and switch to the spatial results tab in SQL Server Management Studio, you can see that the individual shapes fit together to form a triangle:

SELECT id, shape
FROM @table1;

image

Now suppose that we insert these same shapes into another table, but move them around slightly:

DECLARE @table2 TABLE ( id CHAR(1), shape geometry );

INSERT INTO @table2 VALUES
('A','POLYGON((5 2, 13 2, 13 5, 5 2))'),
('B','POLYGON((8 0, 13 0, 13 2, 10 2, 10 1, 8 1, 8 0))'),
('C','POLYGON((5 2, 5 0, 7 0, 7 1, 10 1, 10 2, 5 2))'),
('D','POLYGON((0 0, 5 0, 5 2, 0 0))');

SELECT id, shape
FROM @table2;

Note that all of these shapes are identical to those used in the first example - they've just been rearranged. However, the triangle formed when the pieces are fitted together now contains a "hole" - 1 unit wide by 1 unit high.

 image

So, where has the hole appeared from, and what is missing?

The Solution

The area of a right-angled triangle can be calculated as half of the base multiplied by the height. Both 'triangles' appear to be 13 units wide x 5 units high, so we would expect them to have an area of 32.5 units. Let's test this:

SELECT SUM(shape.STArea()) FROM @table1;
SELECT SUM(shape.STArea()) FROM @table2;

The result of the above query shows that the total area of the individual shapes contained in both triangles is the same - 32 units, not 32.5 as expected. The area of the bottom triangle, when including the 'missing' one unit square, is therefore 33 units.

In other words, although the two triangles appear similar, they are not the same, and neither one is the 13x5 right-angled triangle they appear to be. The first triangle is slightly smaller, and the second triangle is slightly larger.

To investigate further, we can find the area of difference between these triangles and the "true" 13 x 5 right-angled triangle defined as follows:

DECLARE @truetriangle geometry = geometry::STPolyFromText('POLYGON((0 0, 13 0, 13 5, 0 0))', 0);

First, we need to create a union aggregate of all the component shapes that form each triangle, and then compare this to the true triangle using the STSymDifference() method (note I also use the STConvexHull() method on the second triangle in order to fill in the missing 'hole'). If you’re using SQL 2012, you can use the UnionAggregate() method here but, if not, you can create a simple union aggregate using STUnion() as shown below:

-- Create union of all individual shapes in first triangle
DECLARE @triangle1 geometry = 'GEOMETRYCOLLECTION EMPTY';
SELECT @triangle1 = @triangle1.STUnion(shape) FROM @table1; 
 
-- Create union of all individual shapes in second triangle (filling in the gap)
DECLARE @triangle2 geometry = 'GEOMETRYCOLLECTION EMPTY';
SELECT @triangle2 = @triangle2.STUnion(shape).STConvexHull() FROM @table2; 
 
-- Work out the difference between each complete triangle and the true 13 x 5 triangle

SELECT
  @triangle1.STDifference(@truetriangle),
  @triangle1.STDifference(@truetriangle).STArea()
UNION ALL SELECT
  @triangle2.STDifference(@truetriangle),
  @triangle2.STDifference(@truetriangle).STArea();


The results illustrate two thin slivers of area where the two shapes differ from the true triangle, lying along the hypotenuse – each 0.5 units in area.

image

While both shapes formed from the component elements A, B, C, and D appear to be right-angled triangles, they are in fact polygons, since the hypotenuse of each is not a straight line. (You can check this by calling STNumPoints() on both @triangle1 and @triangle2 – you’ll see that each contains four distinct points). The hypotenuse of the @triangle1 is slightly concave, hence why the area is only 32 units and not 32.5 units as expected. The hypotenuse of the @triangle2 is slightly convex, leading to a total area of 33 units. When you add in the missing sliver to @triangle1, or take it away from @triangle2, you’ll end up with the true 13x5 triangle as expected.

The illusion is caused by the fact that the gradients of the component triangles A and D are not the same, therefore swapping them over affects the overall area of the shape. However, they are similar enough that your brain doesn't perceive the difference between them, and assumes that you're looking at a right-angled triangle in both cases.

Tags: , ,
Follow

Get every new post delivered to your Inbox.

Join 53 other followers