Posts tagged ‘SQL Server’

March 6, 2012

Drawing Fractals with SQL Server Spatial

I can’t think of any practical purpose for this code, but here’s a recursive SQLCLR procedure to draw a Sierpinski triangle fractal (or, at least, an approximation of one to a given level of detail) in SQL Server just because you can.

Adapted from
http://www.codeproject.com/KB/silverlight/SierpinskiTriangle.aspx

using System;
using System.Collections.Generic;
using System.Text;
using Microsoft.SqlServer.Types;
using System.Data.SqlTypes;
using System.Data.SqlClient;
using Microsoft.SqlServer.Server;
using System.Data;

namespace ProSpatial
{
  public partial class StoredProcedures
  {
    [Microsoft.SqlServer.Server.SqlProcedure]
    public static void SierpinskiTriangle(int size)
    {

      // Set properties of exterior equilateral triangle
      int w = size;  // Width. e.g. 512
      int h = (int)(w * Math.Sqrt(3) / 2);  // Height
      int[] x = { 0, w, w/2 };  // x vertices
      int[] y = { h, h, 0 };  // y vertices

      // Create a new SqlGeometry Instance to hold the output
      SqlGeometry Triangles = new SqlGeometry();
      Triangles.STSrid = 0;

      // Start recursion
      Triangles = drawSierpinskiTriangle(x, y, w/2, 2, Triangles);

      // Send the results back to SQL Server
      SendResults(Triangles);
    }


    private static SqlGeometry drawSierpinskiTriangle(int[] x, int[] y, int d, int dMin, SqlGeometry Triangles)
    {

      // If triangles are too small to render then make this the last recursion
      if (d <= dMin)
      {
        // Create a new triangle and add it to the collection
        SqlGeometry Polygon = TriangleFromPoints(x[0], y[0], x[1], y[1],  x[2], y[2]);
        Triangles = Triangles.STUnion(Polygon);
      }
      else
      {
        // Calculate centre of each side
        int xMc = (x[0] + x[1]) / 2, yMc = (y[0] + y[1]) / 2;
        int xMb = (x[0] + x[2]) / 2, yMb = (y[0] + y[2]) / 2;
        int xMa = (x[1] + x[2]) / 2, yMa = (y[1] + y[2]) / 2;

        // Subdivide into three new triangles
        int[] xNew1 = { x[0], xMc, xMb };
        int[] yNew1 = { y[0], yMc, yMb };
        Triangles = drawSierpinskiTriangle(xNew1, yNew1, d / 2, dMin, Triangles);

        int[] xNew2 = { x[1], xMc, xMa };
        int[] yNew2 = { y[1], yMc, yMa };
        Triangles = drawSierpinskiTriangle(xNew2, yNew2, d / 2, dMin, Triangles);

        int[] xNew3 = { x[2], xMb, xMa };
        int[] yNew3 = { y[2], yMb, yMa };
        Triangles = drawSierpinskiTriangle(xNew3, yNew3, d / 2, dMin, Triangles);
      }

      // Recursion finished - return the result
      return Triangles;
    }

    // Send the results back to the client
    private static void SendResults(SqlGeometry Triangles)
    {
      // Define the metadata of the results column
      SqlMetaData metadata = new SqlMetaData("Triangle", SqlDbType.Udt, typeof(SqlGeometry));

      // Create a record based on this metadata
      SqlDataRecord record = new SqlDataRecord(metadata);
      record.SetValue(0, Triangles);

      // Send the results back to the client
      SqlContext.Pipe.Send(record);
    }

    // Construct a triangle from 3 vertices
    private static SqlGeometry TriangleFromPoints(double x0, double y0, double x1, double y1, double x2, double y2)
    {
      SqlGeometryBuilder TriangleBuilder = new SqlGeometryBuilder();
      TriangleBuilder.SetSrid(0);
      TriangleBuilder.BeginGeometry(OpenGisGeometryType.Polygon);
      TriangleBuilder.BeginFigure(x0, y0);
      TriangleBuilder.AddLine(x1, y1);
      TriangleBuilder.AddLine(x2, y2);
      TriangleBuilder.AddLine(x0, y0);
      TriangleBuilder.EndFigure();
      TriangleBuilder.EndGeometry();
      return TriangleBuilder.ConstructedGeometry;
    }

  }
}

Import the assembly, register the procedure, and then execute in SQL Server. It requires a single parameter, size, which determines the overall scale of the result.

/* Import Assembly */
CREATE ASSEMBLY SierpinskiTriangle
FROM 'c:\users\alastair\documents\SierpinskiTriangle.dll'
WITH PERMISSION_SET = SAFE;
GO

/* Register function */
CREATE PROCEDURE dbo.SierpinskiTriangle(@size int)
AS EXTERNAL NAME SierpinskiTriangle.[ProSpatial.StoredProcedures].SierpinskiTriangle;
GO

/* Execute Procedure */
EXEC dbo.SierpinskiTriangle 512;

When executed with a size of 512, the result is a single MultiPolygon instance containing a set of Sierpinski triangles as follows:

image

March 2, 2012

Cleaning up artefacts created by Reduce()

The Reduce() method simplifies geography or geometry instances by removing vertices. Depending on the situation, this can sometimes cause a Polygon instance to collapse in on itself, degenerating into a LineString, or even a single Point.

For example, in the following code listing I’ve simplified a MultiPolygon instance of the United Kingdom using the Reduce() methods with a tolerance of 2km:

SELECT
@UK.Reduce(2000) AS Shape;

This causes many of the Scottish islands and some other parts of coastline to degenerate into singular Point instances, which are shown as black specks in the SSMS spatial results tab, as follows:

Artefacts

Instead of being a MultiPolygon instance, the shape is now a GeometryCollection containing a variety of geometry types, but this is often not what you really want.

Assuming that what you really wanted was only those simplified Polygons returned by Reduce(), you can use the following T-SQL function to loop through a geometry and removes any Point or LineString elements, returning only the union of all polygonal elements (i.e. those elements where STDimension() = 2):

CREATE FUNCTION RemoveArtefacts(@g geography) RETURNS geography AS BEGIN
  DECLARE @h geography = geography::STGeomFromText('POINT EMPTY', @g.STSrid);
  DECLARE @i int = 1;
  WHILE @i <= @g.STNumGeometries() BEGIN
    IF(@g.STGeometryN(@i).STDimension() = 2) BEGIN
      SELECT @h = @h.STUnion(@g.STGeometryN(@i));
    END
    SET @i = @i + 1;
  END
  RETURN @h;
END;

Usage as follows:

SELECT
dbo.RemoveArtefacts(@UK.Reduce(2000)) AS Shape;

The map is now simplified as before, but does not contain the unwanted degenerate artifacts created as a result of vertex removal:

Artefacts2

February 22, 2012

Updating a SQL Server Table with Nearest Neighbours (Optimised for SQL Server 2012 / SQL Azure)

So I’ve been trying to think of some examples that make use of the Garmin POI data from my last post. Seeing as POI data naturally lends itself to nearest-neighbour queries, I thought I’d write a little about those.

The general form of the nearest neighbor query is: “Show me the closest x restaurants/pubs/train stations to this location”; this is the query pattern used by location-aware applications to find POIs close to the user’s current location, for example. However, another pattern of which I’ve seen much fewer examples is how to update an entire table of records in order to determine and populate the nearest neighbour for each row in the table (indeed, this very question came up on the MSDN spatial forum just last week).

With this in mind, I thought I’d demonstrate a practical scenario: let’s say you’re planning going for a swim at the local pool and, after you’ve swum 50 lengths, you’ll probably feel a bit peckish – perhaps in the need for some fish ‘n’ chips. So, in this example I’m firstly going to import the Garmin POI data for swimming pools and fish ‘n’ chip shops in the UK, each being represented by a geography Point instance. I’ll then add an extra column to the swimming pool table and populate it with the closest fish and chip shop to every swimming pool.

To start with, create tables containing the data for fish and chip shops and swimming pools (download the data from here). Notice that I’ve included an IDENTITY primary key field in each table, which I’ll need in order to add a spatial index later:

SELECT
  Id = IDENTITY(int,1,1),
  CAST(Name AS varchar(255)) AS Name,
  CAST(Details AS varchar(255)) AS Details,
  geography::Point(Latitude,Longitude,4326) AS Location
INTO SwimmingPools
FROM OPENROWSET(
 BULK 'C:\Users\Alastair\Downloads\GarminPOI\Swimming Pools.csv',
 FORMATFILE='C:\temp\garminpoi.xml',
 FIRSTROW=2
) AS SwimmingPools;
GO

ALTER TABLE SwimmingPools ADD PRIMARY KEY(Id);
GO

SELECT
  Id = IDENTITY(int,1,1),
  CAST(Name AS varchar(255)) AS Name,
  CAST(Details AS varchar(255)) AS Details,
  geography::Point(Latitude,Longitude,4326) AS Location
INTO FishAndChips
FROM OPENROWSET(
 BULK 'C:\Users\Alastair\Downloads\GarminPOI\Fish and Chips.csv',
 FORMATFILE='C:\temp\garminpoi.xml',
 FIRSTROW=2
) AS FishAndChips;
GO

ALTER TABLE FishAndChips ADD PRIMARY KEY(Id);
GO

Now we’ll add some extra columns to the Swimming Pools table, which we’ll populate with information about the closest fish ‘n’ chip shop:

ALTER TABLE SwimmingPools
ADD ClosestFishbar varchar(255),
    Distance decimal(18,2);

Now, you don’t need to add a spatial index to perform the update query, but it’ll be a lot faster with one than without. SQL Server can only make use of one spatial index in a join between tables, so there’s no need to add an index to both tables. We’ll just add one to the Fish And Chips table, as follows:

CREATE SPATIAL INDEX sidx_FishAndChips ON FishAndChips(Location)
USING  GEOGRAPHY_GRID 
WITH (
  GRIDS =(LEVEL_1 = HIGH,LEVEL_2 = HIGH,LEVEL_3 = HIGH,LEVEL_4 = HIGH), 
  CELLS_PER_OBJECT = 16
);

Then, run an UPDATE query to populate the ClosestFishBar and Distance columns, as follows:

UPDATE s
SET ClosestFishbar = fName,
    Distance = Location.STDistance(fLocation)  
FROM (
  SELECT
    SwimmingPools.*,
    fnc.Name AS fName,
    fnc.Location AS fLocation
  FROM SwimmingPools 
  CROSS APPLY (
    SELECT TOP 1
     Name,
     Location
    FROM FishAndChips WITH(index(sidx_FishAndChips))
    WHERE FishAndChips.Location.STDistance(SwimmingPools.Location) IS NOT NULL
    ORDER BY FishAndChips.Location.STDistance(SwimmingPools.Location) ASC
  ) fnc
) s;

This query probably needs some explanation:

  • Generally, you might expect to use a correlated subquery to obtain the closest fish and chip shop to each swimming pool. However, correlated subqueries can only return a single value, and I want to retrieve both the name and the distance to the closest fish bar, so I've used a CROSS APPLY instead.
  • In the function being applied, I’m sorting the table of fish and chip shops by ascending order of their distance from the current swimming pool (ORDER BY FishAndChips.Location.STDistance(SwimmingPools.Location) ASC), and then selecting the TOP 1 Name and Location (i.e. the closest).
  • To make the query efficient, I’ve also added an index hint to use the spatial index on the Fish and Chips table – WITH(index(sidx_FishAndChips)), and I’ve included an extra predicate, WHERE FishAndChips.Location.STDistance(SwimmingPools.Location) IS NOT NULL. This extra predicate will help the query optimiser choose the dedicated nearest neighbour query plan introduced in SQL Server 2012 and SQL Azure.
  • Note that the spatial index-optimised nearest neighbour query plan is only available in SQL Server 2012/SQL Azure. If you try to execute the UPDATE query as written above in SQL Server 2008/R2 then you’ll get an error:

    The query processor could not produce a query plan for a query with a spatial index hint.  Reason: Spatial indexes do not support the comparator supplied in the predicate.  Try removing the index hints or removing SET FORCEPLAN.

    As stated, to resolve this you’ll have to remove the WITH(index(sidx_FishAndChips)) index hint (and also expect your query to run a lot slower!). Instead, you might need to try one of the alternative approaches to efficiently finding nearest neighbours in SQL Server 2008/R2.

  • Finally, in order to perform the UPDATE, I’ve wrapped the whole lot in a table alias, s, and then updated the two columns as required.

This query should take a few seconds to run (or, if you don’t create/use the spatial index, a few minutes), after which you can check the results as follows:

SELECT * FROM SwimmingPools;

And there you have the complete table of swimming pools, each listed with the name and distance to its closest fish and chip shop:

image 

Seeing as it’s less than 100 metres from getting out of the water to a bag of chips, I’ll think I’ll be taking my next dip at the Farnworth swimming pool….

February 20, 2012

Load Garmin POI data to SQL Server

I’m always looking out for interesting new sources of spatial data to provide examples for use in my books and presentations. Microsoft’s own MSDN documentation demonstrates SQL Server spatial methods only using abstract geometries, such as

SET @h = geometry::STGeomFromText('POINT(10 10)', 0);

However, people often tell me that they really appreciate real-world examples instead – not only are they more interesting, but it’s easier to understand a practical use for the method, and they’re also more memorable afterwards. Some examples I’ve used in the past include:

  • Use STBuffer() to calculate the free delivery area around a Pizza restaurant
  • Use STDisjoint() to ensure that a planned road development does not intersect a protected area of natural beauty
  • Use STIntersection() to identify the section of the Via Appia Roman road that passed through the malaria-infested Pontine marshes.

Anyway, I thought I’d share a new source of data I came across recently – it’s a free dataset of (UK) points of interest designed for importing into Garmin sat-nav systems. As such, it contains practical POIs, categorised into things like museums, mountains, petrol stations, and restaurants. However, since it’s distributed in a simple CSV format, it’s super easy to import the data into SQL Server or other systems as well. Here’s how:

1. Download

Download the dataset of POIs from the garminpoi.co.uk website at
http://garminpoi.co.uk/?page=download

When unzipped, you’ll find it contains a number of categorised CSV files (each with an accompanying BMP icon, which you could use if you want to display this data using pushpins on an SSRS report or Bing Maps, for example):

image

2. Create a Format File

All of the CSV files follow exactly the same structure – with columns for Longitude, Latitude, a Name, and Description. This can be described by the following bulk format file, which you should save as garminpoi.xml:

<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
 <FIELD ID="0" xsi:type="CharTerm" TERMINATOR=","/>
 <FIELD ID="1" xsi:type="CharTerm" TERMINATOR=","/>
 <FIELD ID="2" xsi:type="CharTerm" TERMINATOR=","/>
 <FIELD ID="3" xsi:type="CharTerm" TERMINATOR="\r\n"/>
</RECORD>
<ROW>
 <COLUMN SOURCE="0" NAME="Longitude" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="1" NAME="Latitude" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="2" NAME="Name" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="3" NAME="Details" xsi:type="SQLVARYCHAR"/>
</ROW>
</BCPFORMAT>

3. Open the CSV file in SQL Server

Now, you can access the data from the CSV file directly in SQL Server using OPENROWSET, supplying the name of the CSV file you want to load following the BULK keyword, and the formatfile created above following the FORMATFILE keyword. The CSV files contain column headers, which you can skip by specifying FIRSTROW=2. What’s more, you can pass the Latitude and Longitude coordinate values from the CSV straight to the Point() method of the geography datatype to create point instance at each location on-the-fly.

For example, try executing the following code listing:

SELECT *,
geography::Point(Latitude,Longitude,4326) AS TescoStore
FROM OPENROWSET(
 BULK 'C:\Users\Alastair\Downloads\GarminPOI\Tesco.csv',
 FORMATFILE='C:\temp\garminpoi.xml',
 FIRSTROW=2
) AS GarminPOI;

Result? An instant map of all Tesco branches in the UK!

image

Follow

Get every new post delivered to your Inbox.

Join 53 other followers