Posts tagged ‘SQL Azure’

February 22, 2012

Updating a SQL Server Table with Nearest Neighbours (Optimised for SQL Server 2012 / SQL Azure)

So I’ve been trying to think of some examples that make use of the Garmin POI data from my last post. Seeing as POI data naturally lends itself to nearest-neighbour queries, I thought I’d write a little about those.

The general form of the nearest-neighbour query is: “Show me the closest x restaurants/pubs/train stations to this location”; this is the query pattern used by location-aware applications to find POIs close to the user’s current location, for example. However, another pattern, of which I’ve seen far fewer examples, is how to update an entire table of records in order to determine and populate the nearest neighbour for each row in the table (indeed, this very question came up on the MSDN spatial forum just last week).

With this in mind, I thought I’d demonstrate a practical scenario: let’s say you’re planning to go for a swim at the local pool and, after you’ve swum 50 lengths, you’ll probably feel a bit peckish – perhaps in need of some fish ‘n’ chips. So, in this example I’m first going to import the Garmin POI data for swimming pools and fish ‘n’ chip shops in the UK, each represented by a geography Point instance. I’ll then add extra columns to the swimming pool table and populate them with the closest fish and chip shop to every swimming pool.

To start with, create tables containing the data for fish and chip shops and swimming pools (download the data from here). Notice that I’ve included an IDENTITY primary key field in each table, which I’ll need in order to add a spatial index later:

SELECT
  Id = IDENTITY(int,1,1),
  CAST(Name AS varchar(255)) AS Name,
  CAST(Details AS varchar(255)) AS Details,
  geography::Point(Latitude,Longitude,4326) AS Location
INTO SwimmingPools
FROM OPENROWSET(
 BULK 'C:\Users\Alastair\Downloads\GarminPOI\Swimming Pools.csv',
 FORMATFILE='C:\temp\garminpoi.xml',
 FIRSTROW=2
) AS SwimmingPools;
GO

ALTER TABLE SwimmingPools ADD PRIMARY KEY(Id);
GO

SELECT
  Id = IDENTITY(int,1,1),
  CAST(Name AS varchar(255)) AS Name,
  CAST(Details AS varchar(255)) AS Details,
  geography::Point(Latitude,Longitude,4326) AS Location
INTO FishAndChips
FROM OPENROWSET(
 BULK 'C:\Users\Alastair\Downloads\GarminPOI\Fish and Chips.csv',
 FORMATFILE='C:\temp\garminpoi.xml',
 FIRSTROW=2
) AS FishAndChips;
GO

ALTER TABLE FishAndChips ADD PRIMARY KEY(Id);
GO

Now we’ll add some extra columns to the Swimming Pools table, which we’ll populate with information about the closest fish ‘n’ chip shop:

ALTER TABLE SwimmingPools
ADD ClosestFishbar varchar(255),
    Distance decimal(18,2);

Now, you don’t need a spatial index to perform the update query, but it’ll be a lot faster with one than without. SQL Server can only make use of one spatial index in a join between tables, so there’s no need to add an index to both tables. We’ll just add one to the FishAndChips table, as follows:

CREATE SPATIAL INDEX sidx_FishAndChips ON FishAndChips(Location)
USING  GEOGRAPHY_GRID 
WITH (
  GRIDS =(LEVEL_1 = HIGH,LEVEL_2 = HIGH,LEVEL_3 = HIGH,LEVEL_4 = HIGH), 
  CELLS_PER_OBJECT = 16
);
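SQL Server uses a spatial index as a primary filter: it first finds candidate rows whose grid cells overlap the area of interest, and only those candidates go through the exact (secondary) distance test. As a rough illustration of why that makes a nearest-neighbour search cheaper, here is a minimal Python sketch of a single-level uniform grid over planar (x, y) coordinates. The `GridIndex` class and its methods are hypothetical, and the sketch deliberately ignores the four-level GEOGRAPHY_GRID tessellation, the grid densities and CELLS_PER_OBJECT setting used above, and the curvature of the Earth.

```python
import math
from collections import defaultdict


class GridIndex:
    """Hypothetical single-level uniform grid over planar (x, y) points.

    A simplified stand-in for a spatial index; it does NOT model SQL Server's
    four-level GEOGRAPHY_GRID tessellation or the ellipsoidal Earth.
    """

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)  # (col, row) -> [(item_id, x, y), ...]

    def _cell(self, x, y):
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def insert(self, item_id, x, y):
        self.cells[self._cell(x, y)].append((item_id, x, y))

    def within(self, x, y, radius):
        """Primary filter: visit only cells overlapping the query square.
        Secondary filter: keep points whose exact distance is <= radius."""
        col0, row0 = self._cell(x - radius, y - radius)
        col1, row1 = self._cell(x + radius, y + radius)
        hits = []
        for col in range(col0, col1 + 1):
            for row in range(row0, row1 + 1):
                for item_id, px, py in self.cells.get((col, row), ()):
                    d = math.hypot(px - x, py - y)
                    if d <= radius:
                        hits.append((d, item_id))
        return hits

    def nearest(self, x, y, start_radius):
        """Double the search radius (start_radius must be > 0) until something
        is found. The closest hit is then the true nearest neighbour, because
        any point nearer than it also lies within the radius.
        Returns (distance, item_id), or None if the index is empty."""
        if not self.cells:
            return None
        radius = start_radius
        while True:
            hits = self.within(x, y, radius)
            if hits:
                return min(hits)
            radius *= 2
```

The cell size plays a role loosely comparable to the grid density settings: larger cells let more candidates through the primary filter, while smaller cells mean visiting more (often empty) cells for the same search radius.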

Then, run an UPDATE query to populate the ClosestFishbar and Distance columns, as follows:

UPDATE s
SET ClosestFishbar = fName,
    Distance = Location.STDistance(fLocation)  
FROM (
  SELECT
    SwimmingPools.*,
    fnc.Name AS fName,
    fnc.Location AS fLocation
  FROM SwimmingPools 
  CROSS APPLY (
    SELECT TOP 1
     Name,
     Location
    FROM FishAndChips WITH(index(sidx_FishAndChips))
    WHERE FishAndChips.Location.STDistance(SwimmingPools.Location) IS NOT NULL
    ORDER BY FishAndChips.Location.STDistance(SwimmingPools.Location) ASC
  ) fnc
) s;

This query probably needs some explanation:

  • Generally, you might expect to use a correlated subquery to obtain the closest fish and chip shop to each swimming pool. However, a correlated subquery used as a column expression can only return a single value, and I want to retrieve both the name and the location of the closest fish bar (the location being needed to calculate the distance), so I’ve used a CROSS APPLY instead.
  • In the subquery being applied, I’m sorting the table of fish and chip shops in ascending order of their distance from the current swimming pool (ORDER BY FishAndChips.Location.STDistance(SwimmingPools.Location) ASC), and then selecting the TOP 1 Name and Location (i.e. the closest).
  • To make the query efficient, I’ve also added an index hint to use the spatial index on the Fish and Chips table – WITH(index(sidx_FishAndChips)), and I’ve included an extra predicate, WHERE FishAndChips.Location.STDistance(SwimmingPools.Location) IS NOT NULL. This extra predicate will help the query optimiser choose the dedicated nearest neighbour query plan introduced in SQL Server 2012 and SQL Azure.
  • Note that the spatial index-optimised nearest neighbour query plan is only available in SQL Server 2012/SQL Azure. If you try to execute the UPDATE query as written above in SQL Server 2008/R2 then you’ll get an error:

    The query processor could not produce a query plan for a query with a spatial index hint.  Reason: Spatial indexes do not support the comparator supplied in the predicate.  Try removing the index hints or removing SET FORCEPLAN.

    As stated, to resolve this you’ll have to remove the WITH(index(sidx_FishAndChips)) index hint (and also expect your query to run a lot slower!). Instead, you might need to try one of the alternative approaches to efficiently finding nearest neighbours in SQL Server 2008/R2.

  • Finally, in order to perform the UPDATE, I’ve wrapped the whole lot in a table alias, s, and then updated the two columns as required.
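Stripped of the index and the optimiser, the CROSS APPLY ... TOP 1 ... ORDER BY pattern boils down to: for each pool, measure the distance to every shop, keep the closest, and write its name and distance back to the pool’s row. Here is a minimal Python sketch of that logic, under some assumptions: the `Poi` class and function names are hypothetical, and distances use the haversine formula on a sphere of radius 6,371,008.8 m, whereas SQL Server’s geography type works on an ellipsoidal model, so STDistance() will return slightly different values.

```python
import math
from dataclasses import dataclass
from typing import Optional

# Assumption: spherical Earth with this mean radius in metres. SQL Server's
# geography type uses an ellipsoidal model, so STDistance() differs slightly.
EARTH_RADIUS_M = 6_371_008.8


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


@dataclass
class Poi:
    """Hypothetical row type: a named point plus the two columns being populated."""
    name: str
    lat: Optional[float]
    lon: Optional[float]
    closest_fishbar: Optional[str] = None
    distance: Optional[float] = None


def populate_nearest(pools, shops):
    """For each pool, find the closest shop by brute force and record its name and distance."""
    for pool in pools:
        best = None  # (distance, name) of the closest shop seen so far
        for shop in shops:
            if shop.lat is None or shop.lon is None:
                continue  # skip shops with no location, much as NULL distances are filtered out
            d = haversine_m(pool.lat, pool.lon, shop.lat, shop.lon)
            if best is None or d < best[0]:
                best = (d, shop.name)
        if best is not None:  # like CROSS APPLY, pools with no candidate are left untouched
            pool.closest_fishbar = best[1]
            pool.distance = round(best[0], 2)  # two decimal places, as in decimal(18,2)
```

This compares every pool against every shop, which is essentially the work the spatial index lets SQL Server avoid.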

This query should take a few seconds to run (or, if you don’t create/use the spatial index, a few minutes), after which you can check the results as follows:

SELECT * FROM SwimmingPools;

And there you have the complete table of swimming pools, each listed with the name and distance to its closest fish and chip shop:


Seeing as it’s less than 100 metres from getting out of the water to a bag of chips, I think I’ll be taking my next dip at the Farnworth swimming pool…

August 31, 2011

SQL Azure – Keeping up with a Moving Target

Moving your data to the (public) cloud necessarily involves relinquishing some control over the setup and maintenance of the environment in which your data is hosted. Cloud-based hosting services such as Microsoft Azure are effectively just scalable shared hosting providers. Since parts of the server configuration are shared with other customers and (to make the service scalable) all instances have to be based on a standard template, there are many system settings that your cloud provider won’t allow you to change on an individual basis.

For me, this is generally great. I’m not a DBA or SysAdmin and I have no interest in maintaining an OS, tweaking server configuration settings, installing updates or applying hotfixes. The thought of delegating to Microsoft the task of keeping my server well-oiled and up to date is very appealing.

However, this also has its downsides. One advantage of maintaining my own server is that, even though it might not be up to date or have the latest service packs applied, I know nobody else has tweaked it either. That means that, unless I’ve accidentally cocked something up or sneezed on the delete key or something, a database-driven application that connects to my own hosted database should keep working day after day. When an upgrade is available I can choose when to apply it, and test that my applications still work correctly afterwards, on my own schedule.

Not so with SQL Azure.

Here are two examples of breaking changes I’ve recently experienced with SQL Azure, both seemingly the result of changes rolled out since the July Service Release:

Firstly, if you use SQL Server Management Studio to connect to and manage your SQL Azure databases, you need to upgrade SSMS to at least version 10.50.1777.0 in order to connect to an upgraded SQL Azure datacentre. The same change also broke any applications that rely on SQL Server Management Objects (SMO), including, for example, the SQL Azure Migration Wizard, resulting in the error described here. Thankfully, once diagnosed, the solution to both issues is relatively simple: run Windows Update and install the optional SQL Server 2008 R2 SP1 service pack.
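If you script connections to SQL Azure, a quick check of the client tools’ build number can save some head-scratching. Here is a minimal Python sketch, assuming you already have the installed SSMS version as a dotted string (how you obtain it is not covered here); the function names are hypothetical, and 10.50.1777.0 is the minimum quoted above.

```python
MIN_SSMS_VERSION = "10.50.1777.0"  # minimum SSMS build quoted above


def parse_version(text):
    """Split a dotted build number such as '10.50.1600.1' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(installed, minimum=MIN_SSMS_VERSION):
    """Compare numerically, component by component, rather than as strings."""
    return parse_version(installed) >= parse_version(minimum)
```

Comparing numerically matters: as plain strings, '10.50.999.0' would sort after '10.50.1777.0'.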

A more subtle change is that the behaviour of the SQL Azure database engine itself has changed, making it more comparable to the on-premises SQL Server “Denali” than to SQL Server 2008 R2. Normally, upgrading SQL Server wouldn’t be a breaking change for most code (unless, of course, you were relying on a deprecated feature that was removed), but the increase in spatial precision from 27 bits to 48 bits in SQL Denali means that you actually get different results from the same spatial query. Consider the following simple query:

DECLARE @line1 geometry = 'LINESTRING(0 11, 430 310)';
DECLARE @line2 geometry = 'LINESTRING(0 500, 650 0)';

SELECT @line1.STIntersection(@line2).ToString();

Previously, if you had run this query in SQL Azure, you would have got the same result as in SQL Server 2008/R2, which is POINT (333.88420666910952 243.16599486991572).

But then, overnight, SQL Azure was upgraded, and running the same query now gives you this instead: POINT (333.88420666911088 243.16599486991646), which is consistent with the result from SQL Denali CTP3.
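To put those two results in perspective, the exact intersection of the two segments can be computed with rational arithmetic. Here is a minimal Python sketch using the standard fractions module; the function name is hypothetical, and it simply treats parallel or collinear segments as non-intersecting.

```python
from fractions import Fraction


def segment_intersection(p1, p2, p3, p4):
    """Exact intersection of segments p1-p2 and p3-p4 using rational arithmetic.

    Returns an (x, y) tuple of Fractions, or None if the segments don't meet.
    Parallel/collinear segments are simply reported as None in this sketch.
    """
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = [
        (Fraction(x), Fraction(y)) for x, y in (p1, p2, p3, p4)
    ]
    rx, ry = x2 - x1, y2 - y1          # direction of the first segment
    sx, sy = x4 - x3, y4 - y3          # direction of the second segment
    denom = rx * sy - ry * sx          # 2D cross product r x s
    if denom == 0:
        return None
    qx, qy = x3 - x1, y3 - y1
    t = (qx * sy - qy * sx) / denom    # position along the first segment
    u = (qx * ry - qy * rx) / denom    # position along the second segment
    if 0 <= t <= 1 and 0 <= u <= 1:
        return (x1 + t * rx, y1 + t * ry)
    return None


# The same two LINESTRINGs as in the T-SQL example
x, y = segment_intersection((0, 11), (430, 310), (0, 500), (650, 0))
print(x, y)                # 911170/2729 663600/2729
print(float(x), float(y))  # nearest double-precision values
```

The exact point is approximately (333.88420666910956, 243.16599486991572). Both reported results agree with it to at least 13 significant figures, yet they are not bit-for-bit identical to each other.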

Not much of a difference, you might think… but think about what this means for any spatial queries that rely on exact comparison between points. How about this example using the same two geometry instances:

SELECT @line1.STIntersection(@line2).STIntersects(@line1);

SQL Azure query run in July 2011: 0. Same SQL Azure query run in August 2011: 1. Considering that STIntersects() returns a Boolean (bit) value, you can’t really get much more different than 1 and 0…
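If your own code compares computed coordinates, one way to make it less brittle across engine upgrades like this is to treat two points as equal when they lie within a tolerance suitable for your data, rather than requiring exact equality. Here is a minimal Python sketch; the helper name and the 1e-9 default tolerance are arbitrary assumptions, and the two points are the STIntersection() results quoted above.

```python
import math


def points_match(a, b, tol=1e-9):
    """Treat (x, y) points a and b as equal if they are within tol of each other."""
    return math.dist(a, b) <= tol


# The two results of @line1.STIntersection(@line2) quoted above
before_upgrade = (333.88420666910952, 243.16599486991572)
after_upgrade = (333.88420666911088, 243.16599486991646)

print(before_upgrade == after_upgrade)              # False: exact comparison
print(points_match(before_upgrade, after_upgrade))  # True: within tolerance
```

Within SQL Server, a similar idea would be to test whether STDistance() between two instances falls below a chosen threshold, rather than relying on an exact comparison against a computed point.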

So, a cautionary tale: although SQL Azure hosting might have handed responsibility for actually performing any DB upgrades over to Microsoft, the task of testing and ensuring that your code is up to date and doesn’t break from version to version is perhaps greater than ever, since there is no way to roll back or delay the upgrade to your little slice of the cloud.

June 1, 2011

Upcoming Spatial Precision Improvement in SQL Azure

In a previous post, I discussed a little about the approach that SQL Server 2008 uses when performing operations using the geometry and geography spatial datatypes, and the precision implications of performing calculations on a fixed size integer grid. Although not mentioned explicitly in that post, the points covered apply equally to SQL Server 2008/R2 and to the current spatial functionality in SQL Azure.

The integer grid, which is limited to 27-bit resolution in SQL Server 2008/R2/Azure, is increased to 48-bit resolution in SQL Server Denali. As described in the Microsoft white paper “New Spatial Features in SQL Server Code-Named ‘Denali’ Community Technology Preview 1”, this increased precision can help ensure that the accuracy of input coordinates is preserved throughout spatial operations. For example, consider the following coordinate, which was processed using the STUnion() method in SQL Server 2008 but was not involved in the resulting geometry:

(82.339026 29.662145) → (82.339025999885052 29.662144999951124)

Here is the result of the same STUnion() method in SQL Server Denali:

(82.339026 29.662145) → (82.339026 29.662145)
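To see why the grid resolution matters, here is a deliberately simplified Python model that snaps a coordinate to the nearest line of a grid with 2**bits cells spanning an assumed extent of -180 to 180. The function name and the extent are assumptions for illustration, and this is not how SQL Server actually lays out its grid, so it won’t reproduce the values above; it only shows the general effect that each extra bit halves the worst-case rounding error.

```python
def snap_to_grid(value, bits, lo=-180.0, hi=180.0):
    """Round value to the nearest grid line, with 2**bits cells spanning [lo, hi].

    Returns (snapped_value, cell_width). Hypothetical model, not SQL Server's.
    """
    cell = (hi - lo) / 2 ** bits      # width of one grid cell
    k = round((value - lo) / cell)    # index of the nearest grid line
    return lo + k * cell, cell


for bits in (27, 48):
    snapped, cell = snap_to_grid(82.339026, bits)
    error = abs(snapped - 82.339026)
    print(f"{bits}-bit grid: cell {cell:.3g}, error {error:.3g}")
```

With 27 bits over 360 degrees the cell width is about 2.7e-6 degrees; with 48 bits it is about 1.3e-12 degrees.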

Certainly, the increased resolution resolves many of the issues I pointed out in my previous post. You can try out the increased precision in the current Denali CTP1, available for download from here. However, there is still no official announcement as to when we can expect Denali to be released to manufacturing (RTM).

But what is interesting is that, in the “What’s new in SQL Azure” MSDN page for May 2011, there is now a note that reads as follows:

Upcoming Increased Precision of Spatial Types: For the next major service release, some intrinsic functions will change and SQL Azure will support increased precision of Spatial Types. This will have an impact on persisted computed columns as well as any index or constraint defined in terms of the persisted computed column. With this service release SQL Azure provides a view to help determine objects that will be impacted by the change. Query sys.dm_db_objects_impacted_on_version_change (SQL Azure Database) in each database to determine impacted objects for that database.

 

The date of the “next major service release” for SQL Azure, just like the release date of SQL Denali, has not been announced. But this is certainly an indication that the spatial functionality is signed off and, interestingly, we might see the benefits of increased spatial precision appearing in SQL Azure before SQL Denali (either that, or Microsoft are planning a simultaneous release for cloud and on-premises, and the SQL Denali RTM date might be closer than we think…)
