Working with spatial data involves many dark arts, and performance tuning spatial indexes is perhaps the blackest of all of them. Although there are only a few parameters to configure: GRIDS, CELLS_PER_OBJECT, and BOUNDING_BOX, the number of different index combinations possible and the way in which these settings interact with each other is pretty complex. This is particularly true when dealing with geography data, since even the fundamental concept of defining a “grid” becomes quite tricky when dealing with an ellipsoidal surface.
Fortunately, there are two new system stored procedures in SQL Server Denali that make the process of spatial index-tuning much simpler, by allowing you to recreate (and therefore visualise) polygons representing the decomposition of a set of data under different index settings. These are sp_help_spatial_geography_histogram, and its geometry sister, sp_help_spatial_geometry_histogram. If you Google either of these terms right now, you’ll get only 7 results (Hopefully after I publish this post there might be 8 results…). I find this pretty surprising because, to me, these procedures (particularly the geography flavour) are pretty awesome.
Why? Well, when creating spatial indexes you generally want each feature to be described using the greatest possible accuracy (in order to increase the primary filter efficiency), but requiring the least number of cells to do so (in order to ensure that features are fully tessellated without blowing the CELLS_PER_OBJECT limit, or the index becoming large and unwieldy). But, when you create a index on a spatial column, how do you know in which grid cell(s) a given feature lies?
For the geometry datatype, this is pretty easy. LOW resolution is a 4×4 grid, MEDIUM resolution corresponds to an 8×8 grid, and HIGH resolution is a 16×16 grid. There are four levels of nested grids. So, to work out the size of each cell at Level 1 in a MEDIUM geometry index, you take the extent of the index bounding box in both x and y dimensions, and divide by 8. If the next grid resolution is also MEDIUM, then each cell at Level 2 will be 1/8th of a Level 1 cell, i.e. 1/64th of the height and width of the bounding box of the index. etc. etc.
So, if you have a bounding box that covers a map of the whole globe, from (–180 –90) to (180 90), then the level 1 cells in a HIGH resolution grid look like this:
Since the geometry datatype operates on a simple, planar surface, it’s actually always been pretty easy to work out this grid of cells programmatically, and determine which (and, crucially for performance-tuning, how many) cells a given feature intersects. But, the sp_help_spatial_geometry_histogram makes this even easier. You can now create the above grid for yourself using the following T-SQL:
CREATE TABLE #geometry_histogram_test ( geom geometry ); INSERT INTO #geometry_histogram_test VALUES ( geometry::STPolyFromText('POLYGON((-180 -90, 180 -90, 180 90, -180 90, -180 -90))', 4326) ); DECLARE @histogram_results table ( CellID int, Cell geometry, IntersectionCount int ); INSERT INTO @histogram_results EXEC sp_help_spatial_geometry_histogram @tabname=#geometry_histogram_test, @colname=geom, @xmin=-180, @xmax=180, @ymin=-90, @ymax=90, @resolution=16; SELECT CELL.STBoundary().STBuffer(0.05) FROM @histogram_results;
Tweak the @resolution value to get the grids at a given level and resolution. For example, to determine the set of Level 1 cells at HIGH resolution, supply a @resolution parameter of 16. To determine the set of Level 2 cells in a MEDIUM, MEDIUM grid, the @resolution parameter should be (8*8) = 64. But what’s more – the output of the procedure won’t just create the grid for you, but will give you a count of each feature from the underlying table that intersects each feature. For this example, I just created a single polygon spanning the full extent of the bounding box so the intersection count of every cell is just 1, but for distribution analysis of a real dataset, this is very useful.
But what’s even more useful is the sp_help_spatial_geography_histogram counterpart, because, despite the explanation in the MSDN spatial indexing overview, it’s always been hard to visualise exactly how an index grid is applied to an ellipsoidal surface. Now, using code very similar to the above, you can see exactly how the geography index cells are created (and, again, determine how many features intersect each cell). For example, the following code listing demonstrates the set of Level 1 HIGH geography index cells:
CREATE TABLE #histogram_test ( geog geography ); INSERT INTO #histogram_test VALUES ( geography::STGeomFromText('FULLGLOBE', 4326) ); DECLARE @histogram_results table ( CellID int, Cell geography, IntersectionCount int ); INSERT INTO @histogram_results EXEC sp_help_spatial_geography_histogram @tabname=#histogram_test, @colname=geog, @resolution=16; SELECT CELL.RingN(1).STBuffer(10000) FROM @histogram_results;
I’ve run the procedure against a table containing a single FULLGLOBE geometry to ensure that every cell intersects some feature (cells that don’t intersect anything in the base table aren’t included in the output). Since geography indexes have no bounding box and are implicitly assumed to tessellate the whole globe, these Level 1 cells will be the same for every HIGH Level 1 geography index. And here’s what they look like:
And the Level 3 cells in a LOW, LOW, LOW geography grid (at which point, the resolution is 4*4*4 = 64)? They look slightly more like this:
Pretty, huh? But more importantly, these procedures make visualising, analysing, and ultimately tuning geography indexes much easier.