In a previous post, I described how to load POI data from Open Street Maps into SQL Server. POIs are, by their very nature, pretty easy to define and store, since they are individual points of interest. In this post I’ll look at the slightly more complex issue of loading OSM ways – series of nodes that form lines (e.g. roads, rivers, county boundaries) or areas (buildings, lakes, administrative regions etc.).
Again, I’ll prove that this can be done using pure T-SQL, but I wouldn’t necessarily recommend this approach in a production environment when there are better tools for the job
Acquire OSM Data
Acquiring the OSM data follows exactly the same process as I described in my previous post:
- Go to http://www.openstreetmap.org/, pan and zoom the map to display the area of interest, and then select the “Export” tab.
- Or, download complete or partial subsets of planet.osm – the master file that contains all nodes in the Open Street Map dataset. The original file can be downloaded from http://planet.osm.org/, or you can download subsets split by individual country or feature type from http://download.geofabrik.de/osm/ and http://downloads.cloudmade.com/
Shred the OSM Data in SQL Server
Shredding the data relating to OSM ways is a little bit more involved that for POIs, since each way contains references to a number of different nodes.
Load the XML File
As before, start by loading the XML file into a single SQL Server variable of the XML datatype using OPENROWSET(BULK) as a SINGLE_BLOB
DECLARE @x xml; SET @x = (SELECT * FROM OPENROWSET( BULK 'C:\norwich_map.osm', SINGLE_BLOB) AS x);
Before loading the Ways, it’s first necessary to extract the nodes from which each way is formed:
CREATE TABLE nodes ( nodeid int, latitude float, longitude float, geog4326 geography ); INSERT INTO nodes SELECT OSMnode.value('@id', 'int') AS nodeid, OSMnode.value('@lat', 'float') AS latitude, OSMnode.value('@lon', 'float') AS longitude, geography::Point(OSMnode.value('@lat', 'float'), OSMnode.value('@lon', 'float'), 4326) AS geog4326 FROM @x.nodes('/osm/node') AS OSM(OSMnode);
(14138 row(s) affected)
Ways do not necessarily represent routes, or even roads. Ways contain collections of nodes – they might mark out areas, footpaths, boundaries of fields, counties, or buildings, for example. To start with, I’ll just populate a table containing the IDs of all the ways included in the dataset:
CREATE TABLE ways ( wayid int ); INSERT INTO ways SELECT OSMWay.e.value('(@id)', 'int') AS 'WayID' FROM @x.nodes('/osm/way') AS OSMWay(e)
(2908 row(s) affected)
Like Nodes, Ways may contain a number of key/value pair tag elements describing properties of ways. We’ll extract these into a separate waytags table to enable us to filter ways by category.
CREATE TABLE waytags ( wayid int, tagname varchar(32), tagvalue varchar(32) ); INSERT INTO waytags SELECT OSMWay.e.value('(@id)', 'int') AS 'WayID', OSMWayTag.e.value('@k', 'nvarchar(32)') AS 'TagName', OSMWayTag.e.value('@v', 'nvarchar(32)') AS 'TagValue' FROM @x.nodes('/osm/way') AS OSMWay(e) CROSS APPLY OSMWay.e.nodes('tag') AS OSMWayTag(e)
(9701 row(s) affected)
The tags assigned to a way describe properties such as what sort of way is it (a building, highway, administrative boundary, river…) and then a number of tags specific to that type of way. For highways, for example, tags might be used to describe whether it is publicly accessible, the maximum speed limit, or the type of road surface. The following query selects the ids and names of only those ways that have been tagged with one of the specified types of “highway” tags:
SELECT w.wayid, wtn.TagValue AS wayname, wt.TagValue AS highwaytype FROM ways w INNER JOIN waytags wt ON w.wayid = wt.wayid AND wt.TagName = 'Highway' LEFT JOIN waytags wtn ON w.wayid = wtn.wayid AND wtn.TagName = 'Name' WHERE wt.TagValue IN ('motorway', 'motorway_Link', 'trunk', 'trunk_Link', 'primary', 'primary_Link', 'secondary', 'tertiary', 'residential')
(note the LEFT join to the tags table to retrieve the name of the road where known – not every way has a name)
The waynodes table will contain data on all of those nodeIDs that belong to a given wayID. I’ve added an identity column, orderid, in the table definition here – that’s because I need to retain the order of the nodes that make up each way as they are listed in the original OSM XML file. (Normally, you could do this with the xquery position(), but this is not supported by sql server – see https://connect.microsoft.com/SQLServer/feedback/details/383888/fully-support-position-in-xquery )
CREATE TABLE waynodes ( orderid int identity(1,1), wayid int, nodeid int ); INSERT INTO waynodes (wayid, nodeid) SELECT OSMWay.e.value('(@id)', 'int') AS 'WayID', OSMWayNode.e.value('(@ref)', 'int') AS 'NodeID' FROM @x.nodes('/osm/way') AS OSMWay(e) CROSS APPLY OSMWay.e.nodes('nd') AS OSMWayNode(e)
(17606 row(s) affected)
Construct geography Linestrings from the WayNodes
Since we can now identify:
- all those ways that are roads (by joining to the waytags table on wayid and filtering for selecting name/value tags)
- ….and we can also identify the nodes from which each way is formed (by joining from the ways table to the waynodes table on wayid and selecting nodes ordered by ascending orderid)
- …and we can retrieve the coordinates of those nodes (by joining to the nodes table on nodeid and retrieving that latitude and longitude coordinates)
we now can create geography LineStrings representing each road in our dataset.
Since OSM coordinates are measured using the WGS84 spatial reference system, first add a column of the geography datatype to the ways table:
ALTER TABLE ways ADD geog4326 geography;
Next populate the geog4326 column with LineStrings. To do so, we’ll use the FOR XML PATH(‘’) technique of concatenating the coordinate string values of each waynode together, removing the extra comma, then manually adding the LINESTRING() around the outside to create a WKT string. It’s important to order the coordinates by the orderid column in the waynodes table to make sure that the nodes of each way are added in the correct order. Finally, parsing this value as WKT to update the geog4326 column. (Yes, this is ugly, but it works):
UPDATE ways SET geog4326 = 'LINESTRING(' + STUFF(( SELECT ',' + CAST(CAST(n.Longitude AS decimal(18,9)) AS varchar(32)) + ' ' + CAST(CAST(n.Latitude AS decimal(18,9)) AS varchar(32)) AS [text()] FROM ways w JOIN waynodes wn ON w.wayid = wn.wayid JOIN nodes n ON wn.nodeid = n.nodeid WHERE wn.wayid = ways.wayid ORDER BY w.wayid, orderid FOR XML PATH(''), TYPE ).value('/', 'NVARCHAR(MAX)'),1,1,'') +')'
(2908 row(s) affected)
Select the Data
If all goes well, you should now be able to run the following query to show all the geography LineStrings created by connecting their respective nodes
SELECT wayid, geog4326 FROM ways
And here, for reference, is the OSM Mapnik view of the same area – hopefully you can see the similarity!: