Loading Open Street Map Data in SQL Server Part II: Ways.

In a previous post, I described how to load POI data from Open Street Maps into SQL Server. POIs are, by their very nature, pretty easy to define and store, since they are individual points of interest. In this post I’ll look at the slightly more complex issue of loading OSM ways – series of nodes that form lines (e.g. roads, rivers, county boundaries) or areas (buildings, lakes, administrative regions etc.).

Again, I’ll prove that this can be done using pure T-SQL, but I wouldn’t necessarily recommend this approach in a production environment when there are better tools for the job Smile

Acquire OSM Data

Acquiring the OSM data follows exactly the same process as I described in my previous post:

Shred the OSM Data in SQL Server

Shredding the data relating to OSM ways is a little bit more involved that for POIs, since each way contains references to a number of different nodes.

Load the XML File

As before, start by loading the XML file into a single SQL Server variable of the XML datatype using OPENROWSET(BULK) as a SINGLE_BLOB

DECLARE @x xml;
SET @x = (SELECT * FROM OPENROWSET(
BULK 'C:\norwich_map.osm',
SINGLE_BLOB) AS x);

 Nodes

Before loading the Ways, it’s first necessary to extract the nodes from which each way is formed:

CREATE TABLE nodes (
nodeid int,
latitude float,
longitude float,
geog4326 geography
);
INSERT INTO nodes
SELECT
OSMnode.value('@id', 'int') AS nodeid,
OSMnode.value('@lat', 'float') AS latitude,
OSMnode.value('@lon', 'float') AS longitude,
geography::Point(OSMnode.value('@lat', 'float'), OSMnode.value('@lon', 'float'), 4326) AS geog4326
FROM
@x.nodes('/osm/node') AS OSM(OSMnode);

(14138 row(s) affected)

Ways

Ways do not necessarily represent routes, or even roads. Ways contain collections of nodes – they might mark out areas, footpaths, boundaries of fields, counties, or buildings, for example. To start with, I’ll just populate a table containing the IDs of all the ways included in the dataset:

CREATE TABLE ways (
wayid int
);
INSERT INTO ways
SELECT
OSMWay.e.value('(@id)[1]', 'int') AS 'WayID'
FROM
@x.nodes('/osm/way') AS OSMWay(e)

(2908 row(s) affected)

WayTags

Like Nodes, Ways may contain a number of key/value pair tag elements describing properties of ways. We’ll extract these into a separate waytags table to enable us to filter ways by category.

CREATE TABLE waytags (
wayid int,
tagname varchar(32),
tagvalue varchar(32)
);
INSERT INTO waytags
SELECT
OSMWay.e.value('(@id)[1]', 'int') AS 'WayID',
OSMWayTag.e.value('@k', 'nvarchar(32)') AS 'TagName',
OSMWayTag.e.value('@v', 'nvarchar(32)') AS 'TagValue'
FROM
@x.nodes('/osm/way') AS OSMWay(e)
CROSS APPLY
OSMWay.e.nodes('tag') AS OSMWayTag(e)

(9701 row(s) affected)

The tags assigned to a way describe properties such as what sort of way is it (a building, highway, administrative boundary, river…) and then a number of tags specific to that type of way. For highways, for example, tags might be used to describe whether it is publicly accessible, the maximum speed limit, or the type of road surface. The following query selects the ids and names of only those ways that have been tagged with one of the specified types of “highway” tags:

SELECT
w.wayid,
wtn.TagValue AS wayname,
wt.TagValue AS highwaytype
FROM
ways w
INNER JOIN waytags wt ON w.wayid = wt.wayid AND wt.TagName = 'Highway'
LEFT JOIN waytags wtn ON w.wayid = wtn.wayid AND wtn.TagName = 'Name'
WHERE
wt.TagValue IN ('motorway', 'motorway_Link', 'trunk', 'trunk_Link', 'primary', 'primary_Link', 'secondary', 'tertiary', 'residential')

(note the LEFT join to the tags table to retrieve the name of the road where known – not every way has a name)

WayNodes

The waynodes table will contain data on all of those nodeIDs that belong to a given wayID. I’ve added an identity column, orderid, in the table definition here – that’s because I need to retain the order of the nodes that make up each way as they are listed in the original OSM XML file. (Normally, you could do this with the xquery position(), but this is not supported by sql server – see https://connect.microsoft.com/SQLServer/feedback/details/383888/fully-support-position-in-xquery )

CREATE TABLE waynodes (
orderid int identity(1,1),
wayid int,
nodeid int
);
INSERT INTO waynodes (wayid, nodeid)
SELECT
OSMWay.e.value('(@id)[1]', 'int') AS 'WayID',
OSMWayNode.e.value('(@ref)[1]', 'int') AS 'NodeID'
FROM
@x.nodes('/osm/way') AS OSMWay(e)
CROSS APPLY
OSMWay.e.nodes('nd') AS OSMWayNode(e)

(17606 row(s) affected)

Construct geography Linestrings from the WayNodes

Since we can now identify:

  • all those ways that are roads (by joining to the waytags table on wayid and filtering for selecting name/value tags)
  • ….and we can also identify the nodes from which each way is formed (by joining from the ways table to the waynodes table on wayid and selecting nodes ordered by ascending orderid)
  • …and we can retrieve the coordinates of those nodes (by joining to the nodes table on nodeid and retrieving that latitude and longitude coordinates)

we now can create geography LineStrings representing each road in our dataset.

Since OSM coordinates are measured using the WGS84 spatial reference system, first add a column of the geography datatype to the ways table:

ALTER TABLE ways
ADD geog4326 geography;

Next populate the geog4326 column with LineStrings. To do so, we’ll use the FOR XML PATH(‘’) technique of concatenating the coordinate string values of each waynode together, removing the extra comma, then manually adding the LINESTRING() around the outside to create a WKT string. It’s important to order the coordinates by the orderid column in the waynodes table to make sure that the nodes of each way are added in the correct order. Finally, parsing this value as WKT to update the geog4326 column. (Yes, this is ugly, but it works):

UPDATE ways
SET geog4326 = 'LINESTRING(' + STUFF((
SELECT ',' + CAST(CAST(n.Longitude AS decimal(18,9)) AS varchar(32)) + ' ' + CAST(CAST(n.Latitude AS decimal(18,9)) AS varchar(32)) AS [text()]
FROM
ways w JOIN waynodes wn ON w.wayid = wn.wayid
JOIN nodes n ON wn.nodeid = n.nodeid
WHERE wn.wayid = ways.wayid
ORDER BY w.wayid, orderid
FOR XML PATH(''), TYPE
).value('/', 'NVARCHAR(MAX)'),1,1,'') +')'

(2908 row(s) affected)

Select the Data

If all goes well, you should now be able to run the following query to show all the geography LineStrings created by connecting their respective nodes

SELECT
wayid,
geog4326
FROM ways

image

And here, for reference, is the OSM Mapnik view of the same area – hopefully you can see the similarity!:

image

This entry was posted in Open Street Maps, Spatial, SQL Server and tagged , , , . Bookmark the permalink.

16 Responses to Loading Open Street Map Data in SQL Server Part II: Ways.

  1. Alejandro Vidal says:

    Hello!
    It’s excellent how has been documented this work!!

    I will hope can help me with a feature.

    When run the last part to update ways appears this message:

    “Msg 6522, Level 16, State 1, Line 1
    A .NET Framework error occurred during execution of user-defined routine or aggregate “geography”:
    System.ArgumentException: 24200: The specified input does not represent a valid geography instance.
    System.ArgumentException:
    en Microsoft.SqlServer.Types.SqlGeography.ConstructGeographyFromUserInput(GeoData g, Int32 srid)
    en Microsoft.SqlServer.Types.SqlGeography.GeographyFromText(OpenGisType type, SqlChars taggedText, Int32 srid)
    en Microsoft.SqlServer.Types.SqlGeography.STGeomFromText(SqlChars geometryTaggedText, Int32 srid)
    en Microsoft.SqlServer.Types.SqlGeography.Parse(SqlString s) ”

    I don’t know how solve it, its the first time i see the command STUFF and i feel there is the point.

    Thanks for your entry, your blog and by response!

    • Óscar says:

      I get another error, similar yours, and I solve it change this ‘WHERE wn.wayid = ways.wayid’ in latest script with this one ‘ WHERE wn.wayid = w.wayid’. By the way, I’m using SQL Server 2014 Express edition.

  2. Azhar says:

    HI,
    thank you for showing us new possibilities…
    when i try to update ways (i am using exactly your code without change except osm file is slightly bigger, 10mb) I get following exception from sql server.

    Msg 6522, Level 16, State 1, Line 1
    A .NET Framework error occurred during execution of user-defined routine or aggregate “geography”:
    System.ArgumentException: 24200: The specified input does not represent a valid geography instance.
    System.ArgumentException:
    at Microsoft.SqlServer.Types.SqlGeography.ConstructGeographyFromUserInput(GeoData g, Int32 srid)
    at Microsoft.SqlServer.Types.SqlGeography.GeographyFromText(OpenGisType type, SqlChars taggedText, Int32 srid)
    at Microsoft.SqlServer.Types.SqlGeography.STGeomFromText(SqlChars geometryTaggedText, Int32 srid)
    at Microsoft.SqlServer.Types.SqlGeography.Parse(SqlString s)

  3. Azhar says:

    HI,
    After observing some data i figured out that some ways are not valid to sql server 2008. so i made a poor man solution and it worked well. I will appreciate if somebody transform my solution into some better chunk of code. here is my pathetic looking code
    DECLARE @count int
    DECLARE @wayid int
    DECLARE @g geography
    DECLARE @gm GEOMETRY
    DECLARE @c varchar(max)
    SELECT @count=count(*) FROM ways WHERE geog4326 IS NULL
    SELECT TOP 1 @wayid = wayid from ways where geog4326 is null
    SELECT @count, @wayid

    WHILE (@count > 0)
    BEGIN
    SELECT TOP 1 @wayid = wayid from ways where geog4326 is null

    SET @c = ‘LINESTRING(‘ + STUFF((
    SELECT ‘,’ + CAST(CAST(n.Longitude AS decimal(18,9)) AS varchar(32)) + ‘ ‘ + CAST(CAST(n.Latitude AS decimal(18,9)) AS varchar(32)) AS [text()]
    FROM
    ways w JOIN waynodes wn ON w.wayid = wn.wayid
    JOIN nodes n ON wn.nodeid = n.nodeid
    WHERE wn.wayid = @wayid
    ORDER BY w.wayid, orderid
    FOR XML PATH(”), TYPE
    ).value(‘/’, ‘NVARCHAR(MAX)’),1,1,”) +’)’
    SET @gm = GEOMETRY::STLineFromText(@c, 4326);
    IF @gm.STIsValid() = 0
    BEGIN
    SET @gm = @gm.MakeValid()
    END
    SELECT @g = geography::STGeomFromText(@gm.STAsText(), 4326);
    UPDATE ways
    SET geog4326 = @g
    WHERE wayid = @wayid

    SELECT @count = @count – 1
    END

    keep in mind i am just learner….
    Thanks

  4. bram says:

    Hi,
    Is there another way to construct the linestrings from the waynodes fast. When I run alastair’s code (which is very fast in my opinion) I get problems with memory (I’m trying to make all the roads for Europe) and when I run Azhar’s code it’s taking too long. I tried this first for just Holland (500mb) which took around 5 hours with Azhar’s code so I’m afraid it will take >60 days for Europe.

    Is there a function (like ST_LinestringAggr for sybase) that makes linestrings from geometry points instead of stuffing the lat and long in a string?

    Thanks in advance!!!

    Bram

    • alastaira says:

      Hi Bram,
      There’s no ST_LineFromPoints in SQL Server, but you can certainly create your own SQLCLR function that takes two points and connects them to make the corresponding LineString. It would look a little like this (not tested):

      public static SqlGeometry LineFromPoints(
      SqlGeometry @p1,
      SqlGeometry @p2)
      {
      SqlGeometryBuilder gb = new SqlGeometryBuilder();
      gb.SetSrid((int)(@p1.STSrid));
      gb.BeginGeometry(OpenGisGeometryType.LineString);
      gb.BeginFigure((double)@p1.STX, (double)@p1.STY);
      gb.AddLine((double)@p2.STX, (double)@p2.STY);
      gb.EndFigure();
      gb.EndGeometry();
      return gb.ConstructedGeometry;
      }

      • bram says:

        Thanks. Saves me some time searching. The code i posted in the next comment already did the job, so that problem seems solved. Now I have a new problem:

        An XML operation resulted an XML data type exceeding 2GB in size. Operation aborted. When I run:

        DECLARE @x xml;
        SET @x = (SELECT * FROM OPENROWSET(
        BULK ‘C:\norwich_map.osm’,
        SINGLE_BLOB) AS x);

        The file I’m trying to load is 140GB. Have you ever tackled this problem?

        Thanks,

        Bram

    • alastaira says:

      To be honest, if I was trying to load anything like that amount of data I wouldn’t be doing it in T-SQL – I’d use a tool like Safe FME. If you don’t want to splash out any £££, it probably wouldn’t be too hard to write an OSM-parser in CLR, but the approach described here is really not intended for use on that kind of data.

      • bram says:

        Thanks! Probably will go for second option (OSM parser in CLR)

      • bram says:

        Hi Alastaira,

        I solved it (it’s still importing so you never know) by importing the xml file with SSIS. With XML data source.
        It performed very well with test file and is almost as fast as the code you described above in your article

  5. bram says:

    The following code looks promising (did not test it yet on holland). Before it works however you need to update the waynodes table with an extra column that hold its point

    I used the following code:

    use geospatial_dev
    ALTER TABLE waynodes
    ADD geom4326waynodes geometry;

    UPDATE waynodes
    SET geom4326waynodes = geom4326 FROM nodes JOIN waynodes ON waynodes.nodeid = nodes.nodeid

    Then execute this code:

    use geospatial_dev

    DECLARE @wayid int
    DECLARE @wayidold int
    SELECT TOP 1 @wayidold = wayid from ways
    DECLARE @Point geometry
    DECLARE @CoordinateStream varbinary(max)
    DECLARE @i int
    SET @i = 0
    DECLARE GeomCursor CURSOR FOR SELECT wayid, geom4326waynodes FROM waynodes WITH (NOLOCK);
    OPEN GeomCursor;

    FETCH NEXT FROM GeomCursor INTO @wayid, @Point;
    WHILE @@FETCH_STATUS = 0
    BEGIN
    IF (@wayid = @wayidold)
    BEGIN
    IF (@i=0) BEGIN
    SET @CoordinateStream = SUBSTRING(@Point.STAsBinary(),6,16)
    END;
    ELSE BEGIN
    SET @CoordinateStream = @CoordinateStream + SUBSTRING(@Point.STAsBinary(),6,16)
    END;
    SET @i = @i+1
    END;
    ELSE
    BEGIN
    UPDATE ways
    SET geom4326 = geometry::STGeomFromWKB(0x01 + 0x02000000 + CONVERT(varbinary,REVERSE(CONVERT(varbinary,@i))) + @CoordinateStream,0).STAsText()

    WHERE wayid = @wayid
    SET @i = 0
    IF (@i=0) BEGIN
    SET @CoordinateStream = SUBSTRING(@Point.STAsBinary(),6,16)
    END;
    ELSE BEGIN
    SET @CoordinateStream = @CoordinateStream + SUBSTRING(@Point.STAsBinary(),6,16)
    END;
    SET @i = @i+1
    END;
    SET @wayidold = @wayid
    FETCH NEXT FROM GeomCursor INTO @wayid, @Point;
    END;
    CLOSE GeomCursor;
    DEALLOCATE GeomCursor;

  6. sachin says:

    Genious work, a reliable method of reading data from openstreepmap

  7. Bogdan Nicovski says:

    Great work. I have little problem, following query returns every street that i want, but i want to have city name of the street in the wayname column. I will appreciate any help?

    Thanks

    SELECT
    w.wayid,
    wtn.TagValue AS wayname,
    wt.TagValue AS highwaytype
    FROM
    ways w
    INNER JOIN waytags wt ON w.wayid = wt.wayid AND wt.TagName = ‘Highway’
    LEFT JOIN waytags wtn ON w.wayid = wtn.wayid AND wtn.TagName = ‘Name’
    WHERE
    wt.TagValue IN (‘motorway’, ‘motorway_Link’, ‘trunk’, ‘trunk_Link’, ‘primary’, ‘primary_Link’, ‘secondary’, ‘tertiary’, ‘residential’)

  8. Óscar says:

    Great work!

    I have tried with Barcelona data, and I get an Out of Memory running the last script on SQL Server 2014 Express.

    If I run this select:
    SELECT ‘,’ + CAST(CAST(n.Longitude AS decimal(18,9)) AS varchar(32)) + ‘ ‘ + CAST(CAST(n.Latitude AS decimal(18,9)) AS varchar(32)) AS [text()]
    FROM
    ways w JOIN waynodes wn ON w.wayid = wn.wayid
    JOIN nodes n ON wn.nodeid = n.nodeid
    WHERE wn.wayid = w.wayid
    ORDER BY w.wayid, orderid

    I get 1,018,575 rows.

    Any about how can I solve this problem?

  9. Simo Simo says:

    Hi , i’m in production environment , i have to inclue a whole country , so is there please a faster way to do the same ?

  10. Hi Alastair – great articles, thanks!
    Just one thing, I had to change the datatype from INT to BIGINT to get some queries working. Maybe OSM has outgrown INT?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s