Easy Bulk Loading OS Locator Road Data into SQL Server

I’ve just loaded the Ordnance Survey “OS Locator” dataset (part of Ordnance Survey Open Data) into SQL Server 2008. OS Locator contains details of all the roads in Britain, in a gazetteer-style format. It is a 120Mb delimited text file, containing details of around 790,000 road entities (some roads are split into multiple entities if they cross districts etc.) – including the road name, the coordinates of its bounding box and centrepoint, classification, and the county and area in which it is located. (Note that this dataset doesn’t include the geometry of the shape of the road itself). You can find the technical specifications for the OS Locator dataset here: http://www.ordnancesurvey.co.uk/oswebsite/products/oslocator/docs/user_guide.pdf

Since the file is text-based, there’s a range of options available to import it – you could create an SSIS package, or use the import/export wizard, or one of a variety of third party ETL tools. However, for an absolutely dead-easy way to query the data and create a geometry point representing the centre of each road using purely T-SQL, I used the OPENROWSET bulk function in conjunction with a format file.

Not only is this method easy, but it’s utterly repeatable so (assuming the structure of the source data remains constant) you can run it again each time the underlying data gets refreshed. OPENROWSET allows you to query the text file source directly, as if it were a table in the database, from which you can SELECT columns of data, INSERT them into other tables etc.

The following shows the xml format file I used to specify the columns of data in the OS Locator file:

<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
 <FIELD ID="0" xsi:type="CharTerm" TERMINATOR=":"/>
 <FIELD ID="1" xsi:type="CharTerm" TERMINATOR=":"/>
 <FIELD ID="2" xsi:type="CharTerm" TERMINATOR=":"/>
 <FIELD ID="3" xsi:type="CharTerm" TERMINATOR=":"/>
 <FIELD ID="4" xsi:type="CharTerm" TERMINATOR=":"/>
 <FIELD ID="5" xsi:type="CharTerm" TERMINATOR=":"/>
 <FIELD ID="6" xsi:type="CharTerm" TERMINATOR=":"/>
 <FIELD ID="7" xsi:type="CharTerm" TERMINATOR=":"/>
 <FIELD ID="8" xsi:type="CharTerm" TERMINATOR=":"/>
 <FIELD ID="9" xsi:type="CharTerm" TERMINATOR=":"/>
 <FIELD ID="10" xsi:type="CharTerm" TERMINATOR=":"/>
 <FIELD ID="11" xsi:type="CharTerm" TERMINATOR=":"/>
 <FIELD ID="12" xsi:type="CharTerm" TERMINATOR=":"/>
 <FIELD ID="13" xsi:type="CharTerm" TERMINATOR=":"/>
 <FIELD ID="14" xsi:type="CharTerm" TERMINATOR=":"/>
 <FIELD ID="15" xsi:type="CharTerm" TERMINATOR="\r\n"/>
</RECORD>
<ROW>
 <COLUMN SOURCE="0" NAME="Name" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="1" NAME="Classification" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="2" NAME="Centx" xsi:type="SQLINT"/>
 <COLUMN SOURCE="3" NAME="Centy" xsi:type="SQLINT"/>
 <COLUMN SOURCE="4" NAME="Minx" xsi:type="SQLINT"/>
 <COLUMN SOURCE="5" NAME="Maxx" xsi:type="SQLINT"/>
 <COLUMN SOURCE="6" NAME="Miny" xsi:type="SQLINT"/>
 <COLUMN SOURCE="7" NAME="Maxy" xsi:type="SQLINT"/>
 <COLUMN SOURCE="8" NAME="PostSector" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="9" NAME="Settlement" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="10" NAME="Locality" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="11" NAME="Cou_Unit" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="12" NAME="LocalAuth" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="13" NAME="Tile_10k" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="14" NAME="Tile_25k" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="15" NAME="Source" xsi:type="SQLVARYCHAR"/>
</ROW>
</BCPFORMAT>

Assuming that the format file above is saved as c:\OSLocator_formatfile.xml, and the OS Locator download itself is saved as c:\OS_Locator2010_2_Open.txt (the filename of the most recent locator download at the time of writing), then you can load the Locator dataset directly in SQL Server, creating a geometry point at the centre of each road in a single query as follows:

SELECT *,
geometry::Point(Centx,Centy,27700) AS Centre
FROM OPENROWSET(
 BULK 'C:\OS_Locator2010_2_Open.txt',
 FORMATFILE='C:\OSLocator_formatfile.xml'
) AS CSV;

And here’s what the data looks like:

image

Here’s the spatial results tab displaying the centre points of the first 5,000 rows of data:

image

And here’s what a small section of Norwich looks like having created a bounding box from the min/max coordinates of each road and overlaying them on Bing Maps (having first transformed the coordinates from EPSG:27700 to EPSG:4326):

image

This entry was posted in Spatial, SQL Server and tagged , , . Bookmark the permalink.

4 Responses to Easy Bulk Loading OS Locator Road Data into SQL Server

  1. Pure Krome says:

    Holy Radical Roads, Bat-Alastair-Man!

    this simple post has totally made my entire year! Thank the smart British folks for opening up this data in the last 12 odd months AND for you to having a crack at importing this🙂

    (I really really really can’t express how much this helps me and our project).

    Question, though:

    You said: “having first transformed the coordinates from EPSG:27700 to EPSG:4326”

    where did u do this? I used to do this via ogr2ogr or some program (i can’t remember exactly what it was called) .. to convert projections. Did u do that also, after the data was imported … then export .. convert .. then re-import?

    (i hope u didn’t have the luxury of using a paid commerical app, which just did it for you via connecting directly to the SqlServer DB :~( )

  2. alastaira says:

    You’ll be pleased to hear that I haven’t used any paid commercial apps at all – in fact I think all of the tools and data that I’ve used in any of my blog posts have been free – SQL Server Express, OS Open Data, OGR2OGR, Shape2SQL…
    The particular tool I used in this case for the reprojection was the Proj.NET library – an open source codeplex project that I’ve adapted to reproject geometry/geography data directly within SQL Server, but I haven’t written that blog post yet… you’re right though – you could have done it using OGR2OGR instead to convert the coordinates prior to importing them.

  3. Pure Krome says:

    Leveraging Proj.Net to reproject because you have access to the SqlDB – awesomesauce. Why didn’t I think of that …. well .. actually .. dumb question. I’m not surprised I didn’t think of that *blush*

    Now if only we can ask Morten to update Shape2Sql to include projections via the Proj.NET library *hint hint*😛

  4. Pingback: SQLBits 8 | Alastair Aitchison

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s