Posts tagged ‘Ordnance Survey’

July 1, 2011

Loading OS VectorMap data to SQL Server with Powershell and OGR2OGR

I’ve already written several posts about loading spatial data into SQL Server – using OGR2OGR, or Shape2SQL, for example. In this post, I’m going to demonstrate how you can call OGR2OGR from a PowerShell script in order to loop through and load the entire set of OS VectorMap layers into SQL Server in a few simple lines of script. Note that you’ll need the latest version of OGR2OGR (1.8) in order to play along at home, since I’ll be using the direct MSSQLSpatial driver introduced in that version.

Create the Tables

OS Vectormap data is supplied in a number of different layers, including roads, natural features, settlement areas, railway lines etc. Some of these layers represent point data, some polylines, and some polygons. (Note that there are no multi_xxx features, and a given layer contains only features of one type). Unlike some spatial formats (such as ESRI shapefiles), SQL Server lets you mix ‘n’ match different types of geometry within the same column. You could, if you wanted to, merge all the OS VectorMap features into a single table. However, since there are different additional attributes associated with each type of feature, it generally makes sense to define separate tables for each feature layer.

OGR2OGR can create tables for each feature layer in SQL Server directly, creating geometry or geography columns to hold spatial data itself and additional columns to hold any non-spatial attributes in the source data file. However, I find this to be quite unreliable; sometimes, the field lengths of the columns in the table created by OGR2OGR are set too small (e.g. numeric(5,2)), causing attribute data inserted into the table to be truncated or lead to an overflow error. Sometimes, column field lengths are set too large, which is wasteful (i.e. using nvarchar(max) to store a fixed-length two-byte string). And, sometimes, the datatype chosen for a column is just plain wrong. i.e. creating a float column to store a set of purely integer values.

So, I’d always recommend that you create the destination tables for each layer manually, specifying the correct column datatypes yourself before using OGR2OGR to load data into it. The following script can be used to create individual tables for each of the feature layers in the OS Vector Map dataset – road/railway/settlement layer/heritage/community services etc. – including all associated attribute values. Notice that I’m using the geometry datatype, and I’ve named the spatial column geom27700, indicating that all the data inserted in this column will be projected data using the EPSG:27700 spatial reference system (the National Grid of Great Britain, as used by all Ordnance Survey data):

CREATE TABLE dbo.administrativeboundary(
	ogr_fid int IDENTITY(1,1) NOT NULL,
	geom27700 geometry NULL,
	featdesc varchar(255) NULL,
	featcode int NULL,
 CONSTRAINT PK_administrativeboundary PRIMARY KEY CLUSTERED 
(
	ogr_fid ASC
));

CREATE TABLE dbo.road_line(
	ogr_fid int IDENTITY(1,1) NOT NULL,
	geom27700 geometry NULL,
	featdesc varchar(254) NULL,
	featcode int NULL,
	name varchar(254) NULL,
	number varchar(30) NULL,
 CONSTRAINT PK_road_line PRIMARY KEY CLUSTERED 
(
	ogr_fid ASC
));

CREATE TABLE dbo.railway_line(
	ogr_fid int IDENTITY(1,1) NOT NULL,
	geom27700 geometry NULL,
	featdesc varchar(254) NULL,
	featcode int NULL,
 CONSTRAINT PK_railway_line PRIMARY KEY CLUSTERED 
(
	ogr_fid ASC
));

CREATE TABLE dbo.communityservices(
	ogr_fid int IDENTITY(1,1) NOT NULL,
	geom27700 geometry NULL,
	featdesc varchar(254) NULL,
	featcode int NULL,
	name varchar(254) NULL,
 CONSTRAINT PK_communityservices PRIMARY KEY CLUSTERED 
(
	ogr_fid ASC
));

CREATE TABLE dbo.height(
	ogr_fid int IDENTITY(1,1) NOT NULL,
	geom27700 geometry NULL,
	featdesc varchar(254) NULL,
	featcode int NULL,
	height varchar(254) NULL,
 CONSTRAINT PK_height PRIMARY KEY CLUSTERED 
(
	ogr_fid ASC
));

CREATE TABLE dbo.heritage(
	ogr_fid int IDENTITY(1,1) NOT NULL,
	geom27700 geometry NULL,
	featdesc varchar(254) NULL,
	featcode int NULL,
	name varchar(254) NULL,
 CONSTRAINT PK_heritage PRIMARY KEY CLUSTERED 
(
	ogr_fid ASC
));

CREATE TABLE dbo.naturalfeature_area(
	ogr_fid int IDENTITY(1,1) NOT NULL,
	geom27700 geometry NULL,
	featdesc varchar(254) NULL,
	featcode int NULL,
 CONSTRAINT PK_naturalfeature_area PRIMARY KEY CLUSTERED 
(
	ogr_fid ASC
));

CREATE TABLE dbo.naturalfeature_line(
	ogr_fid int IDENTITY(1,1) NOT NULL,
	geom27700 geometry NULL,
	featdesc varchar(254) NULL,
	featcode int NULL,
 CONSTRAINT PK_naturalfeature_line PRIMARY KEY CLUSTERED 
(
	ogr_fid ASC
));
CREATE TABLE dbo.railway_point(
	ogr_fid int IDENTITY(1,1) NOT NULL,
	geom27700 geometry NULL,
	featdesc varchar(254) NULL,
	featcode int NULL,
	name varchar(30) NULL,
 CONSTRAINT PK_railway_point PRIMARY KEY CLUSTERED 
(
	ogr_fid ASC
));

CREATE TABLE dbo.settlement_line(
	ogr_fid int IDENTITY(1,1) NOT NULL,
	geom27700 geometry NULL,
	featdesc varchar(254) NULL,
	featcode int NULL,
 CONSTRAINT PK_settlement_line PRIMARY KEY CLUSTERED 
(
	ogr_fid ASC
));

CREATE TABLE dbo.settlement_area(
	ogr_fid int IDENTITY(1,1) NOT NULL,
	geom27700 geometry NULL,
	featdesc varchar(254) NULL,
	featcode int NULL,
 CONSTRAINT PK_settlement_area PRIMARY KEY CLUSTERED 
(
	ogr_fid ASC
));

CREATE TABLE dbo.text(
	ogr_fid int IDENTITY(1,1) NOT NULL,
	geom27700 geometry NULL,
	featdesc varchar(254) NULL,
	featcode int NULL,
	name varchar(254) NULL,
	xml_name varchar(254) NULL,
	fonttype varchar(150) NULL,
	fontcolour varchar(30) NULL,
	fontheight varchar(30) NULL,
	orientatio numeric(30, 1) NULL,
 CONSTRAINT PK_text PRIMARY KEY CLUSTERED 
(
	ogr_fid ASC
));

CREATE TABLE dbo.tidalboundary(
	ogr_fid int IDENTITY(1,1) NOT NULL,
	geom27700 geometry NULL,
	featdesc varchar(254) NULL,
	featcode int NULL,
 CONSTRAINT PK_tidalboundary PRIMARY KEY CLUSTERED 
(
	ogr_fid ASC
));

Loading the geometry data with Powershell

The OS VectorMap dataset is cut into grid squares 100km x 100km across, denoted by a two letter combination, e.g. TG. Each of these squares are further subdivided into 100 10km x 10km squares, denoted by a two letter and two number combination e.g. TG20. So the directory structure of the OS VectorMap data looks like this:

image

To load the complete set of data, you need to loop through all subdirectories starting from a given base directory, then call OGR2OGR to append the data in each shapefile found therein into the appropriate feature table based on its filename. Note that not every feature layer will be found in every subdirectory (you won’t, for example, find many tidal features in Derbyshire).

And here’s a powershell script to do that loading for you. Just change the name of the base directory from which to search (c:\osdata), the path to ogr2ogr.exe (c:\warmerda\bld\bin), and the connection string to your SQL Server instance as appropriate:

get-childitem c:\osdata -include *.shp -recurse | foreach ($_) {
  echo "Now loading: " $_.fullname
  C:\warmerda\bld\bin\ogr2ogr -progress -append -f "MSSQLSpatial" "MSSQL:server=zangief\SQLEXPRESS;database=OSVectorMap;trusted_connection=yes;" $_.fullname -a_srs "EPSG:27700" -lco "GEOM_TYPE=geometry" -lco "GEOM_NAME=geom27700"
}

The layer creation options (-lco) supplied to ogr2ogr specify that this is geometry data in the EPSG:27700 coordinate reference system, and should be loaded into the geom27700 of each table. The other attribute values of each feature will be copied straight into the corresponding columns. Execute the script and, if all goes to plan, you should see the following as the files start to load:

image

Note that loading the full set of OS Vectormap will take some time so, at this point, I’d go and get a coffee or browse facebook or something. Then, when you return, you should be in possession of one table containing the combined data for each OS VectorMap feature layer, a bit like this:

image

March 8, 2011

Easy Bulk Loading OS Locator Road Data into SQL Server

I’ve just loaded the Ordnance Survey “OS Locator” dataset (part of Ordnance Survey Open Data) into SQL Server 2008. OS Locator contains details of all the roads in Britain, in a gazetteer-style format. It is a 120Mb delimited text file, containing details of around 790,000 road entities (some roads are split into multiple entities if they cross districts etc.) – including the road name, the coordinates of its bounding box and centrepoint, classification, and the county and area in which it is located. (Note that this dataset doesn’t include the geometry of the shape of the road itself). You can find the technical specifications for the OS Locator dataset here: http://www.ordnancesurvey.co.uk/oswebsite/products/oslocator/docs/user_guide.pdf

Since the file is text-based, there’s a range of options available to import it – you could create an SSIS package, or use the import/export wizard, or one of a variety of third party ETL tools. However, for an absolutely dead-easy way to query the data and create a geometry point representing the centre of each road using purely T-SQL, I used the OPENROWSET bulk function in conjunction with a format file.

Not only is this method easy, but it’s utterly repeatable so (assuming the structure of the source data remains constant) you can run it again each time the underlying data gets refreshed. OPENROWSET allows you to query the text file source directly, as if it were a table in the database, from which you can SELECT columns of data, INSERT them into other tables etc.

The following shows the xml format file I used to specify the columns of data in the OS Locator file:

<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
 <FIELD ID="0" xsi:type="CharTerm" TERMINATOR=":"/>
 <FIELD ID="1" xsi:type="CharTerm" TERMINATOR=":"/>
 <FIELD ID="2" xsi:type="CharTerm" TERMINATOR=":"/>
 <FIELD ID="3" xsi:type="CharTerm" TERMINATOR=":"/>
 <FIELD ID="4" xsi:type="CharTerm" TERMINATOR=":"/>
 <FIELD ID="5" xsi:type="CharTerm" TERMINATOR=":"/>
 <FIELD ID="6" xsi:type="CharTerm" TERMINATOR=":"/>
 <FIELD ID="7" xsi:type="CharTerm" TERMINATOR=":"/>
 <FIELD ID="8" xsi:type="CharTerm" TERMINATOR=":"/>
 <FIELD ID="9" xsi:type="CharTerm" TERMINATOR=":"/>
 <FIELD ID="10" xsi:type="CharTerm" TERMINATOR=":"/>
 <FIELD ID="11" xsi:type="CharTerm" TERMINATOR=":"/>
 <FIELD ID="12" xsi:type="CharTerm" TERMINATOR=":"/>
 <FIELD ID="13" xsi:type="CharTerm" TERMINATOR=":"/>
 <FIELD ID="14" xsi:type="CharTerm" TERMINATOR=":"/>
 <FIELD ID="15" xsi:type="CharTerm" TERMINATOR="\r\n"/>
</RECORD>
<ROW>
 <COLUMN SOURCE="0" NAME="Name" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="1" NAME="Classification" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="2" NAME="Centx" xsi:type="SQLINT"/>
 <COLUMN SOURCE="3" NAME="Centy" xsi:type="SQLINT"/>
 <COLUMN SOURCE="4" NAME="Minx" xsi:type="SQLINT"/>
 <COLUMN SOURCE="5" NAME="Maxx" xsi:type="SQLINT"/>
 <COLUMN SOURCE="6" NAME="Miny" xsi:type="SQLINT"/>
 <COLUMN SOURCE="7" NAME="Maxy" xsi:type="SQLINT"/>
 <COLUMN SOURCE="8" NAME="PostSector" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="9" NAME="Settlement" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="10" NAME="Locality" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="11" NAME="Cou_Unit" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="12" NAME="LocalAuth" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="13" NAME="Tile_10k" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="14" NAME="Tile_25k" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="15" NAME="Source" xsi:type="SQLVARYCHAR"/>
</ROW>
</BCPFORMAT>

Assuming that the format file above is saved as c:\OSLocator_formatfile.xml, and the OS Locator download itself is saved as c:\OS_Locator2010_2_Open.txt (the filename of the most recent locator download at the time of writing), then you can load the Locator dataset directly in SQL Server, creating a geometry point at the centre of each road in a single query as follows:

SELECT *,
geometry::Point(Centx,Centy,27700) AS Centre
FROM OPENROWSET(
 BULK 'C:\OS_Locator2010_2_Open.txt',
 FORMATFILE='C:\OSLocator_formatfile.xml'
) AS CSV;

And here’s what the data looks like:

image

Here’s the spatial results tab displaying the centre points of the first 5,000 rows of data:

image

And here’s what a small section of Norwich looks like having created a bounding box from the min/max coordinates of each road and overlaying them on Bing Maps (having first transformed the coordinates from EPSG:27700 to EPSG:4326):

image

February 21, 2011

Using OGR2OGR to Convert, Reproject, and Load Spatial Data to SQL Server

In a previous post, I used OGR2OGR to join a set of shapefiles together prior to loading them into SQL Server using Shape2SQL. But OGR2OGR can do much more than simply appending shapefiles – it can convert data into different formats, transform coordinates into different spatial references systems, read and write a whole variety of spatial datasources, and (with a bit of fiddling) load them into SQL Server 2008.

To demonstrate, I thought I’d repeat the objective of my previous post, but instead of simply appending the Ordnance Survey open data shapefiles together, I would reproject them into a different SRID, and import them into SQL Server too using OGR2OGR alone, without using Shape2SQL or any other tools.

Use OGR2OGR to create a CSV file containing WKT

OGR2OGR can’t insert spatial data directly into SQL Server 2008, but it can create CSV files that can be read into SQL Server. To create an output file in CSV format, set the –f CSV output option. You can also manually set layer creation options to dictate the field delimiter and line return character of the CSV file, using the LINEFORMAT and SEPARATOR layer creation options.

By default, the CSV file created only contains columns containing the attributes associated with each shape (i.e the data in the .dbf file associated with a shapefile), but it doesn’t include the shape itself. To include the shape of the geometry in Well-Known Text format, you also need to state the –lco “GEOMETRY=AS_WKT” layer creation option.

But that’s not all – OGR2OGR can also reproject data between coordinate systems as it converts it. To convert to another spatial reference system, include the -t_srs parameter, together with either the full WKT definition (hard to escape properly,i.e. GEOGCS(ELLIPSOID(“WGS 84”, ……) or the EPSG code (much easier, i.e. EPSG:4326 for WGS84).

Putting this all together, you can use the following command to load a shapefile (in this case, the Ordnance Survey OS Vector Map settlement_area polygon shapefile), reproject the data into SRID 4326, and then save the result as a CSV file containing a column with the WKT of each resulting polygon:

ogr2ogr -f "CSV" "Settlement_Area" "Settlement_Area.shp" -t_srs "EPSG:4326" -lco "GEOMETRY=AS_WKT" -lco "LINEFORMAT=CRLF" -lco "SEPARATOR=SEMICOLON"

NOTE – I’ve noticed some problems with the quote/speechmark characters in this blog being changed to “smartquotes”, even when I use the code-formatting option. OGR2OGR isn’t happy about this, so I recommend that you don’t try and copy and paste the line above into a CMD window – if you get an error about “Couldn’t fetch request layer uT” or something similar then be sure to retype using normal speech marks.

More information on CSV options for OGR2OGR can be found here: http://www.gdal.org/ogr/drv_csv.html

Create a Format File

To import the CSV into SQL Server, we have a range of options – you could use SSIS, or you could use the Tasks –> Import Data wizard. Or you could use BCP or BULK INSERT. However, I’m going to use OPENROWSET. This will allow me to write a single query to access the CSV file directly from T-SQL as if it were an existing table.

In order to do this, we first need to define a format file which specifies the datatypes of each of the columns in the input csv file, and how they should be mapped to datatypes of columns in a sql table. More information on format files can be found here.

The following shows the XML format file required for the OS VectorMaps Settlement Area shapefile just converted:

<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="0" xsi:type="CharTerm" TERMINATOR="""/>
<FIELD ID="1" xsi:type="CharTerm" MAX_LENGTH="2147483647" TERMINATOR="";"/>
<FIELD ID="2" xsi:type="CharTerm" TERMINATOR=";"/>
<FIELD ID="3" xsi:type="CharTerm" TERMINATOR="\r\n"/>
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="WKT" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="2" NAME="FEATCODE" xsi:type="SQLDECIMAL"/>
<COLUMN SOURCE="3" NAME="FEATDESC" xsi:type="SQLVARYCHAR"/>
</ROW>
</BCPFORMAT>

Query the CSV from SQL Server using the Format File

Finally, you can write a SELECT query in SQL Server to query the CSV file using  OPENROWSET in conjunction with the format file above. Since the CSV contains column headers, I’ve included the FIRSTROW=2 parameter to skip the header row. I’ve also used the geometry::Parse() method in the SELECT to dynamically create the geometry data from the Well-Known Text representation contained in the WKT column of the supplied CSV file.

SELECT
FEATCODE,
FEATDESC,
geography::Parse(WKT)
FROM OPENROWSET(
BULK 'C:\Settlement_Area.csv',
FORMATFILE='C:\Settlement_Area_format_file.xml',
FIRSTROW = 2
) AS CSV;

And here’s the results showing my Ordnance Survey data, originally provided as a shapefile using projected coordinates in SRID 27700, now loaded as latitude/longitude coordinates (EPSG:4326) using the geography datatype:

image

February 16, 2011

Loading Ordnance Survey Open Data into SQL Server 2008

One of the things that I didn’t realise until I started blogging was how interesting I would find the data collected on who actually reads my blog, where they’ve come from, and what they were looking for. As I expected based on the content I’ve published so far, many of my visitors have been looking for information on the Bing Maps tile system, on WMS servers, SQL Server spatial reference systems, and on terrain and elevation information. I hope that some of them have found some helpful information… However, last month, for example, I was also fascinated to find a number of people had found my site having searched for “Alastair Aitchison professional CV” – who are you? what were you looking for?!

Anyway, reading through my search engine referrals, another big area that people seem to be looking for information on is about using Ordnance Survey data in SQL Server. I have posted about the Ordnance Survey before, and about spatial data in SQL Server, but not specifically about the two together, so those people have probably been left disappointed. This post is an attempt to rectify the situation…

The Ordnance Survey, if you don’t know, is the national mapping agency of Great Britain, and it is the source of most of the high-quality mapping information in the country. For a long time, however, their data was tightly-controlled, and not affordable to use unless you were a large organisation capable of paying a corporate licence. This was hugely frustrating for small or charitable organisations trying to make use of spatial information.

Now, ex-Prime Minister Gordon Brown didn’t do much for the UK, but it is largely thanks to him that in the last year the situation has changed. The story goes that, at some formal dinner or other, Gordon Brown ended up having a conversation with Sir Tim Berners-Lee. The prime minister asked what the government should do to make better use of the internet, to which Berners-Lee replied “Put all the government’s data online”. It’s not happened overnight, but there have been some great steps taken towards increasing public access to UK government data, including crime information, census and demographic information, and spatial information from the Ordnance Survey. So here’s a step-by-step guide as to how to load that data.

Get Some OS Data

First, acquire the data from the Ordnance Survey Open Data website. There are several “products” to choose from. The ones I find most interesting are:

    1. Strategi, Meridian2, OS VectorMap District – vector polygon, polyline, and point features of e.g. administrative boundaries, developed land use areas, woodland, roads, rivers and coastline, at small- medium- and high- scale respectively.
    2. Code-Point Open – locates every postcode unit, and maps to corresponding coordinates / health authority / county etc.
    3. 1:50,000 Scale Gazetteer – as the name says, a gazetteer of placenames and locations

Note that the Ordnance Survey have not made all their data freely-available – if you want the really high-quality datasets (such as OS Mastermap), you’ll still have to pay for it, but OpenData gives you a good start. Even though the OpenData products are of limited resolution (equivalent to, say, the 1:50,000 Landranger series of maps), they are of good quality – better quality data than, say the equivalent  TIGER data in the U.S – and they are much better than having no free data at all, which was the case 12 months ago…

Prepare the Data

Most OS datasets are provided in ESRI shapefile format. If you’re using one of the smaller scale datasets (e.g. Strategi or Meridian2), each feature layer (i.e. roads, rivers, railways) will come in its own shapefile. You’ll probably want to keep each layer in its table in the database, so this is fine.

However, if you want to load the larger scale OS VectorMap data you’ll find that it is split up into smaller files, each one labelled with the appropriate 100km x 100km grid square reference (e.g. TQ, TL, NT…). Then, within the download of each of these grid squares,  there is a subdirectory for each 10km x 10km subsquare. So within the TQ directory there will be folders labelled TQ00, TQ01, TQ02, … , TQ99. Each of these subfolders follows exactly the same structure – containing one shapefile for each of the feature layers within that square. So, if you want to load the features of a complete 100km x 100km square of data in one go, you first need to merge the shapefiles in all of the subdirectories together. You can do so using OGR2OGR from the command line prompt as in the following script:

for /R %f in (*.shp) do (
  if not exist %~nf.shp (
    ogr2ogr -f “ESRI Shapefile” “%~nf.shp” “%f”
  )
  else (
    ogr2ogr -f “ESRI Shapefile” -update -append “%~nf.shp” “%f”
  )
)

This will look for all shapefiles in all subdirectories of the directory in which the script is run – if a file of the same name (e.g. AdministrativeBoundary.shp) has been found in another subdirectory already, they will be merged together, if not, a new file will be created. Resulting merged files will be created in the current directory. (This was adapted from a script by Darren Cope)

Load the Data

Having prepared the shapefiles, you can load them into SQL Server using a tool of your choice…. if you don’t want to splash out a commercial licence for Safe FME then I recommend Shape2SQL – you can use it to load shapefiles to SQL Server either via the GUI interface or via the commandline as described here.

Ordnance Survey data is defined using the National Grid of Great Britain, so always make sure that you load your data into a column of the geometry datatype, using SRID 27700.

image

And that’s all there is to it! Here’s the data loaded from combined table of the OS Vector Map settlement_area and road_line shapefiles, showing the buildings and roads of the city centre of Norwich (still beautiful even when viewed in the pastel shades of SQL Server Management Studio):

image

Follow

Get every new post delivered to your Inbox.

Join 53 other followers