In a previous post, I used OGR2OGR to join a set of shapefiles together prior to loading them into SQL Server using Shape2SQL. But OGR2OGR can do much more than simply appending shapefiles – it can convert data into different formats, transform coordinates into different spatial references systems, read and write a whole variety of spatial datasources, and (with a bit of fiddling) load them into SQL Server 2008.
To demonstrate, I thought I’d repeat the objective of my previous post, but instead of simply appending the Ordnance Survey open data shapefiles together, I would reproject them into a different SRID, and import them into SQL Server too using OGR2OGR alone, without using Shape2SQL or any other tools.
Use OGR2OGR to create a CSV file containing WKT
OGR2OGR can’t insert spatial data directly into SQL Server 2008, but it can create CSV files that can be read into SQL Server. To create an output file in CSV format, set the –f CSV output option. You can also manually set layer creation options to dictate the field delimiter and line return character of the CSV file, using the LINEFORMAT and SEPARATOR layer creation options.
By default, the CSV file created only contains columns containing the attributes associated with each shape (i.e the data in the .dbf file associated with a shapefile), but it doesn’t include the shape itself. To include the shape of the geometry in Well-Known Text format, you also need to state the –lco “GEOMETRY=AS_WKT” layer creation option.
But that’s not all – OGR2OGR can also reproject data between coordinate systems as it converts it. To convert to another spatial reference system, include the -t_srs parameter, together with either the full WKT definition (hard to escape properly,i.e. GEOGCS(ELLIPSOID(“WGS 84”, ……) or the EPSG code (much easier, i.e. EPSG:4326 for WGS84).
Putting this all together, you can use the following command to load a shapefile (in this case, the Ordnance Survey OS Vector Map settlement_area polygon shapefile), reproject the data into SRID 4326, and then save the result as a CSV file containing a column with the WKT of each resulting polygon:
ogr2ogr -f "CSV" "Settlement_Area" "Settlement_Area.shp" -t_srs "EPSG:4326" -lco "GEOMETRY=AS_WKT" -lco "LINEFORMAT=CRLF" -lco "SEPARATOR=SEMICOLON"
NOTE – I’ve noticed some problems with the quote/speechmark characters in this blog being changed to “smartquotes”, even when I use the code-formatting option. OGR2OGR isn’t happy about this, so I recommend that you don’t try and copy and paste the line above into a CMD window – if you get an error about “Couldn’t fetch request layer uT” or something similar then be sure to retype using normal speech marks.
More information on CSV options for OGR2OGR can be found here: http://www.gdal.org/ogr/drv_csv.html
Create a Format File
To import the CSV into SQL Server, we have a range of options – you could use SSIS, or you could use the Tasks –> Import Data wizard. Or you could use BCP or BULK INSERT. However, I’m going to use OPENROWSET. This will allow me to write a single query to access the CSV file directly from T-SQL as if it were an existing table.
In order to do this, we first need to define a format file which specifies the datatypes of each of the columns in the input csv file, and how they should be mapped to datatypes of columns in a sql table. More information on format files can be found here.
The following shows the XML format file required for the OS VectorMaps Settlement Area shapefile just converted:
<?xml version="1.0"?> <BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <RECORD> <FIELD ID="0" xsi:type="CharTerm" TERMINATOR="""/> <FIELD ID="1" xsi:type="CharTerm" MAX_LENGTH="2147483647" TERMINATOR="";"/> <FIELD ID="2" xsi:type="CharTerm" TERMINATOR=";"/> <FIELD ID="3" xsi:type="CharTerm" TERMINATOR="\r\n"/> </RECORD> <ROW> <COLUMN SOURCE="1" NAME="WKT" xsi:type="SQLVARYCHAR"/> <COLUMN SOURCE="2" NAME="FEATCODE" xsi:type="SQLDECIMAL"/> <COLUMN SOURCE="3" NAME="FEATDESC" xsi:type="SQLVARYCHAR"/> </ROW> </BCPFORMAT>
Query the CSV from SQL Server using the Format File
Finally, you can write a SELECT query in SQL Server to query the CSV file using OPENROWSET in conjunction with the format file above. Since the CSV contains column headers, I’ve included the FIRSTROW=2 parameter to skip the header row. I’ve also used the geometry::Parse() method in the SELECT to dynamically create the geometry data from the Well-Known Text representation contained in the WKT column of the supplied CSV file.
SELECT FEATCODE, FEATDESC, geography::Parse(WKT) FROM OPENROWSET( BULK 'C:\Settlement_Area.csv', FORMATFILE='C:\Settlement_Area_format_file.xml', FIRSTROW = 2 ) AS CSV;
And here’s the results showing my Ordnance Survey data, originally provided as a shapefile using projected coordinates in SRID 27700, now loaded as latitude/longitude coordinates (EPSG:4326) using the geography datatype: