Following a recent conversation with Isaac Kunen from the StreamInsight team, I decided that I should find out more about Microsoft’s complex event processing engine. Technology moves so fast that there are probably a hundred topics that I wish I knew more about and, although I had a superficial idea of what StreamInsight does, I’d never really looked at it in any depth because I didn’t think I had any particular use for it. So this post captures my first baby steps with StreamInsight – hopefully it will provide some guidance for anyone else who hasn’t tried it yet, and perhaps a few chuckles for those people who are already experts in the field.
What Is StreamInsight, anyway?
The first hurdle I had to overcome was not a technical one, but a mental one: what exactly is StreamInsight and what is it for? You can read all the release material from Microsoft, which I won’t bother repeating here, but these are the things I had to get my head around:
- Firstly, although it’s branded and licensed as part of SQL Server, StreamInsight really has nothing to do with SQL Server, either in terms of architecture or purpose. You don’t need to have SQL Server installed on the machine on which StreamInsight runs, for example, and you won’t see a line of T-SQL code in StreamInsight.
- StreamInsight is an entirely in-memory engine for processing streams of input (i.e. a time-ordered series of events) – it can write output to SQL Server, but it can also write to a Console app, WCF service etc. It doesn’t have to persist any data at all.
- Although the examples tend to focus on scenarios in which you’d have live streams of event data coming from e.g. monitoring systems, stock trackers etc., you can also use StreamInsight to analyse any temporal data, by streaming it into the StreamInsight engine using an input adaptor from e.g. a text/CSV file or database. The timestamp used to order the events in a stream can be supplied from historical data and does not have to bear any relation to the actual time at which each event is received by the engine.
- You write queries using LINQ to analyse those events passing through the engine. There are different types of temporal query pattern – for example, analysing those events that occur in rolling windows of a fixed time length, or at a particular snapshot in time.
- Every event carries an associated payload of data – user-defined fields of information attached to that event. These are based on fundamental .NET types – string, binary etc. Of particular interest to me is that this means you can have spatial information in an event payload, either as WKT, WKB, or just using the native SqlGeography/SqlGeometry serialisation format.
- The payload information of those events falling within a specified window (or filtered using some other query template) can be aggregated, manipulated, or have other query logic applied. This information can then be routed through to one or more output adaptors, which can write the results to a file, database, or service.
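To get my head around timestamps and windows, I find it helps to sketch the idea outside of StreamInsight altogether. The following is plain Python, not StreamInsight code (real queries are written in LINQ against the .NET API), and the names `PointEvent` and `hopping_windows` are made up purely for illustration. It orders events by the timestamp they carry rather than the order in which they arrive, and then groups them into overlapping windows of a fixed length. The way the first window is aligned is my own simplification and is not meant to mirror how the engine aligns windows.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class PointEvent:
    """Hypothetical stand-in for a point event: one timestamp plus a payload."""
    start_time: datetime
    payload: dict


def hopping_windows(events, window_size, hop_size):
    """Yield (window_start, window_end, members) for overlapping windows.

    Each window is half-open, [window_start, window_end), and a new window
    begins every hop_size. Windows that contain no events are skipped.
    """
    # Order by the timestamp carried on the event, not by arrival order.
    ordered = sorted(events, key=lambda e: e.start_time)
    if not ordered:
        return
    first = ordered[0].start_time
    last = ordered[-1].start_time
    # Simplification: begin with the earliest window that still contains
    # the first event.
    start = first - window_size + hop_size
    while start <= last:
        end = start + window_size
        members = [e for e in ordered if start <= e.start_time < end]
        if members:
            yield start, end, members
        start += hop_size


if __name__ == "__main__":
    # Events deliberately supplied out of order, with historical timestamps.
    sample = [
        PointEvent(datetime(2011, 1, 5), {"id": "c"}),
        PointEvent(datetime(2011, 1, 1), {"id": "a"}),
        PointEvent(datetime(2011, 1, 2), {"id": "b"}),
    ]
    for w_start, w_end, members in hopping_windows(
        sample, timedelta(days=3), timedelta(days=1)
    ):
        print(w_start.date(), w_end.date(), [e.payload["id"] for e in members])
```

Because the windows overlap, a single event shows up in several of them, which is exactly what makes a rolling "last N days" style of analysis possible.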
StreamInsight and Spatial. An Example Application Plan.
So, how could I put this all together and make an example related to spatial? Well, here was my plan (explained using my limited knowledge of StreamInsight terminology):
- I would create an input adaptor that would load a set of spatial point data that also has an associated temporal value into the StreamInsight engine. The sort of thing I’ve got in mind is a dataset of the location and time at which cases of a particular virus outbreak (e.g. H5N1/Foot and Mouth etc.) were reported.
- Since I’m considering notifications to occur at a single moment in time, I’d use the point event model. (If, instead, I wanted to consider the period of time for which a farm was declared “infected”, I might use an edge model instead.)
- The payload attached to each event would contain a geography Point instance representing the location at which the case was reported (possibly serialised as Well-Known Binary, or maybe just WKT).
- In SI, I’d define a hopping window that would consider, say, all those events in the preceding 3 days leading up to any day.
- Then, in my query I’d create a User-Defined Aggregate that took the events in the current window and created a convex hull around them (using the geography ConvexHull() method introduced in the SQL Server Denali version of Microsoft.SqlServer.Types.dll).
- Finally, I’d create an output adaptor that would send (via a WCF service?) the resulting geography Polygon of the current event window to be visualised on Bing Maps. The map would refresh to show a time-based spatial analysis of the spread of the virus.
That’s my plan, at least, and if you follow over the next few blog posts you’ll see how I get on in actually achieving it. I haven’t prepared these posts in advance, so there’s a very real chance that I’ll get halfway through and give up, crying. We’ll see.
For this post, I’ll start right at the beginning and just jot down a few notes about getting StreamInsight installed and configured.
Installing and Configuring StreamInsight
- The latest version of StreamInsight (v1.2) can be downloaded from the StreamInsight Download Centre at http://www.microsoft.com/download/en/details.aspx?id=26720.
- There are x86 and x64 versions available. Since StreamInsight is not a component of SQL Server, there is no need to choose a version that correlates with any other SQL Server tools you’ve already got installed. I’m running an x64 system so I downloaded the x64 StreamInsight package, even though I’ve got an x86 SQL Server installation on this machine.
- The package comes in two flavours: one is for client only (i.e. allows you to connect to existing StreamInsight services) whereas the other is the full package. I went with the full shebang option (which, curiously, is exactly the same filesize as the client-only version).
- The package is very small and installation is quick. What I found a bit odd is that, after installation, I was prompted to install version 3.5 SP2 of SQL Server Compact. Why should a product that is licensed as part of SQL Server 2008 R2 have a dependency on SQL Server Compact? What’s even more odd is that, if you install the x64 platform version of StreamInsight, you need to install both the x86 and x64 versions of SQL Server Compact. Fortunately, these are both included in the redist folder of StreamInsight, and only take a few minutes to set up, so it’s not a big deal.
- After installation, there’s not much to see. StreamInsight is a service and an SDK rather than an “application”. There is one executable added to the Start Menu, which is the Event Flow Debugger. Because I’m the kind of person who clicks on programs that are newly-installed, I tried firing this up – it had already pre-filled the instance name I supplied during installation, so I clicked OK:
Hmm. No Dice:
I wondered whether the StreamInsight service was actually running. After all, the installation was very short, and I don’t remember setting up anything relating to service accounts etc. unlike in a SQL Server installation, for example. Sure enough, StreamInsight was set to be a manual service, and wasn’t currently running, so I tried to manually start it:
Documentation is boring so I downloaded and browsed through the CodePlex samples instead. The one entitled “PatternDetector” looked interesting, so I loaded it up. I changed the server connection string where indicated to provide the name of my StreamInsight server specified during installation:
Then I built the package, hit F5 and, what do you know, it worked! At least, I think it worked – here’s the output:
So that’s the end of my first experience with StreamInsight. In my next post, I’ll look at how to create an Input Adaptor to load events containing a spatial payload into the engine.