In Defence of 3D Charts…

There was a time, not so long ago, when pretty much all business graphics looked like this:

image  image

The global hegemony of Microsoft Excel (particularly, Excel 97), deployed onto millions and millions of users’ desktop PCs meant that the world was awash with poorly-labelled, hideously-coloured, and sometimes just plain wrong line charts and bar graphs. I’m not blaming Excel itself – it was a fantastically successful product – but the problem was that a huge number of people were producing business reports who had no training on the software, little background in statistics itself, and certainly no explicit knowledge of data visualisation methods (which, at the time, were unheard of as an academic discipline).

But then things got even worse, as people discovered the ability to create “3d” charts. All of a sudden, the unattractive but essentially harmless business graphics above morphed into things like this:

image  image

The general criticisms of these “3d” charts are quite well known, so I’m not going to repeat them here. However, to give two specific comments on the examples above:

  • The artificial “perspective” added to the pie chart distorts the true relationship between the values in the series. For example, what is the relative size of the large purple slice compared to the blue slice? You can’t tell because the purple slice is shown closer to the viewer, enlarging its relative area and giving it distorted prominence.
  • In the line chart, the shadow effects added to the extruded ends of each column make it difficult to read across to the axis to find out the exact value of an item. Furthermore, in several places the blue columns in the foreground (inquiries) obscure the red columns behind (sales).

“3d” charts enjoyed a brief life as the popular visualisation of choice, seen to make data more exciting and, thus, making managers pay more attention to it (also perhaps in the hope that adding “depth” to a graphic added depth to the corresponding data analysis?!).

However, before too long, they were debunked as being nothing more than marketing gloss. In fact, much more than that, 3d charts were proclaimed as being misleading and dangerous, and were wholeheartedly rejected by most members of the data visualisation community. In a write-up of one of the Jen Stirrup’s sessions from SQLBits8 conference last month, Luke Merrett describes the learning point he takes away as “…you should never, EVER use 3D graphs for reporting…”.

I wonder if 3d charts, which not so long ago were being enthusiastically embraced and adopted, have now been wholeheartedly disregarded with equal lack of consideration. So, I’d like to take this opportunity to stick up for 3d graphs and explain why, in certain situations, I believe they are an appropriate way to visualise sets of data.

“3D” Graphs or 3D Graphs?

Before I do, I need to clarify something. In the preceding paragraphs, you might have noticed that I carefully referred to “3d” graphs, not 3d graphs. Why?

As far as I’m concerned, the examples of the tilted pie chart and the extruded column chart with shadows shown above are not 3d charts. They are 2d charts that have had a graphic effect applied to them to give them a sense of depth. That effect is not in any way linked to the underlying data set – it has no corresponding dimension (from a data point of view). These arbitrary graphic effects take up space in the chart without adding any further information about the data, which is pretty much the definition of “chart junk”.

I referred to these images as “3d” charts, but they should really be described as two-and-a-half dimensional charts. Clearly (unless you’ve got some sort of very expensive holographic-projection monitor), it’s impossible to create a chart that is displayed in three-dimensional space. However, it is possible to create a 2-dimensional representation of a chart that represents data in 3 dimensions, and that is what I regard as a 3d chart.

As an example, yesterday I was doing some performance testing of spatial indexes in SQL Server Denali. There are lots of factors that can influence the performance of SQL Server spatial  queries, but I decided to focus on two factors – the number of rows in the underlying base table, and the selectivity of the query window (measured as a percentage of the bounding box). I ran a test query repeatedly, changing these two independent variables and measuring the execution time of the query (my dependent variable).

At the end of my tests, I had gathered over 15,000 individual test results, and I decided to plot them using a 3d surface chart as follows:

image

Why do I think this 3d graph is suitable for this data?

  • It’s clear to see, at a glance, the way in which, separately, the number of rows in the table and the size of the query window affects execution time, and also the way in which they affect it in combination. Since I’m using only a single set of axes for all my data, you can easily see which of the independent variables has a greater effect on the dependent variable.
  • With the exception of a few spikes, the dependent variable (execution time) increases monotonically with both independent variables. That is to say, as the number of rows in the base table increases and the size of the query window increases, the execution time will always increase (not necessarily at a consistent rate, but it doesn’t fall). This means that, so long as the axes are oriented correctly, there is no risk of a higher value in the foreground of the chart obscuring a smaller value behind it.
  • I don’t need to know exact values corresponding to any point on this graph. The results illustrated in this chart were obtained from a test system and are intended to be illustrative of the shape of the graph – the exact number of milliseconds taken to execute a query will depend on the exact configuration of the server and the query in question anyway, so it is not important for people to be able to tell these. (In fact, I could remove the z axis labels altogether, but I’ve left them in to demonstrate that it a linear scale has been used). At a broad level, colour is used to double-encode the z value into one of a number of distinct categories so you can see the “bands” of execution times.
  • This graph plots execution times of a spatial query, and the audience who are interested in the results are familiar with examining representations of spatial data, including 3d terrain maps. I’d therefore argue that the presentation method of this data has been chosen with consideration of the likely end user (another thing often forgotten by people presenting data), and vaguely resembles a terrain map at the base of a mountain, for example.

It has to be said that the Excel implementation of a 3d surface chart has a few deficiencies – I’d like to have been able to set a semi-transparent fill on the surface, for example. Also, the two independent variables are treated as distinct categories and must be equally spaced along the x and y axes. If you’re plotting continuous independent variables, as I was in this case, you therefore need to ensure that your data is sampled at consistent intervals.

My next challenge, of course, is that I’m currently only considering two of the variables that effect performance of spatial queries. I now need to think about the effect of other factors: the size of the bounding box, the cells_per_object limit, and the resolution of the grid cells in the index, for example. To display the results taking these factors into consideration, I’m thinking about a panel of 3d surface charts… if any “3d chart”-deniers have any comments, or can propose a better solution, I’d love to hear from you!

This entry was posted in General Development, SQL Server and tagged , . Bookmark the permalink.

6 Responses to In Defence of 3D Charts…

  1. Jen Stirrup says:

    Just to re-iterate my prevous offer, I can try to help you to visualise the data if you make it available, perhaps on a SkyDrive or similar?

  2. Pingback: Business Data: 2D or 3D « copperblueconsulting

  3. Stephen Green says:

    I am interested in how you created the “3d” graph above regarding Query Window, Number of rows, and Average Time. Do you use an additional program or is it done in Excel?

  4. Stephen Green says:

    Ok, that what I need, now how do I do it? I have the following data set. notice the YYMMDD is repetitive for the 24 hours in that day. I have about 26,000 rows of data.

    YYMMDD Hour Load
    110801 1 24,477
    110801 2 21,157
    110801 3 19,565
    110801 4 19,193
    110801 5 19,708
    110801 6 21,169
    110801 7 23,290
    110801 8 25,285
    110801 9 28,296
    110801 10 32,005
    110801 11 35,768
    110801 12 39,642
    110801 13 40,148
    110801 14 43,942
    110801 15 46,767
    110801 16 48,929
    110801 17 50,675
    110801 18 53,307
    110801 19 53,154
    110801 20 50,376
    110801 21 45,877
    110801 22 43,995
    110801 23 38,821
    110801 24 30,508
    110802 1 24,249
    110802 2 20,304
    110802 3 18,302
    110802 4 17,603
    110802 5 17,697
    110802 6 18,842
    110802 7 20,156
    110802 8 22,307
    110802 9 26,745
    110802 10 32,489
    110802 11 37,249
    110802 12 41,920
    110802 13 45,978
    110802 14 49,872
    110802 15 53,852
    110802 16 56,349
    110802 17 55,983
    110802 18 55,153
    110802 19 57,234
    110802 20 53,217
    110802 21 49,463
    110802 22 47,672
    110802 23 40,999
    110802 24 33,076

    Thanks!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s