There was a time, not so long ago, when pretty much all business graphics looked like this:
The global hegemony of Microsoft Excel (particularly, Excel 97), deployed onto millions and millions of users’ desktop PCs meant that the world was awash with poorly-labelled, hideously-coloured, and sometimes just plain wrong line charts and bar graphs. I’m not blaming Excel itself – it was a fantastically successful product – but the problem was that a huge number of people were producing business reports who had no training on the software, little background in statistics itself, and certainly no explicit knowledge of data visualisation methods (which, at the time, were unheard of as an academic discipline).
But then things got even worse, as people discovered the ability to create “3d” charts. All of a sudden, the unattractive but essentially harmless business graphics above morphed into things like this:
The general criticisms of these “3d” charts are quite well known, so I’m not going to repeat them here. However, to give two specific comments on the examples above:
- The artificial “perspective” added to the pie chart distorts the true relationship between the values in the series. For example, what is the relative size of the large purple slice compared to the blue slice? You can’t tell because the purple slice is shown closer to the viewer, enlarging its relative area and giving it distorted prominence.
- In the line chart, the shadow effects added to the extruded ends of each column make it difficult to read across to the axis to find out the exact value of an item. Furthermore, in several places the blue columns in the foreground (inquiries) obscure the red columns behind (sales).
“3d” charts enjoyed a brief life as the popular visualisation of choice, seen to make data more exciting and, thus, making managers pay more attention to it (also perhaps in the hope that adding “depth” to a graphic added depth to the corresponding data analysis?!).
However, before too long, they were debunked as being nothing more than marketing gloss. In fact, much more than that, 3d charts were proclaimed as being misleading and dangerous, and were wholeheartedly rejected by most members of the data visualisation community. In a write-up of one of the Jen Stirrup’s sessions from SQLBits8 conference last month, Luke Merrett describes the learning point he takes away as “…you should never, EVER use 3D graphs for reporting…”.
I wonder if 3d charts, which not so long ago were being enthusiastically embraced and adopted, have now been wholeheartedly disregarded with equal lack of consideration. So, I’d like to take this opportunity to stick up for 3d graphs and explain why, in certain situations, I believe they are an appropriate way to visualise sets of data.
“3D” Graphs or 3D Graphs?
Before I do, I need to clarify something. In the preceding paragraphs, you might have noticed that I carefully referred to “3d” graphs, not 3d graphs. Why?
As far as I’m concerned, the examples of the tilted pie chart and the extruded column chart with shadows shown above are not 3d charts. They are 2d charts that have had a graphic effect applied to them to give them a sense of depth. That effect is not in any way linked to the underlying data set – it has no corresponding dimension (from a data point of view). These arbitrary graphic effects take up space in the chart without adding any further information about the data, which is pretty much the definition of “chart junk”.
I referred to these images as “3d” charts, but they should really be described as two-and-a-half dimensional charts. Clearly (unless you’ve got some sort of very expensive holographic-projection monitor), it’s impossible to create a chart that is displayed in three-dimensional space. However, it is possible to create a 2-dimensional representation of a chart that represents data in 3 dimensions, and that is what I regard as a 3d chart.
As an example, yesterday I was doing some performance testing of spatial indexes in SQL Server Denali. There are lots of factors that can influence the performance of SQL Server spatial queries, but I decided to focus on two factors – the number of rows in the underlying base table, and the selectivity of the query window (measured as a percentage of the bounding box). I ran a test query repeatedly, changing these two independent variables and measuring the execution time of the query (my dependent variable).
At the end of my tests, I had gathered over 15,000 individual test results, and I decided to plot them using a 3d surface chart as follows:
Why do I think this 3d graph is suitable for this data?
- It’s clear to see, at a glance, the way in which, separately, the number of rows in the table and the size of the query window affects execution time, and also the way in which they affect it in combination. Since I’m using only a single set of axes for all my data, you can easily see which of the independent variables has a greater effect on the dependent variable.
- With the exception of a few spikes, the dependent variable (execution time) increases monotonically with both independent variables. That is to say, as the number of rows in the base table increases and the size of the query window increases, the execution time will always increase (not necessarily at a consistent rate, but it doesn’t fall). This means that, so long as the axes are oriented correctly, there is no risk of a higher value in the foreground of the chart obscuring a smaller value behind it.
- I don’t need to know exact values corresponding to any point on this graph. The results illustrated in this chart were obtained from a test system and are intended to be illustrative of the shape of the graph – the exact number of milliseconds taken to execute a query will depend on the exact configuration of the server and the query in question anyway, so it is not important for people to be able to tell these. (In fact, I could remove the z axis labels altogether, but I’ve left them in to demonstrate that it a linear scale has been used). At a broad level, colour is used to double-encode the z value into one of a number of distinct categories so you can see the “bands” of execution times.
- This graph plots execution times of a spatial query, and the audience who are interested in the results are familiar with examining representations of spatial data, including 3d terrain maps. I’d therefore argue that the presentation method of this data has been chosen with consideration of the likely end user (another thing often forgotten by people presenting data), and vaguely resembles a terrain map at the base of a mountain, for example.
It has to be said that the Excel implementation of a 3d surface chart has a few deficiencies – I’d like to have been able to set a semi-transparent fill on the surface, for example. Also, the two independent variables are treated as distinct categories and must be equally spaced along the x and y axes. If you’re plotting continuous independent variables, as I was in this case, you therefore need to ensure that your data is sampled at consistent intervals.
My next challenge, of course, is that I’m currently only considering two of the variables that effect performance of spatial queries. I now need to think about the effect of other factors: the size of the bounding box, the cells_per_object limit, and the resolution of the grid cells in the index, for example. To display the results taking these factors into consideration, I’m thinking about a panel of 3d surface charts… if any “3d chart”-deniers have any comments, or can propose a better solution, I’d love to hear from you!