There was a time, not so long ago, when pretty much all business graphics looked like this:
The global hegemony of Microsoft Excel (particularly, Excel 97), deployed onto millions and millions of users’ desktop PCs meant that the world was awash with poorly-labelled, hideously-coloured, and sometimes just plain wrong line charts and bar graphs. I’m not blaming Excel itself – it was a fantastically successful product – but the problem was that a huge number of people were producing business reports who had no training on the software, little background in statistics itself, and certainly no explicit knowledge of data visualisation methods (which, at the time, were unheard of as an academic discipline).
But then things got even worse, as people discovered the ability to create “3d” charts. All of a sudden, the unattractive but essentially harmless business graphics above morphed into things like this:
The general criticisms of these “3d” charts are quite well known, so I’m not going to repeat them here. However, to give two specific comments on the examples above:
- The artificial “perspective” added to the pie chart distorts the true relationship between the values in the series. For example, what is the relative size of the large purple slice compared to the blue slice? You can’t tell because the purple slice is shown closer to the viewer, enlarging its relative area and giving it distorted prominence.
- In the line chart, the shadow effects added to the extruded ends of each column make it difficult to read across to the axis to find out the exact value of an item. Furthermore, in several places the blue columns in the foreground (inquiries) obscure the red columns behind (sales).
“3d” charts enjoyed a brief life as the popular visualisation of choice, seen to make data more exciting and, thus, making managers pay more attention to it (also perhaps in the hope that adding “depth” to a graphic added depth to the corresponding data analysis?!).
However, before too long, they were debunked as being nothing more than marketing gloss. In fact, much more than that, 3d charts were proclaimed as being misleading and dangerous, and were wholeheartedly rejected by most members of the data visualisation community. In a write-up of one of the Jen Stirrup’s sessions from SQLBits8 conference last month, Luke Merrett describes the learning point he takes away as “…you should never, EVER use 3D graphs for reporting…”.
I wonder if 3d charts, which not so long ago were being enthusiastically embraced and adopted, have now been wholeheartedly disregarded with equal lack of consideration. So, I’d like to take this opportunity to stick up for 3d graphs and explain why, in certain situations, I believe they are an appropriate way to visualise sets of data.
“3D” Graphs or 3D Graphs?
Before I do, I need to clarify something. In the preceding paragraphs, you might have noticed that I carefully referred to “3d” graphs, not 3d graphs. Why?
As far as I’m concerned, the examples of the tilted pie chart and the extruded column chart with shadows shown above are not 3d charts. They are 2d charts that have had a graphic effect applied to them to give them a sense of depth. That effect is not in any way linked to the underlying data set – it has no corresponding dimension (from a data point of view). These arbitrary graphic effects take up space in the chart without adding any further information about the data, which is pretty much the definition of “chart junk”.
I referred to these images as “3d” charts, but they should really be described as two-and-a-half dimensional charts. Clearly (unless you’ve got some sort of very expensive holographic-projection monitor), it’s impossible to create a chart that is displayed in three-dimensional space. However, it is possible to create a 2-dimensional representation of a chart that represents data in 3 dimensions, and that is what I regard as a 3d chart.
As an example, yesterday I was doing some performance testing of spatial indexes in SQL Server Denali. There are lots of factors that can influence the performance of SQL Server spatial queries, but I decided to focus on two factors – the number of rows in the underlying base table, and the selectivity of the query window (measured as a percentage of the bounding box). I ran a test query repeatedly, changing these two independent variables and measuring the execution time of the query (my dependent variable).
At the end of my tests, I had gathered over 15,000 individual test results, and I decided to plot them using a 3d surface chart as follows:
Why do I think this 3d graph is suitable for this data?
- It’s clear to see, at a glance, the way in which, separately, the number of rows in the table and the size of the query window affects execution time, and also the way in which they affect it in combination. Since I’m using only a single set of axes for all my data, you can easily see which of the independent variables has a greater effect on the dependent variable.
- With the exception of a few spikes, the dependent variable (execution time) increases monotonically with both independent variables. That is to say, as the number of rows in the base table increases and the size of the query window increases, the execution time will always increase (not necessarily at a consistent rate, but it doesn’t fall). This means that, so long as the axes are oriented correctly, there is no risk of a higher value in the foreground of the chart obscuring a smaller value behind it.
- I don’t need to know exact values corresponding to any point on this graph. The results illustrated in this chart were obtained from a test system and are intended to be illustrative of the shape of the graph – the exact number of milliseconds taken to execute a query will depend on the exact configuration of the server and the query in question anyway, so it is not important for people to be able to tell these. (In fact, I could remove the z axis labels altogether, but I’ve left them in to demonstrate that it a linear scale has been used). At a broad level, colour is used to double-encode the z value into one of a number of distinct categories so you can see the “bands” of execution times.
- This graph plots execution times of a spatial query, and the audience who are interested in the results are familiar with examining representations of spatial data, including 3d terrain maps. I’d therefore argue that the presentation method of this data has been chosen with consideration of the likely end user (another thing often forgotten by people presenting data), and vaguely resembles a terrain map at the base of a mountain, for example.
It has to be said that the Excel implementation of a 3d surface chart has a few deficiencies – I’d like to have been able to set a semi-transparent fill on the surface, for example. Also, the two independent variables are treated as distinct categories and must be equally spaced along the x and y axes. If you’re plotting continuous independent variables, as I was in this case, you therefore need to ensure that your data is sampled at consistent intervals.
My next challenge, of course, is that I’m currently only considering two of the variables that effect performance of spatial queries. I now need to think about the effect of other factors: the size of the bounding box, the cells_per_object limit, and the resolution of the grid cells in the index, for example. To display the results taking these factors into consideration, I’m thinking about a panel of 3d surface charts… if any “3d chart”-deniers have any comments, or can propose a better solution, I’d love to hear from you!
Just to re-iterate my prevous offer, I can try to help you to visualise the data if you make it available, perhaps on a SkyDrive or similar?
Thanks for the kind offer, Jen. You can find a summary of the source data at https://alastaira.files.wordpress.com/2011/04/spatial-index-performance-testing-data.xlsx
– “Raw Data” gives you the columnar results, including the average execution time for each query run with a given set of rowsinbasedtable / query_window_size parameters
– “Pivot Table” selects just the LOW level_1_grid results – the subset used in this graph – and puts them into a cross table
– “3d surface chart” shows the exact chart I’ve posted above.
I look forward to hearing your comments!
Pingback: Business Data: 2D or 3D « copperblueconsulting
I am interested in how you created the “3d” graph above regarding Query Window, Number of rows, and Average Time. Do you use an additional program or is it done in Excel?
That’s all plain, vanilla Excel (2007, I think).
Ok, that what I need, now how do I do it? I have the following data set. notice the YYMMDD is repetitive for the 24 hours in that day. I have about 26,000 rows of data.
YYMMDD Hour Load
110801 1 24,477
110801 2 21,157
110801 3 19,565
110801 4 19,193
110801 5 19,708
110801 6 21,169
110801 7 23,290
110801 8 25,285
110801 9 28,296
110801 10 32,005
110801 11 35,768
110801 12 39,642
110801 13 40,148
110801 14 43,942
110801 15 46,767
110801 16 48,929
110801 17 50,675
110801 18 53,307
110801 19 53,154
110801 20 50,376
110801 21 45,877
110801 22 43,995
110801 23 38,821
110801 24 30,508
110802 1 24,249
110802 2 20,304
110802 3 18,302
110802 4 17,603
110802 5 17,697
110802 6 18,842
110802 7 20,156
110802 8 22,307
110802 9 26,745
110802 10 32,489
110802 11 37,249
110802 12 41,920
110802 13 45,978
110802 14 49,872
110802 15 53,852
110802 16 56,349
110802 17 55,983
110802 18 55,153
110802 19 57,234
110802 20 53,217
110802 21 49,463
110802 22 47,672
110802 23 40,999
110802 24 33,076