Sunday 19 September 2010

SQLBits 7 - the biggest yet

While I didn’t get enough votes to speak at the SQLBits conference at the end of this month (maybe next time), it is still panning out to be the biggest and best ever, with over 500 people registered and some fantastic speakers lined up.

I’m looking forward to catching up with some old friends there along with making some new ones. If you see me please drop by and say hello.

John

Bad Pie Chart

Yes, I know it is a tautology, but under a very small number of circumstances a pie chart can almost work. This one, published by the California Institute for Telecommunications and Information Technology, doesn't work on any level.
As you can see, it is basically 3 pie charts nested one within the other to form concentric rings from which the reader is supposed to be able to make comparisons of changing reading habits between 1960 and 2008.

So right from the start we have an inconsistent timeline, with the chart showing a 20 year gap (1960 to 1980) followed by a 28 year gap (1980 to 2008). This has the effect of preventing any meaningful information being conveyed about the rate of change. If figures are available for 2008, a non-decadal year, then it is reasonable to assume that the author would have been able to locate data for years that would have given a consistent timeline and thus left us, the poor readers, with some hope of extracting information from the graphic (assuming that we have a protractor to hand, of course).

This brings us on to a second problem. If this chart is intended to display the changing picture of reading patterns in an undisclosed area, would anyone like to postulate how the contribution of Radio has changed between 1960 and 2008? How about whether it is bigger or smaller in 2008 than it was in 1960?

In fact, I did get a protractor out. In 1960 the Radio sector was 72 degrees and in 2008 it was 38 degrees, so according to this chart it has shrunk by almost half.

This highlights the third problem with this pie chart. Because the area taken up by a uniform width band increases the further it is from the centre of a circle, the area occupied by the 1960 Radio segment is very similar to the area occupied by the segment in 2008. Because people are better at judging area than they are angles, the reader is fooled into believing that the segments represent similar populations.
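To put rough numbers on that effect, here is a minimal Python sketch comparing the two Radio segments by angle and by area. The angles are the ones I measured with the protractor; the ring radii are purely an assumption for illustration (three rings of equal width, with 1960 innermost and 2008 outermost), not measurements from the chart.

```python
import math

def annular_sector_area(angle_deg, r_inner, r_outer):
    """Area of a ring segment spanning angle_deg between two radii."""
    return math.radians(angle_deg) / 2 * (r_outer ** 2 - r_inner ** 2)

# Angles are the protractor readings from the post. The radii are
# ASSUMED (unit-width rings, 1960 innermost, 2008 outermost) and are
# not taken from the chart itself.
radio_1960 = annular_sector_area(72, r_inner=1, r_outer=2)
radio_2008 = annular_sector_area(38, r_inner=3, r_outer=4)

angle_ratio = 38 / 72
area_ratio = radio_2008 / radio_1960

print(f"angle ratio 2008/1960: {angle_ratio:.2f}")
print(f"area ratio 2008/1960:  {area_ratio:.2f}")
```

With these assumed radii the angle ratio is about 0.53 while the area ratio is about 1.23, so the segment that has almost halved in angle actually covers slightly more ink. Different radii would give different figures, but the direction of the distortion is the same for any outer ring.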

So the only 4 numbers we are able to reliably extract from this chart without spending time with a printout and a protractor are the three figures for print in the selected years and a figure for computer in 2008, because the author has provided numbers for these. Given that there is no entry for computer prior to 2008, we are left to wonder at the growth rate of this medium. From the chart we could be forgiven for thinking that the computer sprang into being in 2008, or at least some time after 1980. This would be something of a shock to those of us using Apple IIs and Sinclairs in the 1970s. So we have three numbers that are related to the time series, wrapped up in a lot of non-informative ink. This should have been a line chart or, if the data is only available in discrete segments, a histogram.
  
John

Geographic datatypes to create custom visualisations in SQL 08

I recently stumbled across a blog by Teo Lachev describing how to use the geometric datatypes in SQL Server 08 to produce squarified heat maps in SSRS (http://prologika.com/CS/blogs/blog/archive/2009/08/30/heat-maps-as-reports.aspx). In a previous role I looked into creating a heat map of this type for Reporting Services 05. To do this would have involved creating a custom component / tool for SSRS to draw the visualisation. We ultimately decided that this represented too much effort to pursue, which is why this entry caught my eye.
Much has been written on the subject of the geospatial capabilities of SQL Server, but what this blog hints at is that because you can now store and manipulate geometric polygons natively in SQL Server, almost any visualisation is possible! Naturally this also opens a huge number of potential holes and increases the scope to produce truly awful visualisations. With this in mind you should have read and understood at least one of Stephen Few’s books before attempting to produce any graphical representation of information. (Stephen’s blog and details of his books and courses can be found at http://www.perceptualedge.com/)
For example if we look at the pie chart devised by Florence Nightingale to show the monthly breakdown of causes of death during 1854/55, we can see that this is made up of segments where the number of deaths is measured from the centre of the graph giving a ‘ragged’ pie chart. This is not easy using the native visualisations in SSRS, though it may be possible to coerce a radar chart to produce something similar.
However, each of these ‘segments’ can be drawn using the geometry data type and methods such as geometry::STPolyFromText(). I won’t labour over the technical details again here as Teo has already covered them very well in his blog.
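As a rough illustration of the idea (this is not Teo's code), the sketch below builds the well-known text (WKT) for one ‘ragged’ wedge in Python. The function name wedge_wkt, the 30 degree slice per month and the sample values are all made up for the example. The resulting string is the sort of polygon text that geometry::STPolyFromText() accepts, along with an SRID such as 0.

```python
import math

def wedge_wkt(start_deg, end_deg, radius, steps=16):
    """Return WKT for a wedge centred on the origin.

    The arc is approximated by straight edges, and the ring is closed
    by repeating the first point, as WKT polygons require.
    """
    points = [(0.0, 0.0)]
    for i in range(steps + 1):
        angle = math.radians(start_deg + (end_deg - start_deg) * i / steps)
        points.append((radius * math.cos(angle), radius * math.sin(angle)))
    points.append(points[0])  # close the ring
    coords = ", ".join(f"{x:.4f} {y:.4f}" for x, y in points)
    return f"POLYGON (({coords}))"

# HYPOTHETICAL monthly values: one 30 degree wedge per month, with the
# radius carrying the value, measured from the centre of the chart.
values = [3, 5, 8]
for month, value in enumerate(values):
    print(wedge_wkt(month * 30, (month + 1) * 30, radius=value))
```

Each printed line could be stored in a table or passed straight to STPolyFromText to give one segment per row. More steps give a smoother arc at the cost of a longer string.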
As a footnote, I should add that this chart does not represent good practice in terms of visualisation. As with any pie chart it is intended to distort the information in order to emphasise or hide something. In this case, because the segments are arcs, the outer edge holds more area than the inner. As a result the audience is given a greater impression of the importance of the higher values, which in this case makes them more likely to support Florence Nightingale’s argument that greater effort should be spent on disease prevention.
A fairer representation of the data would have been a line graph.

SQL Bits

Registration for the SQL Bits ‘The 7 Wonders of SQL Server’ conference is now open.

This is the biggest – and in my opinion the best – SQL Server conference in Europe. It runs from Thursday September 30th with a training day through the premium Friday conference to the free Community day on Saturday October 2nd. Register at http://www.sqlbits.com/ and I look forward to seeing you in York.

This will be my 4th SQL Bits and hopefully my first presenting. I have submitted a session on data visualisation best practice ‘Lies, Damn Lies and Pie Charts’ where I will use a selection of good and bad visualisations to talk through why some things work while others don’t. As with any meaningful discussion on this topic I will be referencing the work of Stephen Few (http://www.perceptualedge.com/) and Edward Tufte (http://www.edwardtufte.com/tufte/).

Once you have registered you will have an opportunity to vote for the sessions that you would like to see presented. This does not obligate you to attend your selections; you can decide that on the day. So please vote for Lies, Damn Lies and Pie Charts plus any 9 other sessions you like the sound of at http://www.sqlbits.com/information/PublicSessions.aspx.

Thanks

John