Sunday 10 October 2010

Reminiscences from SQLBits 7

SQLBits is over for another six months. As ever, it was a real pleasure to spend a day catching up with old friends, making new ones and hearing some very knowledgeable people speak about SQL Server. Aside from taking some time to say a huge thank you to the organising committee, the many people who helped both at the event and with the physical preparation for it, and the fantastic speakers, all of whom give up their time for free, I wanted to share with you some of the things I took away from the conference.

I was not able to get to the Friday sessions, so the following are all from the Saturday agenda. Also, not having a time machine, I was only able to attend 20% of the sessions on offer, so please add your own memories from both days to the comments.

Thanks...

I started off the day with Thomas Kejser from the SQL CAT team, who was discussing designing and tuning high speed data loading.

Thomas currently holds the world record for the fastest ETL load, so it is fair to say he knows what he is talking about. The biggest takeaway for me from this session was his discussion of using bulk insert to keep logging to a minimum, and of how he designs his SSIS packages to ensure maximum parallelism.

When doing a bulk insert into a heap using the TABLOCK hint, the insert will be minimally logged and will take a 'bulk update' (BU) lock on the target table. The BU lock allows multiple threads to bulk insert into the table but locks out any other processes that attempt to use the table for non-bulk operations. What is new in 2008 is that the same behaviour can be achieved when inserting into a table with a clustered index, by enabling trace flag 610, as discussed in Sunil Agarwal's blog.
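As a sketch of the kind of statement involved (the table, file path and batch size below are my own illustrative choices, not from the session):

```sql
-- Illustrative only: enable trace flag 610 so inserts into a table
-- with a clustered index can also be minimally logged (SQL Server 2008+).
DBCC TRACEON (610, -1);

BULK INSERT dbo.FactSalesStaging
FROM 'D:\loads\sales_20101009.dat'
WITH (
    TABLOCK,             -- takes the BU lock; required for minimal logging into a heap
    BATCHSIZE = 100000   -- commit in chunks to keep each log transaction small
);
```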

In order to leverage these parallel bulk insert capabilities, Thomas constructs a prioritised queue of the jobs that need to be done and then runs multiple copies of a generic package. Each copy takes the next item from the queue, flags it as in progress, runs it and finally marks it as complete, looping until there are no more jobs in the queue.
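The dequeue step of such a pattern can be sketched in T-SQL. The table and column names here are hypothetical, but the UPDLOCK/READPAST hints are the standard way to let several package instances pull from one queue without blocking each other:

```sql
-- Hypothetical work-queue table; atomically claim the highest-priority
-- queued job and return its details to the calling package.
WITH NextJob AS
(
    SELECT TOP (1) JobId, PackageName, Status, StartedAt
    FROM   dbo.LoadQueue WITH (UPDLOCK, READPAST)
    WHERE  Status = 'Queued'
    ORDER BY Priority, JobId
)
UPDATE NextJob
SET    Status    = 'InProgress',
       StartedAt = GETDATE()
OUTPUT inserted.JobId, inserted.PackageName;
```

Each package instance simply repeats this until the UPDATE returns no row, then exits.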

By running the load this way Thomas is able to keep development time for the packages to a minimum, and to ensure that his server is as busy as it can be right up until the job has finished. The CAT team have a number of video presentations around SSIS performance which include details of this design pattern.

Following this I went to hear Ashwani Roy talk about SSAS best practices.

The big takeaway for me from this session was seeing how normalising the relational data mart tables to a snowflake schema, and then presenting SSAS with a simple star schema via a view over the top, improved processing performance without the overheads and complications that would normally be associated with a snowflake in the cube.
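A minimal sketch of the idea, using hypothetical product-dimension tables (the names are mine, not from the session):

```sql
-- Hypothetical snowflaked product tables, flattened into a single
-- star-schema dimension view for SSAS to consume.
CREATE VIEW dbo.vDimProduct
AS
SELECT  p.ProductKey,
        p.ProductName,
        s.SubcategoryName,
        c.CategoryName
FROM    dbo.DimProduct     AS p
JOIN    dbo.DimSubcategory AS s ON s.SubcategoryKey = p.SubcategoryKey
JOIN    dbo.DimCategory    AS c ON c.CategoryKey    = s.CategoryKey;
```

SSAS is then pointed at dbo.vDimProduct and sees a plain star-schema dimension, while the relational layer keeps the normalised tables.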

A very handy tip that I'm sure I will use again and again.

My next session was on parallel tuning with Ramesh Meyyappan. I'm not a hardcore performance junkie, but I love listening to people who are: it allows me to pick up snippets and tips without having to do the hard graft of finding them for myself, and Ramesh is probably my favourite oracle on the subject.

The sample scripts for this presentation are available from SQL Workshops.

Some of the snippets that I came away with: CTEs will not run in parallel, though temporary tables will; and a query against a table will not run in parallel if the table has a computed column that uses a scalar function.
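A hypothetical example of that second pattern (the function, table and column names are my own invention), where the scalar UDF in the computed column is what prevents parallel plans for queries against the table:

```sql
-- Illustrative scalar UDF used in a computed column; queries against
-- dbo.Orders will then be forced onto a serial plan.
CREATE FUNCTION dbo.fnNetValue (@gross money, @taxRate decimal(5, 2))
RETURNS money
AS
BEGIN
    RETURN @gross / (1 + @taxRate / 100);
END;
GO

ALTER TABLE dbo.Orders
ADD NetValue AS dbo.fnNetValue(GrossValue, TaxRate);
```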

It was also interesting to note that if the estimated number of rows is low, the query will not be considered for parallel processing. The estimated rows can be viewed using the estimated query plan, and rearranging the query to present the high-row part first can change the estimated row count. While this sounds almost like the old-school coders' myth that the table order in a join affects performance, Ramesh demonstrated it happening; if you are still in any doubt, I recommend that you download his samples and try it for yourself, and/or find an opportunity to hear Ramesh speak.

For the lunchtime session I went to hear some myths about SQL Server hosted by Quest Software. No big takeaways from this (unless you count the T-shirt) but a very fun session.


Well, I have a couple of Ideras
During the lunch break, the guys on the Idera stand 'enhanced' my Viking hat and were kind enough to tweet me the picture.

Following lunch, I went to hear Jennifer Stirrup's session, Alice's Adventures in Reporting Services.

While Jen did nothing to dispel our image as geeks with her Doctor Who enemy example data, her session was an excellent piece on weaning people off producing poorly performing visualisations - a cause dear to my heart!

She discussed...
Preattentive attributes

   - Those 'things' that allow us to make snap observations, such as differences in colour, relative size (bigger = closer) etc., and how they can be utilised to highlight important perspectives on the information or, when misunderstood, to obscure or distort it.

Visual Integration
   -  This is the phase where the viewer makes the connections in the information that is presented, for example working out which graphical elements relate to which elements in the legend.

Followed by the Cognitive Integration phase
   -  This is the bit where the viewer reads the presented information, understands how the various elements interrelate and interprets the 'message' in your report.

Jen highlighted that people are inherently poor at judging 2D areas and angles, and how chart-junk (including 3D effects) obscures the information. She drew attention to the bullet graphs and sparklines now available in SSRS 2008 R2 and gave us a short demo on using them.

My final session of the day was End to End Master Data Management... with Jeremy Kashel.

This was a worthwhile introduction to MDS, providing an insight into how the web front end is used to manage the process and how business rules are set up and applied to ensure the consistency that is essential to the concept of master data.

Jeremy also talked about how SSIS is leveraged to extract source data into the MDM repository and also to take clean and consolidated data back to the source systems.

Then it was on to the post-event party, where I was robbed of any prize for the fifth time running, but had a great time regardless.

Thanks again to all the hardworking guys who give up their time to make this the best SQL Server event in the UK time and time again. Please add your own memories from the event in the comments and remember to fill in your feedback forms!

See you all again in 6 months

Sunday 19 September 2010

SQLBits 7 - the biggest yet

While I didn’t get enough votes to speak at the SQLBits conference at the end of this month (maybe next time) it is still panning out to be the biggest and best ever with over 500 people registered and some fantastic speakers lined up.

I’m looking forward to catching up with some old friends there along with making some new ones. If you see me please drop by and say hello.

John

Bad Pie Chart

Yes, I know it is a tautology, but under a very small number of circumstances a pie chart can almost work. This one, which was published by the California Institute for Telecommunications and Information Technology, doesn't work on any level.
As you can see, it is basically three pie charts nested one within another to form concentric rings, from which the reader is supposed to be able to make comparisons of changing reading habits between 1960 and 2008.

So, right from the start, we have an inconsistent timeline, with the chart showing a 20-year gap followed by an 18-year gap. This has the effect of preventing any meaningful information being conveyed about the rate of change. If figures are available for 2008, a non-decadal year, then it is reasonable to assume that the author could have located data for years that would have given a consistent timeline, and thus left us, the poor readers, with some hope of extracting information from the graphic (assuming that we have a protractor to hand, of course).

This brings us to a second problem: if this chart is intended to display the changing picture of reading patterns in an undisclosed area, would anyone like to postulate how the contribution of Radio changed between 1960 and 2008? Is it bigger or smaller in 2008 than it was in 1960?

In fact, I did get a protractor out: in 1960 the Radio sector was 72 degrees and in 2008 it was 38 degrees, so according to this chart it has shrunk by almost half.

This highlights the third problem with this pie chart: because the area taken up by a band of uniform width increases the further it is from the centre of a circle, the area occupied by the 1960 Radio segment is very similar to the area occupied by the 2008 segment. Because people are better at judging area than they are angles, the reader is fooled into believing that the segments represent similar populations.

So the numbers we are able to reliably extract from this chart, without spending time with a printout and a protractor, are four: the three figures for Print in the selected years and a figure for Computer in 2008, because the author has provided numbers for these. And given that there is no entry for Computer prior to 2008, we are left to wonder at the growth rate of this medium; from the chart we could be forgiven for thinking that the computer sprang into being in 2008, or certainly after 1980, which would be something of a shock to those of us who were using Apple IIs and Sinclairs in the 1970s. So: four numbers, only three of which relate to the time series, wrapped up in a lot of non-informative ink. This should have been a line chart or, if the data is only available in discrete segments, a histogram.
  
John

Geometry datatypes to create custom visualisations in SQL 08

I recently stumbled across a blog post by Teo Lachev describing how to use the geometry datatypes in SQL Server 2008 to produce squarified heat maps in SSRS (http://prologika.com/CS/blogs/blog/archive/2009/08/30/heat-maps-as-reports.aspx). In a previous role I looked into creating a heat map of this type for Reporting Services 2005. To do this would have involved creating a custom component / tool for SSRS to draw the visualisation, and we ultimately decided that this represented too much effort to pursue (which is why this entry caught my eye).
Much has been written on the subject of the geospatial capabilities of SQL Server, but what this blog post hints at is that because you can now store and manipulate geometric polygons natively in SQL Server, almost any visualisation is possible! Naturally this also opens a huge number of potential holes and increases the scope to produce truly awful visualisations. With this in mind, you should have read and understood at least one of Stephen Few's books before attempting to produce any graphical representation of information. Stephen's blog and details of his books and courses can be found at http://www.perceptualedge.com/.
For example, if we look at the pie chart devised by Florence Nightingale to show the monthly breakdown of causes of death during 1854/55, we can see that it is made up of segments where the number of deaths is measured from the centre of the graph, giving a 'ragged' pie chart. This is not easy to achieve using the native visualisations in SSRS, though it may be possible to coerce a radar chart into producing something similar.
However, each of these 'segments' can be drawn using the geometry data types and the geometry::STPolyFromText() method. I won't labour over the technical details here, as Teo has already covered them very well in his blog.
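To give a flavour of the approach, here is a minimal sketch of one such 'wedge' built as a geometry polygon. The coordinates are purely illustrative; a real segment would approximate the outer arc with many short straight edges generated from the data:

```sql
-- Illustrative only: one 'wedge' of a Nightingale-style rose as a
-- geometry polygon (the ring must start and end at the same point).
DECLARE @wedge geometry = geometry::STPolyFromText(
    'POLYGON ((0 0, 10 0, 9.8 2, 9.2 3.9, 8.3 5.6, 0 0))', 0);

SELECT @wedge.STArea() AS WedgeArea;
```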
As a footnote, I should add that this chart does not represent good practice in terms of visualisation. As with any pie chart, it is intended to distort the information in order to emphasise or hide something. In this case, because the segments are arcs, the outer edge holds more area than the inner. As a result, the audience is given a greater impression of the importance of the higher values, which in this case makes them more likely to support Florence Nightingale's argument that greater effort should be spent on disease prevention.
 A fairer representation of the data would have been a line graph.  

SQL Bits

Registration for the SQL Bits conference, The 7 Wonders of SQL Server, is now open.

This is the biggest – and in my opinion the best – SQL Server conference in Europe. It runs from Thursday September 30th with a training day through the premium Friday conference to the free Community day on Saturday October 2nd. Register at http://www.sqlbits.com/ and I look forward to seeing you in York.

This will be my 4th SQL Bits and hopefully my first presenting. I have submitted a session on data visualisation best practice ‘Lies, Damn Lies and Pie Charts’ where I will use a selection of good and bad visualisations to talk through why some things work while others don’t. As with any meaningful discussion on this topic I will be referencing the work of Stephen Few (http://www.perceptualedge.com/) and Edward Tufte (http://www.edwardtufte.com/tufte/).

Once you have registered you will have an opportunity to vote for the sessions that you would like to see presented - this does not obligate you to attend your selections, you can decide that on the day. So please vote for Lies, Damn Lies and Pie Charts, plus any nine other sessions you like the sound of, at http://www.sqlbits.com/information/PublicSessions.aspx.

Thanks

John