Data Visualisation for Social Research and Business Intelligence

Cole Knaflic of Storytelling with Data fame recently published an interesting challenge: to make over a data visualisation showing the frequency of hurricanes, which had appeared in The Economist. It looks like this:

Economist_hurricane

Normally when I participate in a data visualisation makeover I start by looking at the aspects I like, followed by those I think could be improved.  But for this project I wanted to look at how following the ‘Storytelling with Data’ design process, as described in the eponymous book, can help make my makeover more effective at communicating a ‘big idea’.

There are 6 stages to the Storytelling with Data Process:

Understand the Context:

The original visualisation presents a stacked bar chart, coloured to compare the frequency of Category 1 and 2 intensity hurricanes to that of Category 3, 4 and 5 intensity hurricanes, or ‘Major hurricanes’, over time.  The trend lines show that the frequency of major hurricanes is increasing over time, whereas the frequency of ‘All hurricanes’ is decreasing.

However, it doesn’t really explain how the different hurricane categories are derived or what the implications of an increase in major hurricanes will be.

The Storytelling with Data Process advocates 3 phases to setting the context:

  • Who is our audience?
  • What do we wish to communicate to them?
  • How are we going to communicate it?

By answering these 3 questions we can formulate a ‘Big Idea’ with which to frame our story.  This is important to give our visualisation some goals.

I undertook some basic research into how hurricanes are categorised on the Saffir-Simpson scale, according to the potential level of damage upon making landfall.

This research helped me identify my big idea:

  • Who: A lay audience with little knowledge of how hurricanes are measured
  • What: Understand how the frequency of major intensity hurricanes compares to minor intensity hurricanes over time
  • How: Show the moving average (over 3 decade periods) frequency of hurricanes by intensity
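To make the ‘How’ concrete, here is a minimal sketch of a moving average over 3 decade periods. The decade counts are made up for illustration and are not the Economist data, and the trailing window (each value averages the current decade with the two before it) is my assumption, as a centred window would be an equally valid choice.

```python
from statistics import mean

# Hypothetical counts of major hurricanes per decade, for illustration only.
decades = [1950, 1960, 1970, 1980, 1990, 2000]
major = [10, 8, 6, 9, 12, 15]


def moving_average(values, window=3):
    """Trailing moving average.

    The first window-1 positions have no full window, so they are None.
    """
    return [
        None if i + 1 < window else mean(values[i + 1 - window : i + 1])
        for i in range(len(values))
    ]


smoothed = moving_average(major)

for decade, value in zip(decades, smoothed):
    print(decade, value)
```

Smoothing in this way trades detail for clarity: individual decades can no longer be read off the chart, which is why the moving average needs to be stated explicitly somewhere on the visualisation.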

Choose an appropriate display:

The stacked chart used in the original visualisation is not an appropriate display.   It is easy to compare the higher order categories shown in red as they are placed on the base axis.  However, the lower intensity categories are less easy to compare over time as they do not line up.

A line chart is widely recognised as a more appropriate display for showing the trends over time.  Choosing a line chart also removes unnecessary clutter which the bar chart steps created.

Choosing an appropriate display communicates ideas more effectively

Eliminate Clutter:

Cole Knaflic points out in her book that clutter adds unnecessary complication to the visual design, with a call to arms that “clutter is your enemy”!  Clutter increases the mental effort required of the audience to understand the messages being communicated.  The risk of an overly complicated chart is that the reader will disengage from the visualisation altogether.  The original visualisation is overly cluttered:

  • There are too many colours, which is distracting alongside the overlaid trend lines
  • The grid lines are too thick
  • The axis labels are too big and miss every other decade, which is confusing

To tackle this I simplified or removed key elements to see whether the chart became clearer:

  • Aggregated the lower and higher intensity categories into minor and major intensity hurricanes to reduce the colour palette
  • Removed the trend lines
  • Removed the grid lines altogether as is my personal preference, but you could leave them in and make them fainter
  • Created ‘bins’ of the years to directly label each decade clearly with a smaller font
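The aggregation and binning steps above can be sketched as follows. The storm records are invented for illustration, and the helper names `decade` and `intensity` are hypothetical; the only rule taken from the original chart is that Categories 3, 4 and 5 count as major hurricanes.

```python
from collections import Counter

# Hypothetical (year, Saffir-Simpson category) records, for illustration only.
storms = [(1951, 1), (1955, 3), (1958, 2), (1961, 4), (1964, 5), (1969, 1)]


def decade(year):
    """Bin a year into the decade it falls in, e.g. 1958 -> 1950."""
    return year // 10 * 10


def intensity(category):
    """Collapse the five categories into two groups."""
    return "major" if category >= 3 else "minor"


# Count storms per (decade, intensity) pair.
counts = Counter((decade(year), intensity(cat)) for year, cat in storms)

for key in sorted(counts):
    print(key, counts[key])
```

Collapsing five categories into two is what allows the colour palette to shrink to two colours, at the cost of hiding the differences within each group.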

These amendments, when combined with the adoption of the line chart format, reduce clutter as there are fewer visual elements to process.

Eliminating clutter simplifies the visualisation, making it clearer to understand

Draw attention to where you want it:

The next goal is to focus the attention of our audience on where we want them to look, using ‘pre-attentive attributes’.  These are visual elements which our brains are instinctively programmed to notice, for example colour, size, shape and orientation.

The issue with the original visualisation is that so much is highlighted or emboldened that nothing really stands out.  Our eyes are drawn to the colour legend, which overly distracts from the data.

A common solution is to grey out the visualisation and highlight one key element. Making the text of sub titles and axis labels light grey makes the colours of the chart stand out.

I was keen to preserve the original colours of the low and high intensity categories as I think they work well.  To make them the focus I added direct labelling at the expense of an x-axis.  I colour coded the labels and made them large font, which removes the need for a distracting colour legend.

Use of pre-attentive attributes like colour and size can help focus our audience’s attention

Think like a designer:

Some of the design elements of the original visualisation do not work very well.  For example:

  • The trend line is for ‘All hurricanes’, but colour coded the same as the lower intensity hurricane categories
  • The red line above the title is distracting

Thinking like a designer includes consideration of:

  • Affordances: It is obvious how to use the visualisation
  • Accessibility: A design which is easy for the end user to understand e.g. considerate use of text on titles and annotations
  • Aesthetics: A design which is easy on the eye helps with user engagement e.g. use of white space and object alignment as well as smart use of colour

By considering how to incorporate these design principles we can enhance the chance that our visualisation will be accepted by our audience.

I added a dashboard title as a placeholder for the big idea I wanted to convey, as this is prime real estate for grabbing the reader’s attention.  I needed to add an explicit sub title to convey that this is a moving average.

I established a clear visual hierarchy, whereby the audience are invited to start on the left and follow with their eyes to the right.  In that way it is clear what the affordances are in terms of how it is intended to be read.  This is supported by clear axis labels, source icons and notes.

Keeping the chart simple and directly labelling the lines makes the data accessible, so the user can read the trends.  Consideration of white space, alignment and using colour sparingly helps promote a clean visualisation, which hopefully is more aesthetically pleasing than the original.

Consideration of affordances, accessibility and aesthetics supports acceptance

Tell a story:

The final stage is to package the visualisation in a way that tells the narrative of our big idea.  This includes:

  • Structure: Every story should ideally have a beginning, middle and an end
  • Dramatic tension: Introducing the imbalance created by a problem followed by a solution to restore balance helps grab your audience’s attention
  • Narrative flow: Repetition of ideas, manner of narrative as well as vertical and horizontal logic of flow can be used to support communication of ideas

The original visualisation title of ‘Spin Cycle’ doesn’t make sense and is confusing.  The sub title is nice and clear but lacks a ‘hook’ to engage readers.  There is no commentary on the impact of the hurricanes, which left me with a ‘so what?’ reaction.

In my final version I have changed the angle of the story from simply describing the trends to highlighting the narrowing gap between what I now term low and high intensity hurricanes.  Upon reflection, this seemed like the more interesting story.

I added contextual information about the categorisation scales and the implications of the severity of damage.  This adds some tension as the increase in the average number of high intensity hurricanes now has some real implications.  Thinking like a designer, the definitions are added to the end of the trend lines to link them directly to the data.

I also undertook a final round of de-cluttering, including removing the date axis labels.  This is contrary to the Storytelling with Data Process, but is my personal preference.

Atlantic Hurricanes Re-designed

What have I learnt?

By following the Storytelling with Data Process, I feel I have gained valuable insight into how a data visualisation can provide clarity for an audience and therefore why they would want to engage with it.  By following a structured process I have learnt a lot about how to use data visualisation to communicate a big idea.  I also found there were some elements of the process I could adapt to my own style.

However, I was left questioning whether my design could employ even more de-cluttering, good design and storytelling principles.  This led to a very iterative process of re-design, which can take time.  But in the end it leads to a more effective visualisation that communicates an idea more clearly than the original design.

Additional Reading on this Makeover Challenge:

Throughout this process I was reading the thoughts of colleagues in the Tableau Community who were undertaking the same challenge.

The challenge…

The objective of most data visualisation projects is to analyse and explore data in order to identify some interesting trends, which can then be communicated as valuable insight to a wider audience.  The challenge can often be twofold:

  • To identify hidden patterns in complex data.
  • To communicate some easily understandable insight to a wider audience.

A case study…

I recently published a data visualisation as part of week 34 of the weekly social data project, Makeover Monday.  The aim was to improve upon the original data visualisation of the paths of solar eclipses from 2001 AD to 2020 AD, published by Moonblink.

Screen Shot 2017-08-31 at 14.00.40

I found the original visualisation interesting as it shows the complexity of solar eclipse pathways across the world.  However, I found it very cluttered, and the overlapping data points made it difficult to discern the patterns of the different types of solar eclipse.

My goals were:

  • To create an easy to understand visual design.
  • To show every data point.
  • To identify some interesting patterns.
  • To communicate these via a story.

The dataset provided contained 5,000 years of historic and forecast solar eclipse data, including a breakdown by type: ‘Total’, ‘Partial’, ‘Hybrid’ and ‘Annular’.  My first task was to undertake some background research in order to understand what the categories mean.

My makeover focussed upon the patterns of total and partial (only) eclipses.  This visualisation was included in the weekly Makeover Monday recap blog by Eva Murray.  I was very pleased with this as I have been refining my approach and taking in the lessons of these very useful recaps over the year.

Screen Shot 2017-08-25 at 14.09.52

This got me thinking: what is it about this particular design which works well compared to other visualisations I have created?

Why does our brain recognise patterns?

The human brain has evolved to understand complex patterns by grouping objects together in certain ways in order to make sense of them.  To understand how this can be leveraged for effective visual design I researched the ‘Gestalt School of Visual Perception’ of 1912, the purpose of which was to uncover ‘how we perceive pattern, form and organisation’ (Few S, 2012; p80).

There is a relevant chapter in Stephen Few’s 2012 book, ‘Show Me the Numbers’.  The definitions and images below are available on Wikipedia and cover key examples of the Gestalt Principles.

Principle of Proximity:

Objects or shapes that are close to one another appear to form groups.  That is, even if objects appear different in size, shape or colour they will appear to form groups if they are close to each other.  For example, the dots arranged in columns below are perceived as distinct groups.

712px-Gestalt_proximity.svg

Principle of Similarity:

We tend to perceive objects which physically resemble each other as belonging to the same group and those which differ as belonging to a separate group.  That is, we tend to group objects that are similar in ‘colour, size, shape or orientation’ (Few S, 2012; p81).  For example, coloured dots stand out from non coloured dots as a group.

300px-Gestalt_similarity.svg

Principle of Closure:

We tend to perceive objects as complete, even if a picture is incomplete or partially hidden.  If a shape’s border is incomplete, the human mind will tend to close the border and ignore the gaps.  For example, the shapes below can be perceived as a circle and square even though the border is broken.

528px-Gestalt_closure.svg

Principle of Continuity:

Overlapping objects appear part of a whole if they share a continual direction rather than a series of adjacent objects.  For example, the image below appears as two overlapping objects rather than three separate ones.

CrossKeys

How can the principles of grouping be applied in data visualisation design?

To answer this, I will explore how the Gestalt Principles apply to my solar eclipse makeover.

Latitudes of partial only solar eclipses

I was immediately struck by the closely wound grouping of the partial (only) eclipses  over the polar regions in both the Northern and Southern Hemispheres.  This compares to the wider spread pattern of total eclipses across both Hemispheres.  This demonstrates the effect of proximity between objects to create a distinct pattern.

To enhance the meaning of this pattern I attributed orange to the partial eclipses and teal to the total eclipses in order to enhance the similarity within the groups and the difference between the groups.  I also filtered out the other two types of eclipse, ‘Annular’ and ‘Hybrid’, so as not to overload the visualisation with too many patterns of similarity.

The partial solar eclipses are presented as a double helix of coloured dots.  However, visually we can perceive them as a closed band separate from the more widely dispersed total eclipses, which in turn form their own latitudinal patterns.  This is reinforced by the pattern of continuity as both types of solar eclipses appear to follow a distinctive 5000 year cycle.

The text boxes can also be perceived as groups of continuous text due to the left alignment.  This approach helps break up the story and therefore make it easier to read.

What are the benefits of this approach?

By understanding the way in which the human brain groups objects together visually, we can design better visualisations which emphasise differences more effectively.  We can concentrate our readers’ focus on where there are patterns within our data in terms of proximity, similarity, closure and continuity of objects.  For example, we can enhance pattern recognition through using colour, shape, size and direction of objects.  This leads to a quicker and more direct journey from data to insight.

What are the challenges of this approach?

Not all visualisations will lend themselves so neatly to patterns which follow the Principles of Grouping as nature so often does.  Sometimes data visualisation can throw up messy distributions with no obvious pattern.  It is up to the visualiser to choose between a complex design which shows every data point or a simplified view which categorises the data (e.g. a bar chart).  There is always a trade off between simplicity and complexity in terms of enlightening and engaging an audience.

The other challenge is that once a pattern is identified the visualiser’s task is to make sense of it to the reader.  In this case I spent a lot of time researching solar eclipse types in order to understand what caused them and focussed on how to communicate this to a lay audience succinctly.  For example, describing those solar eclipses which only occur as partial compared to the more widespread partial solar eclipses which accompany total eclipses was a conceptual challenge.

The wider context…

By understanding the Principles of Grouping we can gain a greater understanding of why our visualisations are effective.  We can then design better visualisations which aim to leverage colour, shape, size and direction of objects to help identify and communicate complex and hidden patterns.  This can lead to quicker and more direct insight for decision makers.  It can also help data visualisers to gain a better understanding of why their designs are more or less effective.

The challenge for week 32 of Makeovermonday was to improve upon a visualisation from Business Standard, which looks at the percentage of schools across India which have usable toilets.  As someone who has studied human geography I found this an interesting and worthwhile topic to explore.

In this blog I will explore what was wrong with the original visualisation and offer a few pointers on how to maximise the impact of the makeover.  This is influenced in no small way by the Storytelling with Data process by Cole Knaflic.

The original data visualisation looks like this:

Screen Shot 2017-08-06 at 14.33.28

What’s good about it?

  • The colours are clear in terms of showing which states have a higher or lower proportion of schools with usable toilets.

What could be improved?

  • There is a lack of context – what is the story behind the data?
  • It could use a more appropriate chart type – maps can distort magnitude as larger areas are more prominent.  They are also not appropriate for showing change effectively over time.
  • The design choices – the labelling on the maps makes the view cluttered.
  • The clarity of the visualisation – it is not clear what the titles refer to i.e. is it the percentage of toilets that are usable or schools with usable toilets? In fact it is the latter.

My approach:

So let’s take each of those issues in order and try to tackle them.

Add some context…to tell a story

I started researching some context about the issue and why it is important.  I read the article behind the original visualisation and made some notes from which I could pull out a potential story board.

What grabbed my attention was that despite improvements in the proportion of schools with accessible toilets and usable toilets, the rate of improvement had slowed in recent years.  There was also a disparity in reporting on accessible versus usable toilets, which missed underlying issues of sanitation in schools.  This is important as it has a big impact upon student wellbeing.

From this I could pose some important questions:

  • What is the issue?
  • Why does it matter?
  • What needs to change?

These could be used to wrap around the chart and tell an effective story.

Use an appropriate chart type…

I used a slope chart to show the change in the percentage of schools with usable toilets between 2014 and 2016.  Slope charts are one of my favourite chart types and are great for showing change in percentage measures.  To build this in Tableau I used a simple method for adding vertical lines to slope graphs with multiple measures, from Tableau Zen Master Matt Chambers.

Use design to clean up the viz…

  • I used colour and labelling to highlight only those states which had decreased in terms of the proportion of usable toilets.
  • I used colour in the title so that a separate legend is unnecessary.
  • I leveraged white space to create borders between the text and chart.
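The data preparation behind a slope chart like this can be sketched outside Tableau. The state names and percentages below are placeholders rather than the Business Standard figures, and `slope_rows` is a hypothetical helper; the sketch only illustrates the rule of highlighting states whose percentage fell and greying out the rest.

```python
# Hypothetical percentages of schools with usable toilets (2014, 2016),
# for illustration only.
usable = {
    "State A": (60.0, 72.5),
    "State B": (81.0, 78.0),
    "State C": (55.5, 55.5),
}


def slope_rows(data):
    """Return one row per state with its change and a colour role.

    Only states whose percentage decreased are highlighted.
    """
    rows = []
    for state, (start, end) in data.items():
        change = end - start
        rows.append(
            {
                "state": state,
                "start": start,
                "end": end,
                "change": change,
                "colour": "highlight" if change < 0 else "grey",
            }
        )
    return rows


rows = slope_rows(usable)

for row in rows:
    print(row["state"], row["change"], row["colour"])
```

Keeping the colour decision in the data, rather than choosing colours by hand, means the highlighting stays correct if the figures are updated.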

Add some clarity…

  • I posed one simple business question in the title and aimed to use the data visualisation to answer it.
  • I placed annotation near to the data to answer the question I had posed in the title.

Benefits of this approach:

  • Some thought provoking context makes the reader think and adds a call to action.
  • An appropriate chart type shows change effectively over time.
  • Effective design choices can provide a clean visualisation without unnecessary clutter, which is easier to read.
  • A simple clear business question answered can help make it clearer what the visualisation is about.

Challenges of this approach:

  • The labelling of the slope chart is tricky because the names are clustered.  This could be reduced by only showing the percentage figure on the right side.
  • I used a floating design method in Tableau, which provided opportunity to space out the chart elements, but requires visual symmetry in order to align the various components.

In summary:

To recap, I used context to tell a meaningful data story based upon an appropriate chart type which demonstrated change effectively.  I then used design choices such as leveraging of white space to break up the view as well as minimal use of colour to highlight key trends.  Finally I added some clarity in terms of using the chart to answer the business question posed in the title so it is clear what the visualisation is about.  

Hopefully these tips help you to maximise the impact of your data visualisation.

The final visualisation:

Indian Schools 2014 to 2016

The interactive version can be found on Tableau Public.

 

 

I recently re-tweeted my very first data visualisation, which I uploaded to Tableau Public 12 months ago.

Screen Shot 2017-07-21 at 11.00.42

Mark Edwards, Sarah Bartlett and Ken Flerlage were kind enough to ask that I write my experiences down in this blog.  This is a recap of that last year, including the highs, the lows and some key lessons I’ve learnt from visualising data in Tableau Desktop.

Some context

For over 18 years I had worked my way up through a challenging but rewarding career in local government research.  However after many years doing more or less the same job in the same organisation I had a growing urge to try something new.  I had long held the idea of being a freelance consultant in my mind.

I knew I needed to do something which would enable me to transfer my skills and experience of researching and analysing data.  Further research into ‘data visualisation’ led me to an award winning website called Visualising Data by Andy Kirk.  I downloaded his first e-book ‘Data Visualisation: A Successful Design Process’ and found this quote right at the start: “welcome to the art of data visualisation – a multi-disciplinary recipe of art, science, math, technology and many other interesting ingredients”.  This really resonated with me as it mentioned several disciplines I was interested in.  Like many, I had been visualising data in one form or another for years but never thought of it as a discipline in its own right.

Getting the right tools for the job

Next, I researched data visualisation tools and there were a few different products on the market.  In 2015, I heard a freelance visual journalist called Caroline Beavon not only deliver a great presentation on data visualisation but also mention some free visualisation software that was easy to use called ‘Tableau‘.  Fast forward a year and I saw an inspiring presentation from Tableau Zen Master Rob Radburn about the opportunities and challenges of using Tableau in Leicestershire County Council.  Both these presentations opened my eyes not only to the potential for data visualisation but also that there was a great tool out there to help me on my new career path.

I downloaded a trial copy of Tableau Desktop in December 2016.  I was really impressed with how easy it was to create a few simple charts from my wife Ann’s fitbit data.  What also impressed me was the potential versatility of the product; as well as being able to drag and drop data to create charts I could also map data and undertake statistical analysis.  This made it great value for money for an independent as I wouldn’t need to invest in numerous stand alone systems.

Learning the craft

My learning has been split into two distinct routes: formal training and informal practice.  Formally I have attended classroom sessions for Tableau: ‘Desktop Fundamentals’, ‘Desktop II (Intermediate)’ and ‘Desktop III (Advanced)’.  I attended the latter two modules as part of a condensed Conference edition course.  These are intensive days with a huge amount of information to take on board.  I soon learnt the wisest approach was to keep up with the trainer as best I could and then go back to the manuals and exercises in my own time.

In October 2016 I attended a great 2 day data visualisation primer run by Andy Kirk in London.  This gave me a great overall insight into the data visualisation design process as well as good practice.

I have also read a few useful books in order to develop my knowledge of data visualisation best practice.  I am quite a theoretical learner so I like to take models and processes and try to make them my own.

Few and McCandless jostling for position on my bookshelf…

IMG_1173

  • Formal training gave me a solid foundation of theory to build upon.

There are also many great open resources in the Tableau Community to learn from.  I have benefited a lot from Tableau Tip Tuesday authored by Andy Kriebel. This is a range of how to videos which often pop up when you google ‘how to build…in Tableau‘.

Another great resource is the Learning Tableau Blog by Charlie Hutcheson, where each week he writes about some of the technical challenges he has overcome in Tableau, often through ‘reverse engineering’ viz as part of his ‘Take Apart Tuesday’ series.  On that note, downloading other people’s workbooks from Tableau Public is one of the best ways to learn Tableau.

However I knew that the very best way to learn would be to practice, practice and practice some more!  I needed a way to start using Tableau regularly so I would have developed some skills by the time I started using it professionally.

  • Practice helped me develop theory into tacit knowledge

Getting involved in data projects

Following the first Tableau Conference session I decided to take part in a project that kept being mentioned, called ‘Makeovermonday’ (MOM).  For the few of you who are unaware, it is a weekly social data project originally run by Andy Kriebel and Andy Cotgreave in 2016 and then in 2017 by Andy Kriebel and Eva Murray.  The aim is to take a published data visualisation and ‘make it over’ in order to improve the original charts.  The makeovers are published to Tableau Public and Twitter for a wider audience.

You may notice that the data visualisation at the start of this blog is ‘my first makeovermonday‘ as well as my first Tableau Public data viz!  Little did I know how influential this project would be in my ongoing development.

  • Firstly it has given me an opportunity to develop my Tableau skills through practical application.
  • Secondly it has developed a fantastic social network through following conversations on Twitter under the hashtag #Makeovermonday.
  • Thirdly it has allowed me the opportunity to develop a portfolio of over 50 visualisations I can share wherever I go.

It has been challenging at times though, particularly in 2016, when I was juggling many life events: ending one career and starting another, as well as finishing a post graduate qualification.  Something had to give and for a few weeks it was MOM.  But I persevered, started afresh in 2017 and have since completed most of the challenges.  I tend to use each challenge as an opportunity to learn and develop a new approach or skill, which can take time.

Another great project worth a mention is Viz For Social Good run by Chloe Tseng.  I took part in the Hip Hop Gardens Project, which combined several things I am interested in: hip hop music, social regeneration and community gardening, all with data visualisation!

It was a seminal project in my design development as I tried a more long form story board.  If you would like to know more, please read my blog about the project.

Hip hop and community gardening visualised…

Hip Hop Gardens

  • Getting involved in projects has helped me develop my Tableau skills, grow a network and develop a portfolio

Go where there are like minded people

In 2016 and 2017 I attended the Tableau Conference on Tour; both times were a fantastic experience.  In 2016 I was overwhelmed by the range of content.  I also did not know anybody and struggled to socialise but tried my best.  By 2017, thanks to getting to know people from social media and participating in projects, I knew a great bunch of like minded people.  This made it a much more rewarding experience.

A sunny Day 3 at Tobacco Dock…#Data2017

IMG_1020

I have also started attending the London Tableau User Group.  The organisers include Pablo Gomez and Sarah Bartlett, who in particular have been really welcoming as have the others involved.

Although it’s a bit of a trek from the Northwest, I have found it worth it as the quality of speakers has been fantastic.  For example I got to see Cole Knaflic talk about her Storytelling with Data Process, of which I am a big fan.

Cole Knaflic talking at London TUG…

IMG_0867

  • It’s more fun to talk about Tableau to other people, especially if you are working on your own.

Another great place to discuss data visualisation and Tableau is Twitter.  This has been invaluable for someone like me vizzing on my own, save for my cat Killi.  There are too many people to mention, but of those not already mentioned, one who has been really helpful is Neil Richards.  If you haven’t checked out Neil’s innovative and experimental viz then I suggest you do!

Killi checks out some trend analysis of clinical prescription data…

IMG_0856

Finding your style

I am still developing my data visualisation style with practice.  I am also looking for who my audience is both professionally and personally.

I have also learnt to try to take criticism constructively and develop from it.  This isn’t always an easy thing to do, but is worth the reward.  Most people in the Tableau Community want to improve the quality of data visualisation and I’ve found they are more than willing to help with advice and guidance.

What I have found out works well:

  • Try new things
  • Ask for feedback
  • Learn what works and what doesn’t
  • Improve over time

So my advice to Tableau newbies is:

  • Formal training is a good place to start
  • Get involved in a practical project like #Makeovermonday
  • Practice and learn continually to improve
  • Attend a conference or a TUG
  • Engage the community on Twitter

Looking back…

It has been a great 12 months of learning a new craft, discovering a great new tool and meeting lots of cool new people.  It’s been exciting, hard work and scary all at once.

Looking forward my goals are:

  • Continue to develop my technical skills
  • Continue to develop my design process and style
  • Achieve Tableau Associate Certification to enhance my professional standing
  • Develop this blog to showcase the benefits of data visualisation to wider audiences

I would also like to find time to run some more of my own data visualisation projects.  I have interests in the world of social regeneration, which I specialised in for my research career and would like to do some more work in that field.

Thanks to…

All of the people mentioned in this blog post and others in the data visualisation and Tableau Community.

I would also like to thank my data visualisation checker supreme, my wife Ann, without whose support I would never have been able to make this journey.

Introduction:

Each week I take part in a data visualisation challenge called ‘Makeovermonday’. The idea is to take a data visualisation that has already been published and make it over using good practice techniques. I use an industry leading data visualisation tool called ‘Tableau’.  Week 25 presented another Big Data challenge using 202 million records measuring air quality in the USA over time, powered by Exasol’s super fast database.  The dataset related to levels of ozone measured hourly and daily across US counties and states over several years and the impact upon public health.

In this blog I will tell you my approach to my makeover.  Then I will explore the chart type I used called a ‘Box and Whisker‘ chart, making a case for and against it based upon good practice theory compared to practical experience.

The original visualisation: 

Screen Shot 2017-06-23 at 13.53.57.png

Screen Shot 2017-06-18 at 15.00.19

What did I like?

  • The colour legend shows which days are healthy or unhealthy throughout each year
  • Clear title and source telling me what the visualisation is showing
  • Interactivity allows me to drill down through geography and time

What could be improved?

  • There is a lack of context about what ozone is and the health concerns
  • It could be clearer in terms of showing magnitude of changes over time
  • There could be a story board approach to engage the user and help them navigate the trends

My approach:

  • To show some more contextual information about a) Ozone levels and b) how they are measured through the Air Quality Index
  • To show the size of trends over time more effectively
  • Use colour to highlight healthy versus un-healthy days as in the original
  • Visualise individual data points daily but drill down to the hourly level
  • Tell an interesting story!
  • I picked New York County in 2015 in order to filter down the data.  I chose to look within a year rather than across years.

Visualising via a Box and Whisker Chart:

  • I tried a chart type I had never used before, called a ‘Box and Whisker’ chart

Box and Whisker

The chart is a simplified representation of a distribution of data.

  • The box represents the range between the 1st and 3rd quartiles of data (Interquartile range).
  • The middle line represents the median (mid point) value.
  • The whiskers extend to the most extreme data points that lie within 1.5 times the interquartile range of the box; points beyond the whiskers are outliers.
  • Half the data points are located within the box; the other half lie outside it, mostly between the box and the upper and lower whiskers.
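The parts of the chart listed above can be worked out by hand.  The following is a minimal Python sketch of the common 1.5 × interquartile range convention; the readings are made-up values rather than the real ozone data, and Tableau may calculate quartiles slightly differently.

```python
from statistics import quantiles


def box_plot_summary(values):
    """Return the building blocks of a box and whisker chart."""
    data = sorted(values)
    # "inclusive" treats the values as the full set of data, not a sample
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme points that are still inside the fences
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


# Hypothetical readings, for illustration only
readings = [18, 21, 22, 24, 25, 27, 28, 30, 31, 58]
summary = box_plot_summary(readings)
print(summary)
```

In this example the single reading of 58 sits beyond the upper whisker and is reported as an outlier, while the whiskers end at 18 and 31.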

For a great introduction to this chart type, I referred to Alberto Cairo’s excellent book on data visualisation for communication, The Truthful Art (2016, p192).

I included two box and whisker charts; one looking at each day in each month of 2015 and another breaking it down further to each hour in each day of July 2015; the month with the highest ozone levels.

NYC Ozone Dashboard

There is a great discussion around when to use and when not to use Box and Whisker charts in The Big Book of Dashboards (BBOD) (2017, p61) by Andy Cotgreave, Steve Wexler and Jeffrey Shaffer.  I will now compare the case for and against presented in the ‘BBOD’ against my own practical experience.

The case for Box and Whisker charts:

  • The box and whisker chart shows all the data points; whether there are 20, 2000 or 2 million.  It divides the data into quartiles, each holding an equal share of the data points.  As such it is a good chart for showing the distribution of the data.
  • It is also good for comparing distributions across categories such as dates in this case. The box and whiskers can be easily compared against one another to see how the medians and the ranges compare.
  • It is also effective for identifying outliers above or below the average.

In the practical exercise, the dot plots clearly show the increase in ground level ozone in New York County in the Summer months of 2015.  The whiskers are effective for showing that May and June have a greater variance in ozone levels.  The middle lines show that it is July, which has the highest median value.

In my visualisation the daily averages showed that for New York County in 2015 the average ozone level was ‘good’ on every day in terms of impact upon health.  The second box plot looked at hourly distributions across individual days in July 2015 and highlighted outliers which were hitting ‘un-healthy’ levels of ozone.  This insight was not apparent when just looking at daily averages.
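To see why a daily average can hide an unhealthy hour, here is a small Python sketch.  The threshold and the hourly readings are hypothetical values chosen for illustration; they are not real Air Quality Index figures.

```python
from statistics import mean

# Hypothetical threshold and readings (not real AQI data)
UNHEALTHY_THRESHOLD = 100

hourly_by_day = {
    "2015-07-01": [30, 35, 40, 45, 50, 40],
    "2015-07-02": [20, 25, 30, 120, 35, 22],
}


def summarise(hourly, threshold):
    """Compare the daily average against the individual hourly readings."""
    rows = []
    for day, values in hourly.items():
        daily_mean = mean(values)
        rows.append({
            "day": day,
            "daily_mean": daily_mean,
            # Would the day be flagged if we only looked at its average?
            "daily_flag": daily_mean > threshold,
            # How many individual hours were over the threshold?
            "hours_over": sum(v > threshold for v in values),
        })
    return rows


rows = summarise(hourly_by_day, UNHEALTHY_THRESHOLD)
for row in rows:
    print(row)
```

The second day averages 42, so it would not be flagged on the daily view, yet one of its hours is above the threshold.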

The case against Box and Whisker charts:

Andy’s co-author Steve Wexler points out that they are less good for identifying individual data points as they overlap.  So if that is the goal then this may not be the right chart type.  In the book there are examples of charts where data points have been ‘jittered’ so every point is visible.

However in this case it was not necessary to see every data point, rather to identify general patterns of when the ozone levels had become unhealthy.  This was achieved by colour coding the data points.
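For cases where every point does need to be visible, the jittering idea can be sketched in a few lines of Python.  This is only an illustration of the general technique, not the method used in the book; the `width` and `seed` parameters are my own assumptions.

```python
import random


def jitter(positions, width=0.2, seed=0):
    """Nudge each position sideways by a small random amount.

    width is the maximum shift in either direction; seed keeps the
    result repeatable between runs.
    """
    rng = random.Random(seed)
    return [p + rng.uniform(-width, width) for p in positions]


# Ten points that would otherwise all sit on top of each other at position 1
x = jitter([1] * 10)
print(x)
```

Only the horizontal position is shifted, so the measured value on the other axis is left untouched.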

The chart is not the easiest to interpret for an un-trained eye.  An aggregated view like a bar chart would be easier for a lay person to interpret. For example this is a comparison of the daily ozone levels for July 2015 in New York County using a bar chart compared to a box and whisker chart:

US Air Quality Aggregated

I agree that if we compare the aggregated bar chart against the disaggregated box and whisker plot, the former is easier to understand at a glance.  The boxes and whiskers, whilst adding more insight also add more clutter to the visualisation.

However, as Andy Cotgreave states in the BBOD (2017, p61), “as with all charts, people can be trained to use them”.  Box and whisker charts may seem intimidating at first, but once you know what to look for I think they become easier to use.  That said, adding more contextual information to help train users does present some design challenges in terms of not over complicating the view.  As such I included a logo with a tooltip on how to use the chart.

It is important to identify the audience in mind, and their ability to interpret a more complex chart type (Cotgreave et al, 2017, p55).  I designed the visualisation for a generalist audience with a keen enough interest to take the time to read it, e.g. an environmental campaigner.

It is very subjective in terms of how visually appealing Box and Whisker charts are and I know some people don’t like them. Well beauty is in the eye of the beholder as they say.  As Andy also says ‘it depends’ on the context or the audience.  I think they are visually appealing for an informed audience interested in digging a bit deeper into the data distribution.

Conclusions:

A complex subject based upon a large dataset can be visualised using either simple aggregate level bar charts or more complex disaggregated charts such as the Box and Whisker Chart.  Which approach to take depends upon who the audience is and the aim of the visualisation.

Box and Whisker charts are suitable for comparing distributions, showing outliers and drilling down into more detail.  In the practical exercise the chart allowed us to view seasonal patterns of ozone levels, compare ranges and identify the months with the highest medians.

This chart is less suitable if we want to compare individual data points as they overlap.  However this is less important as the boxes summarise the data distribution and allow general comparisons to be made.

The chart is less accessible than the bar chart view when looking at hourly emissions by day.  However the aggregated view misses the detail of insights which the colour coded dots give us.  The key learning point is that we often rely upon averages which hide underlying patterns.

Additionally with training supported by guidance notes the user can soon learn how best to use this chart type.  Hopefully, in future it then becomes easier to use.  However, this can present design challenges in terms of additional contextual information.

In terms of whether the Box and Whisker chart is attractive or not then I will leave that up to you to decide. I personally think they have their own aesthetic qualities.


Introduction

I recently took part in a really worthwhile data visualisation project called ‘May Project Gardens’, featured under the #VizForSocialGood Programme run by Chloe Tseng from the Tableau Community.  I was excited to take part in this project as it combined several of my favourite things with data visualisation.  These included:

  • Social regeneration; I have worked in the field of social research to support community regeneration for over 18 years and recently graduated with a post graduate diploma in regeneration practice from the University of Chester.  I am passionate about projects which can bring communities together to enhance social capital.
  • Community gardening; I am a joint plot holder at a local allotment. My wife Ann is the project manager but I help out digging, weeding, planting and eating the fresh produce we grow!
  • Hip hop; I used to DJ with vinyl, playing soul, funk, hip hop and reggae around my adopted city of Chester in the Northwest of England for 10 years.  So the topic piqued my interest as an innovative way to engage young people with gardening.

The Project Brief:

“May Project Gardens is an award-winning social enterprise – a highly skilled and passionate team working to empower and educate urban communities to live sustainably. 

Hip Hop Garden: an innovative, alternative education model using hip hop to educate and empower marginalised young people to live healthily, learn entrepreneurial skills, and grow their communities.

We now need to raise further money to run this award-winning course in the ninth most deprived ward in the UK – allowing these young people to take their skills to a new level, improve community cohesion and build their self-esteem. To do this, we need to present our data in a visually appealing way to donors and funders.

Communication goals:

  • To show the success of our Hip Hop Garden workshops and pilot programme in terms of youth engagement, learning and personal development”. 

My design process:

The first stage of any data visualisation is always to understand who the audience is and how they are likely to interact with the final product.

I spent a lot of time researching May Project Gardens to understand their vision and values.  In the Hip Hop Garden section of their website there is a really interesting video outlining what the project is about, its aims and objectives as well as some of the challenges they face in terms of funding.

Context;

The research was useful as it helped me understand how the data visualisation could meet the project brief of showing the success of the project workshops in terms of engaging young people.  There was a need to show the community issues which the project was attempting to address, being located in one of the most deprived areas in the country.  The project also needed to demonstrate the outputs generated in terms of the numbers of young people engaged and their satisfaction levels.  Most importantly I was keen to evidence the personal and social learning outcomes achieved by the students through participation.

I also learnt about some of the logistical challenges required to run a community garden from speaking to Ann who is currently studying them as part of a horticultural course.  I also realised that due to funding cuts to youth provision across London that a strong call to action to attract funding would be helpful at the end of the visualisation.

Screen Shot 2017-04-27 at 18.32.50

Data preparation;

There was a series of 10 workshops which had been run at Hip Hop Gardens in 2015 and 2016.  Participants had been given evaluation questionnaires and asked to state whether they enjoyed the events, what they learnt and how it had helped with their personal and social transformation.  The feedback was quite often verbatim, which posed challenges in terms of consistent analysis.

I spent quite a bit of time cleaning the data in Excel, to ensure consistency across each event before inputting into Tableau; my data visualisation package of choice.  Once in Tableau, there was some further work aggregating some of the verbatim feedback to make it consistent in order to measure generic outcomes such as ‘enjoyment’.  This process, whilst time consuming, did help me understand the nature of the data variables I was going to be visualising.
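The coding of verbatim feedback into generic outcomes was done by hand in Excel and Tableau, but the general idea can be sketched in Python.  The themes and keyword lists below are hypothetical; a real coding frame would come from reading the actual responses.

```python
# Hypothetical coding frame: theme -> keywords that suggest it
CODING_FRAME = {
    "enjoyment": ["enjoy", "fun", "loved", "great"],
    "learning": ["learnt", "learned", "how to"],
}


def code_response(text):
    """Assign one or more generic themes to a free-text response."""
    lowered = text.strip().lower()
    if not lowered:
        return ["no reply"]
    themes = [
        theme
        for theme, keywords in CODING_FRAME.items()
        if any(keyword in lowered for keyword in keywords)
    ]
    # Anything that matches no keyword is kept, but marked for manual review
    return themes or ["other"]


print(code_response("It was FUN and I learnt how to plant seeds"))
print(code_response("   "))
```

Simple keyword matching like this is crude (for example ‘fun’ would also match ‘funding’), so it is best treated as a first pass before checking the results by eye.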

Design Choices:

I decided that a long form design would be useful to allow for an exhibition of key performance metrics combined with explanatory text to tell the story of the project.  I was keen to use pictures and quotations to portray a more emotive tone as this is a project about improving people’s lives rather than just a set of numbers. My aim was to attract a potential donor to find out more about the project and hopefully be interested enough to contribute towards funding it.

I did toy with the idea of using an image of a garden design and labelling different plant beds with data from the project.  This led me to imagine creating my own bespoke garden design from the different elements of the data visualisation.

I was keen to plan out a structure which would follow the flow of a project evaluation.  I started my career evaluating community regeneration projects and knew that this followed a ‘logic model’ approach of inputs leading to outputs and most importantly outcomes.  I wanted to follow a story board format with a beginning, middle and an end.

The beginning: Inputs

A bold title with the May Project Gardens logo, a brief definition of the project and a wide angle picture of people working in the garden to set the scene.  A brief introduction to say what the problems are in the local community, which the project is attempting to solve;

  • A recognised disconnect between local young people and the food they eat.
  • A reliance on local food banks as a sign of food poverty.
  • Addressing wider issues of poverty, disempowerment and access to resources.

To set the context I outlined the project inputs across three boxes;

  • What we do; hip hop to empower young people to take control of their health and empower their communities.
  • The aims of the project; engagement, education, empowerment.
  • How; the project delivers educational programmes.

The middle: Outputs

This section allowed me to be creative and have some fun!  I used different chart types in Tableau to display the project performance metrics whilst simultaneously re-creating a hip hop garden of my own.

There is a bar chart which represents the number of participants at each event.  I used a colour legend to show the number of people who enjoyed the events, compared to those who either did not reply or were not sure.  My consistent colour scheme was that green equated to positive feelings about the project.  Each bar represented a line of vegetables. A brown background was used to represent a soil bed. Grey was used to show some of the community problems the project is trying to tackle as well as the call to funding.

There was a challenge in that one of the events in July 2015 used a different evaluation questionnaire than the other events.  I showed the number of participants who recommended this event as a ‘tree-map’ which represented a lawn.  The ‘Marcus Lipton’ pilot project had a unique rating score, which I decided to display as a big number, which could be a bush or a potted plant.

When I showed this to Ann, her first reaction was; it’s good but where does it show what gardening activity takes place?  This was a fair point.  I had already seen some other visualisations from people like Michael Mixon; whose excellent submission for #VizForSocialGood used a dashboard action to show some of the feedback from each event.  I emulated this approach so that if you hover over one of the event bars the definition of the activity is displayed in a box underneath.  For example;

“Challenge 06_07_15: 15-17 yr olds; half day workshop; tour of the garden, garden activities, creating a rap/rhyme.”

A last minute design decision was to include a simple ‘word cloud’ to show the different opinions of participants of the project. Whilst this did add some additional detail to the visualisation, I thought it was worth including as it showed the fun that young people have had participating in the project.

The end: Outcomes

I followed the same three box approach which runs throughout the design to ensure a vertical symmetry.  I used another bar chart to show each individual displayed as a cabbage to show the connection between people and the food they produce.  Both Michael Mixon and Pooja Gandhi advised me over Twitter how to import and edit images into Tableau.  For this visual I used PowerPoint to remove the white background around the cabbage.  This time a dashboard hover action displays the personal and social transformation outcomes.  I also used images taken from the project which display food, hip hop and community, made them transparent in PowerPoint and floated them in the background of the text.

Finally…

I was keen to include a call to action at the end so that people could donate to the project.  I imported a hover icon image as a png file and edited the url to point to the Hip Hop Gardens project page.

I also spent a long time double checking all the interactive elements worked correctly before I published it to Tableau Public.

What worked well?

I really enjoyed preparing and analysing some raw data and building a visualisation from scratch.  It educated me about a community garden project which was fulfilling.  I learnt about how to construct more detailed long form visualisations.  I also learnt about how to float transparent images in the background, which was a new technique for me.  The time I spent researching and story boarding at the beginning really helped me to structure my design to tell a story and answer the project brief.

What worked less well?

As with any survey data, the inconsistency in the event questionnaires posed some challenges.   In particular the data preparation took a while as it involved coding qualitative feedback.  Some of the vertical line dividers used to separate the sections of the visualisation made formatting the design layout tricky until I discovered that temporarily turning off ‘fixed height’ made it a lot easier to adjust the height ratios.  Additionally, I had undertaken the project over an intensive 3 day period, which helped with my creative flow but left me exhausted by the end.

To conclude:

This was an excellent project to take part in as it combined several of my favourite disciplines; social research, community regeneration with my personal interests in hip hop music and gardening.  I enjoyed being creative and thinking like a designer to tell a better data story.

However, this was an intensive process with some technical challenges.  It has made me consider that having access to a specialist data preparation tool would be useful.  In future, I would also spread the work for a project of this scale over a longer time period.

Overall I am really pleased that my design was chosen to be one of nine international data visualisations for the May Project Gardens #VizForSocialGood gallery.   I hope the visualisation helps the project to raise awareness for their excellent work as well as attract some much needed funding.

Hip Hop Gardens.png


Each week I take part in a data visualisation challenge called ‘Makeovermonday’. The idea is to take a data visualisation that has already been published and make it over using good practice techniques. I use an industry leading data visualisation tool called ‘Tableau’.

The original data visualisation is a simple table showing Dutch export car sales in 2015:

Screen Shot 2017-05-10 at 08.10.54

What do I like about it?

  • It shows the top 10 car brands, model and volume in a nice sorted table.
  • It is simple and easy to understand.

What don’t I like about it?

  • The title could be more insightful.
  • Some car brands are missing.
  • There is a wealth of more information available in the dataset; why just exports and 2015 only?
  • The lack of context; how are export sales changing over time by brand?

My approach to the makeover:

The first thing I always do is to imagine who the audience for my data visualisation could be.  In this case I imagined a car salesperson, who would be interested in sales trends by brand.  I started asking some key questions to help formulate the angle and framing of my intended design:

  • Which are the top 5 car brands in terms of sales?
  • How have brand preferences changed over time?
  • In which months are car sales more or less popular by brand?

I needed a specific focus to look at in detail.  I did some research and came across the following NL Times article relating to Volkswagen (VW) apologising to its customers after the global ‘Diesel Testing Scandal’ in October 2015.  This was interesting to me as I wanted to explore whether this had impacted upon VW car sales in the Netherlands.  I am also a big VW fan as my first ever car was a classic Mk 2 Golf!

My step by step design process:

Introducing the context;

  • I wanted to answer the questions I had set myself in a brief but insightful title.
  • I used the same colours and font (‘futura’) found on my local VW dealer website and a publicly available version of the famous VW logo.
  • I added some context to give my reader some background information as well as pose a question, which I would aim to answer.

Screen Shot 2017-05-16 at 14.30.57

The analysis;

  • Then I added some brief analysis, based upon the charts that follow, to set some more context and answer the research question posed in the introduction.
  • Simple bar charts were used to show that VW is the top brand for the last 5 years.  I deliberately colour coded VW in its corporate blue and the other brands in shades of grey so they would fade in to the background.
  • Monthly sales are presented as spark lines matching the same colour legend as the bar charts.  The VW sales line is deliberately thicker, so that it visually stands out compared to the other brands.  The maximum and minimum sales months are labelled using a ‘Max/Min Window’ calculation.  The axes are independently scaled with no zero to exaggerate the differences between brands.
  • I did experiment with using a ‘Level of Detail’ calculation to differentiate the colours of the minimum and maximum sales values.  This is something I wish to revisit in a future makeover as it is a valuable but more complex process.
  • A text box is used as a subtle dividing line to break the visualisation into sections.
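The Tableau calculation used to label the maximum and minimum months is not shown in this post.  As a plain Python analogue of the same idea, the sketch below marks the highest and lowest month within each brand’s own series; the brand names and sales figures are hypothetical.

```python
# Hypothetical monthly sales per brand, for illustration only
monthly_sales = {
    "VW": [("Jan", 120), ("Feb", 95), ("Mar", 140)],
    "Brand B": [("Jan", 80), ("Feb", 110), ("Mar", 70)],
}


def label_extremes(series):
    """Tag the highest and lowest months in one brand's series."""
    values = [value for _, value in series]
    highest, lowest = max(values), min(values)
    labelled = []
    for month, value in series:
        if value == highest:
            tag = "max"
        elif value == lowest:
            tag = "min"
        else:
            tag = ""  # leave every other month unlabelled
        labelled.append((month, value, tag))
    return labelled


labels = {brand: label_extremes(series) for brand, series in monthly_sales.items()}
for brand, rows in labels.items():
    print(brand, rows)
```

The maximum and minimum are worked out separately for each brand, which mirrors the way each spark line is labelled on its own terms.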

Screen Shot 2017-05-16 at 13.31.38

  • Colour coded trend lines show how the top 5 brands perform over the last 5 years.
  • A reference line is set for 2015, Q4 to indicate the date of the VW apology for the diesel testing scandal.
  • A ‘table calculation’ is used to show the percentage change in sales since the apology was published.
  • An area annotation indicates that sales for VW have increased, but not to the same extent as most of the other brands.
  • Data source, image copyright and design tags all listed as per standard.
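The percentage change since the apology is simple arithmetic: the difference between the latest value and the baseline value, divided by the baseline.  A minimal Python sketch, using hypothetical quarterly figures rather than the real Dutch sales data:

```python
# Hypothetical quarterly sales, for illustration only
quarterly_sales = {
    "VW": {"2015 Q4": 1000, "2016 Q4": 1050},
    "Brand B": {"2015 Q4": 500, "2016 Q4": 600},
}


def pct_change_since(series, baseline, latest):
    """Percentage change between the baseline period and the latest period."""
    base = series[baseline]
    return round((series[latest] - base) / base * 100, 1)


changes = {
    brand: pct_change_since(series, "2015 Q4", "2016 Q4")
    for brand, series in quarterly_sales.items()
}
print(changes)
```

With these made-up numbers both brands have grown since the baseline quarter, but the first has grown by less, which is the kind of pattern the area annotation describes.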

Screen Shot 2017-05-16 at 14.46.24

Publishing the visualisation;

  • Because there are over 9 million sales records in the dataset, I needed to aggregate it to only those variables I had used in order to publish it efficiently.  To do this I followed this simple tutorial video courtesy of Andy Kriebel’s excellent ‘#TableauTipTuesday’ blog series.
  • Finally I published the data visualisation to Tableau Public and Twitter.  This is what the final visualisation looks like:

Dutch Car Sales v2
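The aggregation step described under ‘Publishing the visualisation’ was done in Tableau by following the tutorial.  The underlying idea, replacing one row per sale with one row per combination of the fields actually used, can be sketched in Python.  The field names and records here are hypothetical.

```python
from collections import Counter

# Hypothetical row-level records: one (brand, year, month) tuple per car sold
records = [
    ("VW", 2015, 1),
    ("VW", 2015, 1),
    ("VW", 2015, 2),
    ("Brand B", 2015, 1),
]

# One entry per brand/year/month with a count, instead of one row per sale
aggregated = Counter(records)

for (brand, year, month), sales in sorted(aggregated.items()):
    print(brand, year, month, sales)
```

Four sales records collapse into three summary rows here; on millions of records the reduction is far greater, which is what makes the published workbook lighter.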

To conclude:

The analysis shows that VW are still the most popular car brand in the Netherlands, despite the diesel testing scandal.    However, whilst sales have increased since the public apology was published, they have not risen as much as most of the other competitor brands.
