Blog

How to maximise the impact of your data visualisation

The challenge for week 32 of Makeovermonday was to improve upon a visualisation from Business Standard, which looks at the percentage of schools across India which have usable toilets.  As someone who has studied human geography I found this an interesting and worthwhile topic to explore.

In this blog I will explore what was wrong with the original visualisation and offer a few pointers on how to maximise the impact of the makeover.  This is influenced in no small way by the Story Telling with Data process by Cole Knaflic.

The original data visualisation looks like this:

Screen Shot 2017-08-06 at 14.33.28

What’s good about it?

  • The colours are clear in terms of showing which states have a higher or lower proportion of schools with usable toilets.

What could be improved?

  • There is a lack of context – what is the story behind the data?
  • It could use a more appropriate chart type – maps can distort magnitude as larger areas are more prominent.  They are also not appropriate for showing change effectively over time.
  • The design choices – the labelling on the maps makes the view cluttered.
  • The clarity of the visualisation – it is not clear what the titles refer to i.e. is it the percentage of toilets that are usable or schools with usable toilets? In fact it is the latter.

My approach:

So let’s take each of those issues in order and try to tackle them.

Add some context…to tell a story

I started researching some context about the issue and why it is important.  I read the article behind the original visualisation and made some notes from which I could pull out a potential story board.

What grabbed my attention was that, despite improvements in the proportion of schools with accessible toilets and usable toilets, the rate of improvement had slowed over recent years.  There was also a disparity in terms of reporting on accessible versus usable toilets which missed underlying issues of sanitation in schools.  This is important as it has a big impact upon student wellbeing.

From this I could pose some important questions:

  • What is the issue?
  • Why does it matter?
  • What needs to change?

These could be used to wrap around the chart and tell an effective story.

Use an appropriate chart type…

I used a slope chart to show the change in the percentage of schools with usable toilets between 2014 and 2016.  Slope charts are one of my favourite chart types and are great for showing change in percentage measures.  To build this in Tableau I used a simple method for adding vertical lines to slope graphs with multiple measures, from Tableau Zen Master Matt Chambers.

Use design to clean up the viz…

  • I used colour and labelling to highlight only those states which had decreased in terms of the proportion of usable toilets.
  • I used colour in the title so that a separate legend is unnecessary.
  • I leveraged white space to create borders between the text and chart.
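The data behind a slope chart like this boils down to one row per state with a start value, an end value and the change between them.  Here is a minimal Python sketch of that idea, including the rule of only highlighting states that decreased.  The state names and percentages are made up for illustration, and the function name `slope_rows` is my own; the actual chart was built in Tableau, not Python.

```python
# Hypothetical figures for illustration only; not taken from the real dataset.
usable_pct = {
    "State A": {2014: 60.0, 2016: 72.5},
    "State B": {2014: 81.0, 2016: 78.0},
    "State C": {2014: 45.5, 2016: 45.5},
}


def slope_rows(data, start=2014, end=2016):
    """Return one row per state: start value, end value, change and highlight flag."""
    rows = []
    for state, by_year in data.items():
        change = by_year[end] - by_year[start]
        rows.append({
            "state": state,
            "start": by_year[start],
            "end": by_year[end],
            "change": round(change, 1),
            # Only states whose percentage fell get colour and a label.
            "highlight": change < 0,
        })
    # Sort so the biggest decreases come first.
    return sorted(rows, key=lambda row: row["change"])


rows = slope_rows(usable_pct)
for row in rows:
    print(row)
```

Each row maps to one line on the slope chart, and the `highlight` flag plays the role of the colour and label choice described above.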

Add some clarity…

  • I posed one simple business question in the title and aimed to use the data visualisation to answer it.
  • I placed annotation near to the data to answer the question I had posed in the title.

Benefits of this approach:

  • Some thought provoking context makes the reader think and adds a call to action.
  • An appropriate chart type shows change effectively over time.
  • Effective design choices can provide a clean visualisation without unnecessary clutter, which is easier to read.
  • A simple clear business question answered can help make it clearer what the visualisation is about.

Challenges of this approach:

  • The labelling of the slope chart is tricky because the names are clustered.  This could be reduced by only showing the percentage figure on the right side.
  • I used a floating design method in Tableau, which provided opportunity to space out the chart elements, but requires visual symmetry in order to align the various components.

In summary:

To recap, I used context to tell a meaningful data story based upon an appropriate chart type which demonstrated change effectively.  I then used design choices such as leveraging of white space to break up the view as well as minimal use of colour to highlight key trends.  Finally I added some clarity in terms of using the chart to answer the business question posed in the title so it is clear what the visualisation is about.  

Hopefully these tips help you to maximise the impact of your data visualisation.

The final visualisation:

Indian Schools 2014 to 2016

The interactive version can be found on Tableau Public.

 

 

Year 1 of Tableau – my journey into data visualisation

I recently re-tweeted my very first data visualisation, which I uploaded to Tableau Public 12 months ago.

Screen Shot 2017-07-21 at 11.00.42

Mark Edwards, Sarah Bartlett and Ken Flerlage were kind enough to ask that I write my experiences down in this blog.  This is a recap of that last year, including the highs, the lows and some key lessons I’ve learnt from visualising data in Tableau Desktop.

Some context

For over 18 years I had worked my way up through a challenging but rewarding career in local government research.  However after many years doing more or less the same job in the same organisation I had a growing urge to try something new.  I had long held the idea of being a freelance consultant in my mind.

I knew I needed to do something which would enable me to transfer my skills and experience of researching and analysing data.  Further research into ‘data visualisation’ led me to an award winning website called Visualising Data by Andy Kirk.  I downloaded his first e-book ‘Data Visualisation: A Successful Design Process’ and found this quote right at the start: “welcome to the art of data visualisation – a multi-disciplinary recipe of art, science, math, technology and many other interesting ingredients”.  This really resonated with me as it mentioned several disciplines I was interested in.  Like many, I had been visualising data in one form or another for years but never thought of it as a discipline in its own right.

Getting the right tools for the job

Next, I researched data visualisation tools and there were a few different products on the market.  In 2015, I heard a freelance visual journalist called Caroline Beavon not only deliver a great presentation on data visualisation but also mention some free visualisation software that was easy to use called ‘Tableau‘.  Fast forward a year and I saw an inspiring presentation from Tableau Zen Master Rob Radburn about the opportunities and challenges of using Tableau in Leicestershire County Council.  Both these presentations opened my eyes not only to the potential for data visualisation but also that there was a great tool out there to help me on my new career path.

I downloaded a trial copy of Tableau Desktop in December 2016.  I was really impressed with how easy it was to create a few simple charts from my wife Ann’s Fitbit data.  What also impressed me was the potential versatility of the product; as well as being able to drag and drop data to create charts I could also map data and undertake statistical analysis.  This made it great value for money for an independent as I wouldn’t need to invest in numerous standalone systems.

Learning the craft

My learning has been split into two distinct routes: formal training and informal practice.  Formally I have attended classroom sessions for Tableau: ‘Desktop Fundamentals’, ‘Desktop II (Intermediate)’ and ‘Desktop III (Advanced)’.  I attended the latter two modules as part of a condensed Conference edition course.  These are intensive days with a huge amount of information to take on board.  I soon learnt the wisest approach was to keep up with the trainer as best I could and then go back to the manuals and exercises in my own time.

In October 2016 I attended a great 2 day data visualisation primer run by Andy Kirk in London.  This gave me a great overall insight into the data visualisation design process as well as good practice.

I have also read a few useful books in order to develop my knowledge of data visualisation best practice.  I am quite a theoretical learner so I like to take models and processes and try to make them my own.

Few and McCandless jostling for position on my bookshelf…

IMG_1173

  • Formal training gave me a solid foundation of theory to build upon.

There are also many great open resources in the Tableau Community to learn from.  I have benefited a lot from Tableau Tip Tuesday authored by Andy Kriebel. This is a range of how to videos which often pop up when you google ‘how to build…in Tableau‘.

Another great resource is the Learning Tableau Blog by Charlie Hutcheson, where each week he writes about some of the technical challenges he has overcome in Tableau, often through ‘reverse engineering’ viz as part of his ‘Take Apart Tuesday’ series.  On that note, downloading other people’s workbooks from Tableau Public is one of the best ways to learn Tableau.

However I knew that the very best way to learn would be to practice, practice and practice some more!  I needed a way to start using Tableau regularly so I would have developed some skills by the time I started using it professionally.

  • Practice helped me develop theory into tacit knowledge

Getting involved in data projects

Following the first Tableau Conference session I decided to take part in a project that kept being mentioned, called ‘Makeovermonday’ (MOM).  For the few of you who are unaware, it is a weekly social data project originally run by Andy Kriebel and Andy Cotgreave in 2016 and then in 2017 by Andy Kriebel and Eva Murray.  The aim is to take a published data visualisation and ‘make it over’ in order to improve the original charts.  The makeovers are published to Tableau Public and Twitter for a wider audience.

You may notice that the data visualisation at the start of this blog is ‘my first makeovermonday‘ as well as my first Tableau Public data viz!  Little did I know how influential this project would be in my ongoing development.

  • Firstly it has given me an opportunity to develop my Tableau skills through practical application.
  • Secondly it has developed a fantastic social network through following conversations on Twitter under the hashtag #Makeovermonday.
  • Thirdly it has allowed me the opportunity to develop a portfolio of over 50 visualisations I can share wherever I go.

It has been challenging at times though, particularly in 2016, when I was juggling many life events: ending one career and starting another as well as finishing a post graduate qualification.  Something had to give and for a few weeks it was MOM.  But I persevered and started afresh in 2017 and have since completed most of the challenges.  I tend to use each challenge as an opportunity to learn and develop a new approach or skill, which can take time.

Another great project worth a mention is Viz For Social Good run by Chloe Tseng.  I took part in the Hip Hop Gardens Project, which combined several things I am interested in; hip hop music, social regeneration, community gardening with data visualisation!

It was a seminal project in my design development as I tried a more long form story board.  If you would like to know more, please read my blog about the project.

Hip hop and community gardening visualised…

Hip Hop Gardens

  • Getting involved in projects has helped me develop my Tableau skills, grow a network and develop a portfolio

Go where there are like minded people

In 2016 and 2017 I attended the Tableau Conference on Tour; both times were a fantastic experience.  In 2016 I was overwhelmed by the range of content.  I also did not know anybody and struggled to socialise but tried my best.  By 2017, thanks to getting to know people from social media and participating in projects, I knew a great bunch of like-minded people.  This made it a much more rewarding experience.

A sunny Day 3 at Tobacco Dock…#Data2017

IMG_1020

I have also started attending the London Tableau User Group.  The organisers include Pablo Gomez and Sarah Bartlett, who in particular have been really welcoming as have the others involved.

Although it’s a bit of a trek from the Northwest, I have found it worth it as the quality of speakers has been fantastic.  For example I got to see Cole Knaflic talk about her Story Telling with Data process, of which I am a big fan.

Cole Knaflic talking at London TUG…

IMG_0867

  • It’s more fun to talk about Tableau to other people, especially if you are working on your own.

Another great place to discuss data visualisation and Tableau is Twitter.  This has been invaluable for someone like me vizzing on my own, save for my cat Killi.  There are too many people to mention, but of those not already mentioned, one who has been really helpful is Neil Richards.  If you haven’t checked out Neil’s innovative and experimental viz then I suggest you do!

Killi checks out some trend analysis of clinical prescription data…

IMG_0856

Finding your style

I am still developing my data visualisation style with practice.  I am also looking for who my audience is both professionally and personally.

I have also learnt to try to take criticism constructively and develop from it.  This isn’t always an easy thing to do, but is worth the reward.  Most people in the Tableau Community want to improve the quality of data visualisation and I’ve found they are more than willing to help with advice and guidance.

What I have found out works well:

  • Try new things
  • Ask for feedback
  • Learn what works and what doesn’t
  • Improve over time

So my advice to Tableau newbies is:

  • Formal training is a good place to start
  • Get involved in a practical project like #Makeovermonday
  • Practice and learn continually to improve
  • Attend a conference or a TUG
  • Engage the community on Twitter

Looking back…

It has been a great 12 months of learning a new craft, discovering a great new tool and meeting lots of cool new people.  It’s been exciting, hard work and scary all at once.

Looking forward my goals are:

  • Continue to develop my technical skills
  • Continue to develop my design process and style
  • Achieve Tableau Associate Certification to enhance my professional standing
  • Develop this blog to showcase the benefits of data visualisation to wider audiences

I would also like to find time to run some more of my own data visualisation projects.  I have interests in the world of social regeneration, which I specialised in for my research career and would like to do some more work in that field.

Thanks to…

All of the people mentioned in this blog post and others in the data visualisation and Tableau Community.

I would also like to thank my data visualisation checker supreme, my wife Ann, without whose support I would never have been able to make this journey.

Visualising individual data plots using Box and Whisker charts

Introduction:

Each week I take part in a data visualisation challenge called ‘Makeovermonday’. The idea is to take a data visualisation that has already been published and make it over using good practice techniques. I use an industry leading data visualisation tool called ‘Tableau’.  Week 25 presented another Big Data challenge using 202 million records measuring air quality in the USA over time, powered by Exasol’s super fast database.  The dataset related to levels of ozone measured hourly and daily across US counties and states over several years and the impact upon public health.

In this blog I will tell you my approach to my makeover.  Then I will explore the chart type I used called a ‘Box and Whisker‘ chart, making a case for and against it based upon good practice theory compared to practical experience.

The original visualisation: 

Screen Shot 2017-06-23 at 13.53.57.png

Screen Shot 2017-06-18 at 15.00.19

What did I like?

  • The colour legend shows which days are healthy or unhealthy throughout each year
  • Clear title and source telling me what the visualisation is showing
  • Interactivity allows me to drill down through geography and time

What could be improved?

  • There is a lack of context about what ozone is and the health concerns
  • It could be clearer in terms of showing magnitude of changes over time
  • There could be a story board approach to engage the user and help them navigate the trends

My approach:

  • To show some more contextual information about a) Ozone levels and b) how they are measured through the Air Quality Index
  • To show the size of trends over time more effectively
  • Use colour to highlight healthy versus un-healthy days as in the original
  • Visualise individual data points daily but drill down to the hourly level
  • Tell an interesting story!
  • I picked New York County in 2015 in order to filter down the data.  I chose to look within a year rather than across years.

Visualising via a Box and Whisker Chart:

  • I tried a chart type I had never used before, called a ‘Box and Whisker’ chart

Box and Whisker

The chart is a simplified representation of a distribution of data.

  • The box represents the range between the 1st and 3rd quartiles of data (the interquartile range).
  • The middle line represents the median (mid point) value.
  • The whiskers extend to the furthest data points that lie within 1.5 times the interquartile range of the box; points beyond the whiskers are outliers.
  • Half the data points are located within the box; the other half lie outside it, along the whiskers or beyond them as outliers.
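To make the parts of the chart concrete, here is a minimal Python sketch of the numbers behind one box and whisker.  It is only an illustration under my own assumptions: the sample values are made up, the function name `box_and_whisker` is hypothetical, and the quartile method chosen here may not match exactly how Tableau calculates its quartiles.

```python
import statistics


def box_and_whisker(values, k=1.5):
    """Summarise values as quartiles, whisker ends and outliers."""
    data = sorted(values)
    # Quartiles: the box runs from q1 to q3, with the median line inside it.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Fences sit k * IQR beyond the box; k = 1.5 is the usual convention.
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        # Whiskers stop at the furthest points still inside the fences.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        # Anything beyond the fences is drawn as an individual outlier.
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Made-up sample readings, not real ozone data.
sample = [10, 12, 14, 15, 16, 18, 20, 22, 60]
stats = box_and_whisker(sample)
print(stats)
```

In this sample the box runs from 14 to 20, the whiskers reach 10 and 22, and the reading of 60 falls outside the upper fence so it is flagged as an outlier.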

For a great introduction to this chart type, I referred to Alberto Cairo’s excellent book on data visualisation for communication, The Truthful Art (2016, p192).

I included two box and whisker charts; one looking at each day in each month of 2015 and another breaking it down further to each hour in each day of July 2015; the month with the highest ozone levels.

NYC Ozone Dashboard

There is a great discussion around when to use and when not to use Box and Whisker charts in The Big Book of Dashboards (BBOD) (2017, p61) by Andy Cotgreave, Steve Wexler and Jeffrey Shaffer.  I will now compare the case for and against presented in the ‘BBOD’ against my own practical experience.

The case for Box and Whisker charts:

  • The box and whisker chart can summarise all the data points, whether there are 20, 2000 or 2 million.  It divides the data into quartiles, each containing a quarter of the points.  As such it is a good chart for showing the distribution of the data.
  • It is also good for comparing distributions across categories such as dates in this case. The box and whiskers can be easily compared against one another to see how the medians and the ranges compare.
  • It is also effective for identifying outliers that sit well above or below the rest of the data.

In the practical exercise, the dot plots clearly show the increase in ground level ozone in New York County in the Summer months of 2015.  The whiskers are effective for showing that May and June have a greater variance in ozone levels.  The middle lines show that July has the highest median value.

In my visualisation the daily averages showed that for New York County in 2015 average ozone levels were ‘good’ on every day in terms of impact upon health.  The second box plot looked at hourly distributions across individual days in July 2015 and highlights outliers which were hitting ‘un-healthy’ levels of ozone.  This insight was not apparent when just looking at daily averages.
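The way a daily average can hide unhealthy hours is easy to show with a few numbers.  The Python sketch below is purely illustrative: the readings and the cut-off of 100 are hypothetical and are not taken from the Air Quality Index data used in the visualisation.

```python
from statistics import mean

# Hypothetical cut-off and readings for illustration; not real AQI values.
UNHEALTHY_THRESHOLD = 100

hourly_readings = {
    "2015-07-01": [30, 35, 40, 45, 50, 40],
    "2015-07-02": [20, 25, 30, 120, 110, 25],
}


def summarise(readings, threshold=UNHEALTHY_THRESHOLD):
    """Return the daily mean and any hourly readings above the threshold."""
    summary = {}
    for day, values in readings.items():
        summary[day] = {
            "daily_mean": mean(values),
            # Hourly peaks that the daily mean smooths away.
            "peaks": [v for v in values if v > threshold],
        }
    return summary


summary = summarise(hourly_readings)
for day, result in summary.items():
    print(day, result)
```

Both days have a daily mean below the cut-off, yet the second day contains two hourly readings above it, which only show up once the data is broken down by hour.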

The case against Box and Whisker charts:

Andy’s co-author Steve Wexler points out that they are less good for identifying individual data points as they overlap.  So if that is the goal then this may not be the right chart type.  In the book there are examples of charts where data points have been ‘jittered’ so every point is visible.

However in this case it was not necessary to see every data point, rather to identify general patterns of when the ozone levels had become unhealthy.  This was achieved by colour coding the data points.
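For cases where every point does need to be visible, the idea of jittering is simply to nudge each point sideways by a small random amount so that points with the same value stop sitting on top of each other.  This is a minimal Python sketch of the idea only; the function name `jitter` and the width of 0.2 are my own assumptions, and it is not the method used in the book or in my Tableau workbook.

```python
import random


def jitter(positions, width=0.2, seed=0):
    """Shift each position by a random offset between -width and +width."""
    # A fixed seed keeps the layout the same every time the chart is drawn.
    rng = random.Random(seed)
    return [p + rng.uniform(-width, width) for p in positions]


# Five points stacked in two columns (hypothetical positions).
positions = [1, 1, 1, 2, 2]
jittered = jitter(positions)
print(jittered)
```

The offsets only affect where the dots are drawn along the category axis; the measured values on the other axis are left untouched.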

The chart is not the easiest to interpret for an un-trained eye.  An aggregated view like a bar chart would be easier for a lay person to interpret. For example this is a comparison of the daily ozone levels for July 2015 in New York County using a bar chart compared to a box and whisker chart:

US Air Quality Aggregated

I agree that if we compare the aggregated bar chart against the disaggregated box and whisker plot, the former is easier to understand at a glance.  The boxes and whiskers, whilst adding more insight also add more clutter to the visualisation.

However, as Andy Cotgreave states in the BBOD (2017, p61), “as with all charts, people can be trained to use them”.  Box and whisker charts may seem intimidating at first, but once you know what to look for I think they become easier to use.  However, adding more contextual information to help train users does present some design challenges in terms of not over complicating the view.  As such I included a logo with a tooltip on how to use the chart.

It is important to identify the audience in mind, and their ability to interpret a more complex chart type (Cotgreave et al, 2017, p55).  I designed the visualisation for a generalist audience with a keen enough interest to take the time to read it, e.g. an environmental campaigner.

It is very subjective in terms of how visually appealing Box and Whisker charts are and I know some people don’t like them. Well beauty is in the eye of the beholder as they say.  As Andy also says ‘it depends’ on the context or the audience.  I think they are visually appealing for an informed audience interested in digging a bit deeper into the data distribution.

Conclusions:

A complex subject based upon a large dataset can be visualised using either simple aggregate level bar charts or more complex disaggregated charts such as the Box and Whisker Chart.  Which approach to take depends upon who the audience is and the aim of the visualisation.

Box and Whisker charts are suitable for comparing distributions, showing outliers and drilling down into more detail.  In the practical exercise the chart allowed us to view seasonal patterns of ozone levels, compare ranges and identify the months with the highest medians.

This chart is less suitable if we want to compare individual data points as they overlap.  However this is less important as the boxes summarise the data distribution and allow general comparisons to be made.

The chart is less accessible than the bar chart view when looking at hourly emissions by day.  However the aggregated view misses the detail of insights which the colour coded dots give us.  The key learning point is that we often rely upon averages which hide underlying patterns.

Additionally with training supported by guidance notes the user can soon learn how best to use this chart type.  Hopefully, in future it then becomes easier to use.  However, this can present design challenges in terms of additional contextual information.

In terms of whether the Box and Whisker chart is attractive or not then I will leave that up to you to decide. I personally think they have their own aesthetic qualities.

Hip Hop Gardens – A Viz for Social Good Project

Introduction

I recently took part in a really worthwhile data visualisation project called ‘May Project Gardens’, featured under the #VizForSocialGood Programme run by Chloe Tseng from the Tableau Community.  I was excited to take part in this project for several reasons as it combined several of my favourite things with data visualisation.  These included:

  • Social regeneration; I have worked in the field of social research to support community regeneration for over 18 years and recently graduated with a post graduate diploma in regeneration practice from the University of Chester.  I am passionate about projects which can bring communities together to enhance social capital.
  • Community gardening; I am a joint plot holder at a local allotment. My wife Ann is the project manager but I help out digging, weeding, planting and eating the fresh produce we grow!
  • Hip hop; I used to DJ with vinyl, playing soul, funk, hip hop and reggae around my adopted city of Chester in the Northwest of England for 10 years.  So the topic piqued my interest as an innovative way to engage young people with gardening.

The Project Brief:

“May Project Gardens is an award-winning social enterprise – a highly skilled and passionate team working to empower and educate urban communities to live sustainably. 

Hip Hop Garden: an innovative, alternative education model using hip hop to educate and empower marginalised young people to live healthily, learn entrepreneurial skills, and grow their communities.

We now need to raise further money to run this award-winning course in the ninth most deprived ward in the UK – allowing these young people to take their skills to a new level, improve community cohesion and build their self-esteem. To do this, we need to present our data in a visually appealing way to donors and funders.

Communication goals:

  • To show the success of our Hip Hop Garden workshops and pilot programme in terms of youth engagement, learning and personal development”. 

My design process:

The first stage of any data visualisation is always to understand who the audience is and how they are likely to interact with the final product.

I spent a lot of time researching May Project Gardens to understand their vision and values.  In the Hip Hop Garden section of their website there is a really interesting video outlining what the project is about, its aims and objectives as well as some of the challenges they face in terms of funding.

Context:

The research was useful as it helped me understand how the data visualisation could meet the project brief of showing the success of the project workshops in terms of engaging young people.  There was a need to show the community issues which the project was attempting to address, being located in one of the most deprived areas in the country.  The project also needed to demonstrate the outputs generated in terms of the numbers of young people engaged and their satisfaction levels.  Most importantly I was keen to evidence the personal and social learning outcomes achieved by the students through participation.

I also learnt about some of the logistical challenges required to run a community garden from speaking to Ann who is currently studying them as part of a horticultural course.  I also realised that due to funding cuts to youth provision across London that a strong call to action to attract funding would be helpful at the end of the visualisation.

Screen Shot 2017-04-27 at 18.32.50

Data preparation:

There was a series of 10 workshops which had been run at Hip Hop Gardens in 2015 and 2016.  Participants had been given evaluation questionnaires and asked to state whether they enjoyed the events, what they learnt and how it had helped with their personal and social transformation.  The feedback was quite often verbatim, which posed challenges in terms of consistent analysis.

I spent quite a bit of time cleaning the data in Excel, to ensure consistency across each event before inputting into Tableau, my data visualisation package of choice.  Once in Tableau, there was some further work aggregating some of the verbatim feedback to make it consistent in order to measure generic outcomes such as ‘enjoyment’.  This process, whilst time consuming, did help me understand the nature of the data variables I was going to be visualising.
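Coding verbatim feedback into generic outcomes can be thought of as matching each comment against a list of keywords.  I did this by hand in Excel and Tableau, so the Python below is only a sketch of the idea: the keyword lists, the comments and the names `OUTCOME_KEYWORDS`, `code_feedback` and `tally` are all hypothetical.

```python
from collections import Counter

# Hypothetical keyword map: outcome -> words that signal it in a comment.
OUTCOME_KEYWORDS = {
    "enjoyment": ("enjoy", "fun", "loved"),
    "learning": ("learn", "taught"),
}


def code_feedback(comment):
    """Return the outcomes a free-text comment matches, or ['uncoded']."""
    text = comment.lower()
    matched = [
        outcome
        for outcome, words in OUTCOME_KEYWORDS.items()
        if any(word in text for word in words)
    ]
    # Keep unmatched comments visible so they can be reviewed by hand.
    return matched or ["uncoded"]


def tally(comments):
    """Count how many comments fall under each outcome."""
    counts = Counter()
    for comment in comments:
        counts.update(code_feedback(comment))
    return counts


# Made-up comments, not real participant feedback.
comments = [
    "I really enjoyed the rap session",
    "Learnt how to plant seeds and it was FUN",
    "ok",
]
counts = tally(comments)
print(counts)
```

Simple keyword matching like this will miss nuance, so anything left as ‘uncoded’ would still need reading and coding manually.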

Design Choices:

I decided that a long form design would be useful to allow for an exhibition of key performance metrics combined with explanatory text to tell the story of the project.  I was keen to use pictures and quotations to portray a more emotive tone as this is a project about improving people’s lives rather than just a set of numbers. My aim was to attract a potential donor to find out more about the project and hopefully be interested enough to contribute towards funding it.

I did toy with the idea of using an image of a garden design and labelling different plant beds with data from the project.  This led me to imagine creating my own bespoke garden design from the different elements of the data visualisation.

I was keen to plan out a structure which would follow the flow of a project evaluation.  I started my career evaluating community regeneration projects and knew that this followed a ‘logic model’ approach of inputs leading to outputs and most importantly outcomes.  I wanted to follow a story board format with a beginning, middle and an end.

The beginning: Inputs

A bold title with the May Project Gardens logo, a brief definition of the project and a wide angle picture of people working in the garden to set the scene.  A brief introduction to say what the problems are in the local community, which the project is attempting to solve;

  • A recognised disconnect between local young people and the food they eat.
  • A reliance on local food banks as a sign of food poverty.
  • Addressing wider issues of poverty, disempowerment and access to resources.

To set the context I outlined the project inputs across three boxes;

  • What we do; hip hop to empower young people to take control of their health and empower their communities.
  • The aims of the project; engagement, education, empowerment.
  • How; the project delivers educational programmes.

The middle: Outputs

This section allowed me to be creative and have some fun!  I used different chart types in Tableau to display the project performance metrics whilst simultaneously re-creating a hip hop garden of my own.

There is a bar chart which represents the number of participants at each event.  I used a colour legend to show the number of people who enjoyed the events, compared to those who either did not reply or were not sure.  My consistent colour scheme was that green equated to positive feelings about the project.  Each bar represented a line of vegetables. A brown background was used to represent a soil bed. Grey was used to show some of the community problems the project is trying to tackle as well as the call to funding.

There was a challenge in that one of the events in July 2015 used a different evaluation questionnaire than the other events.  I showed the number of participants who recommended this event as a ‘tree-map’ which represented a lawn.  The ‘Marcus Lipton’ pilot project had a unique rating score, which I decided to display as a big number, which could be a bush or a potted plant.

When I showed this to Ann, her first reaction was; it’s good but where does it show what gardening activity takes place?  This was a fair point.  I had already seen some other visualisations from people like Michael Mixon; whose excellent submission for #VizForSocialGood used a dashboard action to show some of the feedback from each event.  I emulated this approach so that if you hover over one of the event bars the definition of the activity is displayed in a box underneath.  For example;

“Challenge 06_07_15: 15-17 yr olds; half day workshop; tour of the garden, garden activities, creating a rap/rhyme.”

A last minute design decision was to include a simple ‘word cloud’ to show the different opinions of participants of the project. Whilst this did add some additional detail to the visualisation, I thought it was worth including as it showed the fun that young people have had participating in the project.

The end: Outcomes

I followed the same three box approach which runs throughout the design to ensure a vertical symmetry.  I used another bar chart to show each individual displayed as a cabbage to show the connection between people and the food they produce.  Both Michael Mixon and Pooja Gandhi advised me over Twitter how to import and edit images into Tableau.  For this visual I used PowerPoint to remove the white background around the cabbage.  This time a dashboard hover action displays the personal and social transformation outcomes.  I also used images taken from the project which display food, hip hop and community, made them transparent in PowerPoint and floated them in the background of the text.

Finally…

I was keen to include a call to action at the end so that people could donate to the project.  I imported a hover icon image as a png file and edited the url to point to the Hip Hop Gardens project page.

I also spent a long time double checking that all the interactive elements worked correctly before I published it to Tableau Public.

What worked well?

I really enjoyed preparing and analysing some raw data and building a visualisation from scratch.  It educated me about a community garden project which was fulfilling.  I learnt about how to construct more detailed long form visualisations.  I also learnt about how to float transparent images in the background, which was a new technique for me.  The time I spent researching and story boarding at the beginning really helped me to structure my design to tell a story and answer the project brief.

What worked less well?

As with any survey data, the inconsistency in the event questionnaires posed some challenges.   In particular the data preparation took a while as it involved coding qualitative feedback.  Some of the vertical line dividers used to separate the sections of the visualisation made formatting the design layout tricky until I discovered that temporarily turning off ‘fixed height’ made it a lot easier to adjust the height ratios.  Additionally, I had undertaken the project over an intensive 3 day period, which helped with my creative flow but left me exhausted by the end.

To conclude:

This was an excellent project to take part in as it combined several of my favourite disciplines, social research and community regeneration, with my personal interests in hip hop music and gardening.  I enjoyed being creative and thinking like a designer to tell a better data story.

However, this was an intensive process with some technical challenges.  It has made me consider that having access to a specialist data preparation tool would be useful.  In future, I would also spread the work for a project of this scale over a longer time period.

Overall I am really pleased that my design was chosen to be one of nine international data visualisations for the May Project Gardens #VizForSocialGood gallery.   I hope the visualisation helps the project to raise awareness for their excellent work as well as attract some much needed funding.

Hip Hop Gardens.png

Did the Diesel Testing Scandal Affect Volkswagen Car Sales in the Netherlands?

Each week I take part in a data visualisation challenge called ‘Makeovermonday’. The idea is to take a data visualisation that has already been published and make it over using good practice techniques. I use an industry leading data visualisation tool called ‘Tableau’.

The original data visualisation is a simple table showing Dutch export car sales in 2015:

Screen Shot 2017-05-10 at 08.10.54

What do I like about it?

  • It shows the top 10 car brands, model and volume in a nice sorted table.
  • It is simple and easy to understand.

What don’t I like about it?

  • The title could be more insightful.
  • Some car brands are missing.
  • There is a wealth of more information available in the dataset; why just exports and 2015 only?
  • The lack of context; how are export sales changing over time by brand?

My approach to the makeover:

The first thing I always do is to imagine who the audience for my data visualisation could be.  In this case I imagined a car salesperson, who would be interested in sales trends by brand.  I started asking some key questions to help formulate the angle and framing of my intended design:

  • Which are the top 5 car brands in terms of sales?
  • How have brand preferences changed over time?
  • In which months are car sales more or less popular by brand?

I needed a specific focus to look at in detail.  I did some research and came across an NL Times article about Volkswagen (VW) apologising to its customers after the global ‘Diesel Testing Scandal’ in October 2015.  This was interesting to me as I wanted to explore whether this had impacted upon VW car sales in the Netherlands.  I am also a big VW fan as my first ever car was a classic Mk 2 Golf!

My step by step design process:

Introducing the context;

  • I wanted to answer the questions I had set myself in a brief but insightful title.
  • I used the same colours and font (‘futura’) found on my local VW dealer website and a publicly available version of the famous VW logo.
  • I added some context to give my reader some background information as well as pose a question, which I would aim to answer.

Screen Shot 2017-05-16 at 14.30.57

The analysis;

  • I then added some brief analysis, based upon the charts that follow, to set some more context and answer the research question posed in the introduction.
  • Simple bar charts were used to show that VW is the top brand for the last 5 years.  I deliberately colour coded VW in its corporate blue and the other brands in shades of grey so they would fade in to the background.
  • Monthly sales are presented as spark lines matching the same colour legend as the bar charts.  The VW sales line is deliberately thicker, so that it visually stands out compared to the other brands.  The maximum and minimum sales months are labelled using a ‘Max/Min Window’ calculation.  The axes are independently scaled and do not start at zero, which exaggerates the differences between brands.
  • I did experiment with using a ‘Level of Detail’ calculation to differentiate the colours of the minimum and maximum sales values.  This is something I wish to revisit in a future makeover as it is a valuable but more complex process.
  • A text box is used as a subtle dividing line to break the visualisation into sections.
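The maximum and minimum labelling described in the list above can be sketched outside Tableau as follows.  This is only an illustration of the logic: I assume the ‘Max/Min Window’ calculation compares each month against the highest and lowest value in the series (Tableau offers the WINDOW_MAX and WINDOW_MIN functions for this), and the sales figures and the `label_extremes` helper are made up for the example.

```python
# Illustrative monthly sales for one brand (made-up numbers, not the Dutch dataset).
monthly_sales = {
    "2015-01": 120, "2015-02": 95, "2015-03": 140,
    "2015-04": 110, "2015-05": 160, "2015-06": 130,
}


def label_extremes(series):
    """Return {month: label} for only the highest and lowest months."""
    highest = max(series.values())
    lowest = min(series.values())
    labels = {}
    for month, value in series.items():
        if value == highest:
            labels[month] = f"Max: {value}"
        elif value == lowest:
            labels[month] = f"Min: {value}"
    return labels


extreme_labels = label_extremes(monthly_sales)
print(extreme_labels)
```

Every other month gets no label at all, which is what keeps the spark lines uncluttered.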

Screen Shot 2017-05-16 at 13.31.38

  • Colour coded trend lines show how the top 5 brands perform over the last 5 years.
  • A reference line is set for 2015, Q4 to indicate the date of the VW apology for the diesel testing scandal.
  • A ‘table calculation’ is used to show the percentage change in sales since the apology was published.
  • An area annotation indicates that sales for VW have increased, but not to the same extent as most of the other brands.
  • Data source, image copyright and design tags are all listed as per standard.
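The percentage change since the reference quarter, mentioned in the list above, boils down to simple arithmetic.  The sketch below shows one way to express it; the quarterly figures are invented and `percent_change_since` is a hypothetical helper, not the table calculation used in the workbook.

```python
# Illustrative quarterly sales for one brand (made-up numbers).
quarterly_sales = [
    ("2015 Q3", 1000),
    ("2015 Q4", 900),   # reference quarter (date of the apology)
    ("2016 Q1", 945),
    ("2016 Q2", 990),
]


def percent_change_since(series, reference_period):
    """Percentage change of the reference period and each later one, relative to the reference."""
    baseline = dict(series)[reference_period]
    start = [period for period, _ in series].index(reference_period)
    return {
        period: round((value - baseline) / baseline * 100, 1)
        for period, value in series[start:]
    }


changes = percent_change_since(quarterly_sales, "2015 Q4")
print(changes)
```

Quarters before the reference line are left out, so the label only ever describes what happened after the apology.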

Screen Shot 2017-05-16 at 14.46.24

Publishing the visualisation;

  • Because there are over 9 million sales records in the dataset, I needed to aggregate it to only those variables I had used in order to publish it efficiently.  To do this I followed this simple tutorial video courtesy of Andy Kriebel’s excellent ‘#TableauTipTuesday’ blog series.
  • Finally I published the data visualisation to Tableau Public and Twitter.  This is what the final visualisation looks like:

Dutch Car Sales v2

To conclude:

The analysis shows that VW are still the most popular car brand in the Netherlands, despite the diesel testing scandal.    However, whilst sales have increased since the public apology was published, they have not risen as much as most of the other competitor brands.

Lessons learnt visualising BIG DATA

This week in #Makeovermonday we got to play with some really ‘Big Data’.  The task was to makeover any one of a choice of data visualisations from a Government Briefing Paper on UK GP prescription data, which was made available on a fast analytic database courtesy of EXASOL.  There were a huge 724M records to analyse, including a bewildering array of drugs listed under their chemical substance names and codes.  The first challenge was to pick a topic to explore.  This was the easy part for me as I soon came across a brief summary about:

Antibiotic prescribing

“The challenge of antimicrobial resistance means that the NHS has been aiming to reduce prescribing in antibiotics. In 2015, around 2 million fewer antibacterial drugs were prescribed than in 2014– a reduction of 5%. However, the scale of this reduction varied across the country”.

The Original Data Visualisation:

Screen Shot 2017-04-21 at 15.45.56.png

What did I like?

  • Clear title telling me exactly what the charts are about.
  • Bar charts are easy to understand.
  • Clear labelling showing current item prescription rates / 1000 population and change since 2014.

What didn’t I like?

  • The charts are very descriptive and require reading the briefing report to understand the trends; for example, the changes do not reflect inherent excess prescribing.
  • Lack of context in the charts about why increased anti-biotic prescription rates are a bad thing i.e. the aim of the NHS is to reduce rates to avoid bacterial resistance.
  • There is more information available about the volumes of drugs and costs to the NHS  in the prescription dataset, which could be included for a more interesting story.
  • Increases and decreases in the prescription rates both go right to left and use the same colour scheme, so it is visually difficult to differentiate between them.

How did I approach my makeover?

I spent some time reading through the guidance which Eva Murray had helpfully posted about how to connect to the EXASOL database, as well as how to approach such a Big Data set without getting lost.  I spent even more time reading through the background report itself and found this useful strategy document which identified some of the commonly prescribed antibiotic drugs as well as an interesting side story about resistance rates.

This helped me to identify three potential anti-biotic drugs which are prescribed to treat a range of bacterial infections.  These included:

  • Amoxicillin; a common type of penicillin prescribed for a wide range of infections.
  • Ciprofloxacin; a specialist drug prescribed for severe infections like Anthrax with some serious or disabling side effects.
  • Gentamicin; used to treat severe or serious bacterial infections.  It can harm your kidneys, and may also cause nerve damage or hearing loss.

What was my design thinking?

I followed my usual design process using some mind mapping software to answer the following questions.  This process doesn’t take too long and really helps me break down a big dataset into its component parts as well as help me clarify my approach.

Who was my audience?

  • I aimed my visualisation at a health commissioner, who would be interested in volumes and costs of anti-biotic prescriptions.

What was my angle, frame and focus?

  • My angle describes the question I wish to answer and the frame is the part of the data I wish to visualise i.e. what are the volumes of items prescribed, total actual cost (including discounts and additional costs) and costs per item of different types of anti-bacterial drug prescription items over time?
  • I chose to focus upon the difference between generic antibiotics like Amoxicillin, which is prescribed in large volumes at a low cost per item compared to more specialised drugs like Ciprofloxacin and Gentamicin prescribed in lower quantities at a higher cost per item.
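The contrast between high volume, low cost drugs and low volume, high cost drugs comes from one derived measure: cost per item, which I take to be total actual cost divided by items prescribed.  A minimal sketch, using invented totals rather than the real prescription data:

```python
# Illustrative totals per drug (made-up numbers, not the NHS prescription data).
prescriptions = {
    "Amoxicillin":   {"items": 10_000, "actual_cost": 12_000.0},
    "Ciprofloxacin": {"items": 500,    "actual_cost": 1_500.0},
    "Gentamicin":    {"items": 100,    "actual_cost": 900.0},
}


def cost_per_item(totals):
    """Cost per item = total actual cost / number of items prescribed."""
    return {
        drug: round(t["actual_cost"] / t["items"], 2)
        for drug, t in totals.items()
    }


unit_costs = cost_per_item(prescriptions)
print(unit_costs)
```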

How best to represent the data?

  • I chose colour coded bar charts to show the volumes of items prescribed, total actual cost of prescription as well as cost per item; these provided some overall context and showed the differences between the three anti-biotics as well as acting as a legend.
  • Colour coded line charts showed trends over each quarter.   I also added a Table Calculation to show the percentage change since the baseline date shown as a line label.
  • I stripped out unnecessary clutter in terms of grid lines and added some commentary to show the trends, which I shaded as call out boxes so they would stand out.
  • I added some interactivity via an information tool with definitions of the drugs to save space and a link to the strategy document for further information on the difference in resistance rates.

Publishing Big Data Visualisations

  • I also followed Andy Kriebel’s useful video on how to create an aggregated extract, which reduced those 724M records down to a more manageable 43,000 (0.01%) that I could save locally as well as to my Tableau Public Profile.
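The idea behind an aggregated extract is to roll the row-level records up to only the dimensions the charts actually use.  The sketch below shows that roll-up in plain Python; the records, field names and `aggregate` helper are all assumptions for illustration and are not how Tableau builds an extract internally.

```python
from collections import defaultdict

# Illustrative row-level records (made-up): one row per practice, quarter and drug.
records = [
    {"practice": "A", "quarter": "2016 Q1", "drug": "Amoxicillin", "items": 10, "cost": 12.0},
    {"practice": "B", "quarter": "2016 Q1", "drug": "Amoxicillin", "items": 20, "cost": 25.0},
    {"practice": "A", "quarter": "2016 Q1", "drug": "Gentamicin", "items": 1, "cost": 9.0},
    {"practice": "B", "quarter": "2016 Q1", "drug": "Gentamicin", "items": 2, "cost": 18.5},
]


def aggregate(rows, dimensions, measures):
    """Sum the measures for each unique combination of the chosen dimensions."""
    totals = defaultdict(lambda: dict.fromkeys(measures, 0))
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        for m in measures:
            totals[key][m] += row[m]
    return dict(totals)


# 'practice' is not used in any chart, so it is dropped and its rows collapse together.
summary = aggregate(records, ["quarter", "drug"], ["items", "cost"])
print(len(records), "rows ->", len(summary), "rows")
```

The fewer dimensions you keep, the fewer rows survive, which is why only the fields used in the visualisation should stay in the extract.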

This is what my first visualisation looked like:

Cost of Antibiotics v1

Further iterations were to follow!

Screen Shot 2017-04-21 at 17.13.40

Thanks to Tamara Gross for spotting this and letting me know.  Of course the quarterly data for Q1 2017 was incomplete and, as it happened, so was the 2010 data, which explained the sudden drop off.  Filtering out the incomplete quarters led to a different story in terms of items prescribed and cost per item over time.  A key lesson is to never assume the data you are analysing is a nice complete set, especially if it is dynamically sourced from a database.
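A completeness check like this is easy to automate before any charting starts.  The sketch below assumes monthly data rolled up into quarters and keeps only quarters with all three months present; the month lists and the `complete_quarters` helper are hypothetical.

```python
# Illustrative months present in the data for each quarter (made-up).
months_by_quarter = {
    "2016 Q3": {"2016-07", "2016-08", "2016-09"},
    "2016 Q4": {"2016-10", "2016-11", "2016-12"},
    "2017 Q1": {"2017-01"},  # data pulled part-way through the quarter
}


def complete_quarters(quarters, months_expected=3):
    """Return the quarters that contain every expected month, in sorted order."""
    return sorted(
        quarter
        for quarter, months in quarters.items()
        if len(months) >= months_expected
    )


kept = complete_quarters(months_by_quarter)
print(kept)
```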

This is what my second visualisation looked like:

Cost of Antibiotics v2

Getting there but not quite….

Screen Shot 2017-04-21 at 17.41.12

I applied the ‘where are my eyes drawn to’ test which Cole Knaflic advocates and I agreed with Adam Crahen that the call out boxes were competing with the data.  Thanks Adam.

So one final iteration followed:

Cost of Antibiotics variant 2

What was good?

  • Overall I spent a lot longer this week researching the subject as well as the technical aspects of dealing with such a large dataset.  I think this really helped me get to know the subject matter and develop a good story.
  • I enjoyed exploring a really big dataset and was pleased with how easy the technical side of things worked.
  • The feedback from the Tableau community was, as ever, useful in helping me develop a better visual design approach.

What was not so good?

  • The issue with the incomplete dataset could have been avoided with some additional checking.
  • I think upon reflection I could have added some more context to the introduction to explain the dataset and the importance of reducing anti-microbial prescriptions.  Some definitions of total actual cost, items prescribed and cost per item would have been helpful to the reader as well.
  • I spent far longer than normal on this particular visualisation; whilst I think this was time well spent there is always a trade off to be had of quality versus time versus cost.

To conclude:

  • It is useful to spend time researching the background data (and fun too) in order to develop an interesting story, so long as I am mindful of the time I am spending on the project.
  • Double check for completeness of datasets (e.g. dates) as this can have a big impact upon the insights you draw.  A simple checklist can help facilitate this process.
  • Double check the visual focus by employing the where are my eyes drawn test.
  • Continue to iterate based upon community feedback and my own reflections.
  • Overall another enjoyable makeover where I learnt a lot.

Using colour to tell a better story with data

Each week I take part in a data visualisation challenge called ‘Makeovermonday’. The idea is to take a data visualisation that has already been published and make it over using good practice techniques. I use an industry leading data visualisation tool called ‘Tableau’.

The topic of Week 12 was something I had never heard of before called ‘March Madness’.  This turned out to be a single elimination tournament played each Spring in the USA, whereby 68 college basketball teams from Division 1 of the National Collegiate Athletic Association battle it out to reach the National Championship also known as the ‘Final Four’.

As usual I downloaded the dataset and drafted my visualisation plan; which included what I did and didn’t like about the original visualisation and how I could improve upon it.  This is the original visualisation:

01

What did I like?

  • It told me that there are seeded teams in the competition
  • A range of seeded teams have made the Final Four over time since 1985

What didn’t I like?

  • It doesn’t make sense to sum the number of seeds above the bar charts
  • Stacked bar charts are not the best way to show change over time as it is difficult to directly compare the different categories apart from those directly next to the axis
  • There is too much colour on the bars, which makes it difficult to interpret

How did I approach my makeover?

The dataset was quite complex with numerous dimensions including; winning and losing seeds, region, rounds etc.  There were only two measures though; winning and losing scores.  My initial thought was ‘how can I tell a story’ from the data?

  • After much exploring of the dataset in Tableau, I settled upon looking into the ‘average winning score margins’ (average of the winning score less the losing score).  I was interested in whether some teams won by greater or lesser margins and whether this had changed over time
  • In terms of chart types, I kept it simple with a trend chart to show average winning score over time and my good friend the bar chart to show how it breaks down by winning seed and winning teams
  • I also got to try out a ‘nested sort’ on the bar chart.  For this I followed notes from Tableau and also Tableau Tips Tuesday from Andy Kriebel
  • My published first submission looked like this:

March Madness Average Winning Score Margins

However I quickly realised that whilst it does show which seed or which teams had won by the largest average margin nothing really stood out because there was so much orange! There was no contrast of colour.

Using colour to draw your audience’s attention

I have been reading the fantastic book ‘Story Telling with Data’ by Cole Nussbaumer Knaflic who states that “When used sparingly, colour is one of the most powerful tools you have for drawing your audience’s attention” (Nussbaumer Knaflic, 2015, p117).

This is because intensity of colour, along with position and form, are what are known as ‘pre-attentive attributes’, which the human memory has evolved to process very quickly in order to notice differences.  Pre-attentive attributes can, as Nussbaumer Knaflic (2015, p104) states, be used to “enable our audience to see what we want them to see before they even know they’re seeing it!”  The key learning point there about colour is ‘when used sparingly’.

In his iconic book ‘Show Me the Numbers’, Stephen Few (2012, p79) discusses the use of contrast to draw attention to those elements we wish to stand out.  However, Few argues that as the number of contrasting things increases, the degree to which differences stand out decreases.  The message becomes buried in visual clutter as “when everyone in the room shouts, no one is heard” (Few S. 2012, p79).  This means the audience has to use more cognitive brain power to interpret the findings.

  • For example, I had originally tested using colour to reinforce the difference in average winning score by team.  To do this I used a divergent colour scale which looked like this:

Winning Score by Seed

  • However, I quickly changed my mind as I reasoned that whilst it was clear which teams had the highest average winning score, it was confusing as there were a lot of contrasting colours on show, making it harder to interpret the story I was trying to convey – there was too much contrast
  • Eva Murray’s excellent weekly round up of Makeovermonday submissions picked up on this point, in which she said: “do I need to put the same dimension or measure on size, colour and shape at the same time? Using multiple ways of conveying the same information can be confusing.”

Establishing my editorial thinking

In his very useful 2016 book, ‘Data Visualisation’, Andy Kirk addresses the need for an ‘angle’, a ‘frame’ and a ‘focus’ as part of editorial thinking.  The angle is the interesting insights you want to communicate to your audience, the frame includes the specific details you wish to include or exclude, whereas the focus is the features you wish to draw attention to.  I had my angle (average winning score margins for the Final Four), I had my frame (average winning score margins over time, seeds and teams) but I had failed to really focus on a particular story point.

So I iterated again with colour, but this time thought about which element of average winning score margins I wanted to focus upon.  This presented a range of possible stories; for example, did I want to highlight those teams with an average winning score margin of 9 or above (the same as the number 1 seeds) or pick out one stand out team such as Nevada-Las Vegas?  In the end, the focus I decided upon was the difference between the number 1 seeds and all the other seeds.  I wanted to show that despite the number 1 seeds having the largest average winning score margin, there was actually a wide range of average winning score margins within each seed.
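To make that focus concrete, here is a rough sketch of the underlying calculation: the winning margin per game, then the average and the range of margins for each winning seed, with a flag for the number 1 seeds that would drive the highlight colour.  The games are invented and `margins_by_seed` is a hypothetical helper, so treat it as an illustration of the logic rather than the Tableau calculation itself.

```python
from collections import defaultdict
from statistics import mean

# Illustrative games (made-up scores, not the tournament dataset).
games = [
    {"winning_seed": 1, "winning_score": 80, "losing_score": 70},
    {"winning_seed": 1, "winning_score": 75, "losing_score": 55},
    {"winning_seed": 2, "winning_score": 68, "losing_score": 66},
    {"winning_seed": 2, "winning_score": 90, "losing_score": 78},
]


def margins_by_seed(rows):
    """Average, lowest and highest winning margin for each winning seed."""
    margins = defaultdict(list)
    for game in rows:
        margins[game["winning_seed"]].append(
            game["winning_score"] - game["losing_score"]
        )
    return {
        seed: {
            "average": mean(values),
            "lowest": min(values),
            "highest": max(values),
            "highlight": seed == 1,  # only number 1 seeds get the accent colour
        }
        for seed, values in margins.items()
    }


summary_by_seed = margins_by_seed(games)
print(summary_by_seed)
```

Keeping the lowest and highest margins alongside the average is what lets the chart show the spread within each seed, not just the headline figure.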

What did I change?

  • I created a set for Number 1 seeds and used that to differentiate the colour for the line and bar charts
  • I tweaked the line chart using a duplicate version which I added onto the chart as a dual axis and converted to a circle – this allowed me to colour the number 1 seeds.  I liked this because the orange circles represent the basketball image
  • I re-worded the title to reflect my new focus
  • I also changed the time series axis to YY format and replaced the direct labelling of the bars with an axis as I had some useful feedback from @AlleMeineDaten that this would be a clearer visualisation of the numbers representing seeds rather than ranks
  • My final Makeover is available on Tableau Public and hopefully uses just enough colour to tell a more effective story than either the original visualisation or my first makeover

March Madness Colour

So what did I learn?

  • To tell a story you need an angle, a frame and a key focus
  • Colour is a powerful visual aid to help focus attention
  • However it needs to be used sparingly to tell a specific story point
  • There is a lot of theory about good practice to draw upon; by combining it with practice you can generate your own new knowledge
  • Asking for feedback from the Makeovermonday community is a great way to improve your visualisation as they will spot things you may miss
  • Don’t rush into publishing a visualisation until you are happy with it and if not then go back and correct it and re-publish to tell a better story