Designing effective visualisations

Overview

Teaching: 60 min
Exercises: 60 min
Questions
  • What are some common errors in visualisation?

  • How can these be overcome

Objectives
  • To be able to restrict visual elements to only those that are necessary

  • To avoid common visualisation errors

By breaking down visualisations with a consistent set of terms, you will hopefully begin to see the flexibility and freedom of this system. This is the core concept you will need tomorrow as we learn how to create these graphics. If you can describe a plot using the vocabulary and grammar you are learning, you can create it. Of course, if it is possible to create any plot we can imagine it is left up to us to enssure we are creating a good one.

There are plenty of guidelines available that suggest ways to create effective visualisations. You will likely already have some personal ideas about what makes a good plot.

Discussion

What features do you think make a visualisation effective?

Compare answers among several people. Are there any consistent responses or is there a large component of personal opinion?

While by no means a comprehensive list, there are a number of things you might want to think about when constructing a visualisation of your own.

Data mapping

Avoid mapping a data variable to multiple aesthetics

Think back to the exercise on the 3D bar chart:

The same variable was mapped to both the z axis and the bar colour, creating a redundancy. This is not necessarily a problem, but can usually be avoided without losing any clarity in the plot. Cutting out the redundancy also frees up a visual property to encode new information if you are looking to show the relationships between more data.

Avoid mapping multiple data variables to an aesthetic

Consider the following figure from Spurious Correlations

Challenge

Draw out the links between data variable and aesthetic mappings in this figure. What is strange about this?

Solution

There are two data variables that have both been mapped to the y axis.

By mapping two variables to the same aesthetic we lose clarity in the plot. It takes time and mental effort to separate the same visual property back to two potential data sources.

In particular, if your two data sources are not on the same scale it can be very easy to influence the interpretation of the data by changing their scales separately.

Consider your geometries carefully

One reason why there are so many types of plots is because different geometries can be interpreted very differently by the human brain, even though they display the same data. Perhaps the best example of this is bar charts, and the discussions around whether they should always start at zero.

When looking at a bar chart, the length of the bar is what determines how people interpret the data. This differs from an equivalent chart that used points in which the position of the point is what matters. Changing the baseline of a bar chart can drastically alter how the chart gets interpreted:

plot of chunk bar_zero

This can be seen even more readily when the shape chosen is more easily mapped to a real, physical object.

Discussion

Are there any other features of this figure that make it an ineffective visualisation?

Colour

Friends don’t let friends use rainbow colour scales

Rainbow colour scales are very, very common in scientific literature. They are a very natural way to represent quantitative data - the wavelength of the colour changes in a linear relationship with the data.

The problem is that our eyes don’t perceive those changes in colour in a uniform way.

rainbow-colourbar

This can have serious real world consequences for scientific data visualisation.

rainbow-problems

The plot on the left was published in a paper in 2006 using a rainbow colour scale, and reproduced in a sequential colour scale on the right (from https://www.atmos-chem-phys.net/6/5183/2006/.

In the versions below, the blue line represent a false boundary that was identified in the data, entirely due to the way our eyes perceive colour - it isn’t actually present in the data!

rainbow-problems2

It’s not just continuous rainbow colour scales that have this problem. Once you start looking for them, you will see them everywhere.

One of the worst offenders.

Other colour considerations

Aside from the evils of rainbow colour scales, it is also important to consider issues such as:

An excellent resource for many considerations to do with colour in visualisation is ColorBrewer, which provides a range of scales for different applications.

Challenge

Explore ColorBrewer to understand how it can be used to choose colour palettes. If you had four categories in your data to be represented by colour, how many palettes can you find that are colourblind friendly for each of the sequential, diverging, and qualitative palettes?

Solution

Sequential: 18

Diverging: 6

Qualitative: 1

Ink to information ratio

Compare the amount of ink used with the amount of information communicated. A high ink:information can indicate a plot that hasn’t been thought through, and is often distracting or confusing. This is a good prism to evaluate your plots and assess the inclusion of each component. Everything in a plot should be there for a reason.

These plots often contain ‘chart junk’ (Edward Tufte).

Challenge

Compare the above figure.

  • Which elements are removed in the second version?
  • Which elements could still be removed? What is their purpose?

Show as much raw data as possible

By putting your raw data into a figure, it becomes easier to see the true patterns in the data. Remember with the datasaurus dozen dataset where summary statistics obscured the true meaning in the data.

plot of chunk show_raw

Discussion

How do your interpretations of the data change in the above plots as you move closer towards the raw data?

Overplotting

As a result of the suggestion to show your raw data, you will quickly run into the problem of overplotting when there are too many data points…

plot of chunk unnamed-chunk-1

Discussion

What are some possible solutions to overplotting?

Transparency

plot of chunk unnamed-chunk-2

Jittering

plot of chunk unnamed-chunk-3

Binning

By binning and counting the data points you start to step away from displaying the raw data. But when you have large datasets to show, it may be the best solution. plot of chunk unnamed-chunk-4

Remember that the goal of a visualisation is to explain some feature of a dataset. We want to make sure that our figures are clean, clear and easily ingerpretable. If unsure, ask someone with a fresh pair of eyes to look over your figure and see if they can explain what is going on.

Challenge

Go back to your own figure that you described in a previous challenge. Critically examine it for areas that could be improved.

Ask a partner to look over your figure and tell you what the main message is.

Key Points

  • Don’t use rainbow colour scales!

  • Colour scales must not confuse the data or add artefacts

  • Visual components that do not aid understanding should be removed

  • Overplotting reduces the clarity of communication through visualisation