Introduction to Visualisation

Overview

Teaching: 120 min
Exercises: 120 min
Questions
  • What are the basic building blocks of visualisation

Objectives
  • To be able to identify the links between a data visualisation and the underlying data structure

  • To recognise the common visual features of a visualisation

Why visualisation?

We have spent a lot of time learning about manipulating and exploring data. What sort of information can we learn like that?

Let’s meet some datasets:

# A tibble: 1,846 x 3
   dataset     x     y
     <dbl> <dbl> <dbl>
 1       4  55.4  97.2
 2       4  51.5  96.0
 3       4  46.2  94.5
 4       4  42.8  91.4
 5       4  40.8  88.3
 6       4  38.7  84.9
 7       4  35.6  79.9
 8       4  33.1  77.6
 9       4  29.0  74.5
10       4  26.2  71.4
# … with 1,836 more rows

Get an idea of their overall structure:

summary(data)
    dataset         x               y           
 Min.   : 1   Min.   :15.56   Min.   : 0.01512  
 1st Qu.: 4   1st Qu.:41.07   1st Qu.:22.56107  
 Median : 7   Median :52.59   Median :47.59445  
 Mean   : 7   Mean   :54.27   Mean   :47.83510  
 3rd Qu.:10   3rd Qu.:67.28   3rd Qu.:71.81078  
 Max.   :13   Max.   :98.29   Max.   :99.69468  

And some summary statistics for each dataset:

data %>% 
  group_by(dataset) %>% 
  summarise(
    mean_x = mean(x),
    mean_y = mean(y),
    sd_x = sd(x),
    sd_y = sd(y),
    correlation = cor(x,y)
  )
# A tibble: 13 x 6
   dataset mean_x mean_y  sd_x  sd_y correlation
     <dbl>  <dbl>  <dbl> <dbl> <dbl>       <dbl>
 1       1   54.3   47.8  16.8  26.9     -0.0641
 2       2   54.3   47.8  16.8  26.9     -0.0686
 3       3   54.3   47.8  16.8  26.9     -0.0683
 4       4   54.3   47.8  16.8  26.9     -0.0645
 5       5   54.3   47.8  16.8  26.9     -0.0603
 6       6   54.3   47.8  16.8  26.9     -0.0617
 7       7   54.3   47.8  16.8  26.9     -0.0685
 8       8   54.3   47.8  16.8  26.9     -0.0690
 9       9   54.3   47.8  16.8  26.9     -0.0686
10      10   54.3   47.8  16.8  26.9     -0.0630
11      11   54.3   47.8  16.8  26.9     -0.0694
12      12   54.3   47.8  16.8  26.9     -0.0666
13      13   54.3   47.8  16.8  26.9     -0.0656

Challenge 1

What can we say about these datasets just from looking at their summaries?

Let’s try plotting one of them to see if we can spot anything new:

plot of chunk dino_scatter

And it immediately becomes apparent what all those numbers mean

The other data sets we looked at are known as the datasaurus dozen. A group of 12 two-dimensional datasets that have identical:

But…

plot of chunk scatter12

Discussion - What is a data visualisation?

What is the difference between panel E and the other panels in the following figure?

"Plot vs diagram"

The roles of data visualisation

In general there are two purposes behind data visualisations:

In both cases, a successful visualisation will illuminate and aid understanding, not confuse or obscure.

Discussion - What makes an effective data visualisation?

The figure below is from an Axios article about satellite launches. "State of the space race"

How effective is this visualisation at conveying the story of the dataset, and what features do you feel contribute to that success?

Talking about visualisations

Our goal today will be to learn how to describe visualisations and their connections to data. We will build up a consistent vocabulary around visualisations that can be used to describe and create complex visualisations out of a series of simple tools.

The big concept of the day is any plot can be constructed by mapping variables in your data to visual properties (aesthetics) which are then applied to the physical marks (geometries) chosen to represent the data

Process outline

New terms

Aesthetic:

A visual property of the elements in a figure. Colour, size, and shape are all examples of aesthetics that affect the way your figure is displayed.

Geometry:

The physical representation of your data. Points, lines, and bars are examples of geometries that could be used to illustrate the data in a figure.

This will probably be a very big change in how you think about figures, let’s try it on a few examples.

Straight to a classic

minard

Challenge

List the data elements contributing to this plot?

Hint: One way to answer this is to ask yourself what the column names in the underlying data frame might be.

What are some of the visual properties (aesthetics) used to convey this information?

Which aesthetics are linked with which data variables?

Solution

The major variable/aethetic mappings are:

Data Variable Visual Component
Longitude x Axis
Latitude y Axis
Number of troops Line width
Direction of travel Line colour
Temperature y Axis (bottom chart)
Date Text

Part of the reason this is a classic example of data visualisation is because of how information dense the image is and how clearly it conveys different elements of the data. Let’s try the same approach with a more recent figure.

A modern example?

List the data elements and visual aesthetics of the above figure.

How they are linked together?

Solution

Data Variable Visual Component
No. landraces (l) x Axis
No. gametes per landrace (g) y Axis
r^2 decay distance z Axis
r^2 decay distance Bar colour

Working on a single figure at a time doesn’t give a good idea of how changes to the aesthetics or geometries can affect a figure. Changing either of these components can hide or reveal information, and can sometimes completely change the story your figure is telling.

Consider the following set:

Changing aesthetics

plot of chunk changing_aesthetics

Challenge

List the data elements, visual aesthetics, and geometries of the above figure to convince yourself that the only changes are in the mappings of the data to aesthetics.

Solution

Left figure:

Data Variable Aesthetic
Weight x Axis
Fuel efficiency y Axis
No. of cylinders Colour

Middle Figure:

Data Variable Aesthetic
Weight x Axis
Fuel efficiency y Axis
No. of cylinders Size

Right figure:

Data Variable Aesthetic
Weight x Axis
No. of cylinders y Asis
Fuel efficiency Colour

Each of these figures uses a point geometry

Changing geometries

plot of chunk changing_geometries

Challenge

Compare the three plots above

  • How are the data elements linked to the visual elements?
  • What is the difference between the three?
  • Do any of them tell a story more clearly?

Solution

All figures have the same aesthetic mapping:

Data Variable Aesthetic
Day x Axis
Mean water use y Axis
Species Colour

But each of them uses a different geometry to represent this data. Bars, lines, and points from left to right.

Challenge

Which of the following are aesthetics?

  1. Colour, size, shape
  2. X and Y axis
  3. Points, lines and bars
  4. Both (1) and (2)

Solution

The correct answer is 4. From a ‘grammer of graphics’ framework, the X and Y axes are visual elements just like colour, size, and shape. Geometry (e.g. lines, bars, points) is not - it affects how your aesthetic mappings are displayed, not what the aesthetics themselves are.

Scales

While the links between data, aesthetics, and geometries describe the core features of a visualisation, there are several other components that influence the final figure.

Consider the following pair of plots:

plot of chunk scales_intro

Challenge

Compare the two plots above

  • How are the data elements linked to the visual elements?
  • What term would you use to describe the difference between the two?

Solution

Each plot has exactly the same data, aesthetic mapping, and geometries. The only difference is that the x and y axis in the right hand figure is scaled logarithmically.

So this new concept extends our previous understanding of the visualisation process by affecting exactly how values of your data are assigned to values of the visual property.

New term

Scale:

Controls how data values get converted into their aesthetic values. For example, the scale of a colour aesthetic determines what colours the data points are assigned to.

To demonstrate that scaling is affecting the mapping of data to aesthetics, consider the following example:

plot of chunk scales_v_data

In one of these, the data is transformed before plotting. In the other, the data values are left unchanged and just the scaling as to how those data points are mapped to the x and y axis is altered.

Do you find each option easier or harder to interpret?

Scales apply to all visual properties, with colour being the easiest to observe differences:

plot of chunk colour_scale

In this example, the cylinder variable is mapped to three different colour scales. One continuous (A) in which the values of cylinder are mapped to a spectrum of dark to light blue. The other two are categorical (B, C) in which the unique values of cylinder are mapped to individual colour values. The specific colour values are determined by the scale and are different between the two plots.

The choice of scale for your visualisations will depend on your data. Are they continuous or categorical? Can they be mapped directly onto an appropriate scale, or would a transformed scale better represent the data?

Coordinates

As with scales controlling the conceptual mapping of data to a visual property, coordinates control the physical mapping of geometries to positions on a screen/page.

New term

Coordinate system:

Controls how the points, lines, bars etc. of a visualisation get placed in physical space.

The coordinate system you are most familiar with is the Cartesian coordinate system. This is the x/y two dimensional plane that has been used in the examples so far.

A simple modification we could make to this system is to enforce a condition on the coordinates that our two dimensions have to be equal. In other words, a unit of length along the x axis is the same physical distance as a unit of length along the y axis:

plot of chunk coord_fixed

In this example, you can see that on the right hand figure, the distance between -50 and 0 on the x axis is exactly the sasme distance as between -50 and 0 on the y axis.

Discussion

Why might you choose to enforce this restriction on the coordinate system?

While this might seem like a small change, changing the coordinate system can have a drastic effect on the presentation of your data and how it is interpreted.

plot of chunk coord_more

Discussion

The two plots above show exactly the same data as the grid previously. Do the changed coordinates change how you might interpret the data?

Applying your visualisation vocabulary

By now you have developed a vocabulary that will let you talk consistently about figures. Time to try it out on something you are familiar with.

Challenge

Select a figure you have produced, or one that you have seen and thought was very effective.

Describe the data they contain, how that data is mapped to the visual aesthetics and what geometries were chosen to represent the data. Did you have to consider the scale or coordinate system for the figure to get your deired point across?

Find a partner and share your figure and description. Do your figures share any common features?

This process allows you to decsribe a figure even if you are not familiar with how to create it.

Challenge - Advanced

Consider the following visualisation from The Guardian showing the change in voting patterns at the 2018 US elections:

How would you describe this visualisation?

Hint: You will need to create a new type of geometry that we haven’t discussed to describe this. What might you call it and what aesthetics will it use?

That special something

Until now, everything we have discussed has had some functional role in creating a figure. We have been talking about how to display our data with a visual representation that allows us to understand the features of our data. There are a lot of other aspects to a plot that can affect the final look however.

Challenge

What changes between the following plots? What term would you use to describe these features?

plot of chunk themes

These aspects to not change the data elements of a figure, but do have an impact on how the plot looks. We will refer to these as the theme of a visualisation.

New term

Theme:

Any non-data element of a visualisation. Changing these can still have a dramatic effect on the final presentation of the figure..

Key Points

  • Visualisation is valuable

  • There are common elements that link data to visual properties

  • By mapping data attributes to visual attributes clearer visualisation is possible