Data

Overview

Teaching: 40 min
Exercises: 15 min
Questions
  • What are the basic data types in R?

  • How can I collect data together?

  • How can I store data of different types?

Objectives
  • To be aware of the different types of data.

  • To understand vectors and how to create them.

  • To be aware of lists and how they differ from vectors.

Data Types

For R to know how to deal with values, it needs to know what type a value is. As a way of thinking about this, imagine you are given the instruction: 3 + 4. Fairly straightforward. Now imagine you are given the instruction: 3 + green. This is much harder to deal with, because 3 is a number, and green is a word. For a programming language to be predictable, it needs to be able to deal with and understand values of different types.
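Here is a minimal sketch of what happens if you actually ask R to compute 3 + green, with the word written as the string "green". R refuses because + is only defined for numbers. The base R functions tryCatch() and conditionMessage() are used only so the example captures the error message and keeps running.

```r
# Adding two numbers works as expected
sum_of_numbers <- 3 + 4
sum_of_numbers   # 7

# Adding a number and a word fails; tryCatch() captures the error
# message so we can look at it instead of halting the script
error_message <- tryCatch(
  3 + "green",
  error = function(e) conditionMessage(e)
)
error_message    # "non-numeric argument to binary operator"
```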

In R, there are 5 main types: double, integer, complex, logical and character.

We can ask R what type a particular value (or object) is with the typeof function:

typeof(3.14)
[1] "double"
typeof(1L) # The L suffix forces the number to be an integer, since by default R stores numbers as doubles (double-precision floating point)
[1] "integer"
typeof(1+1i)
[1] "complex"
typeof(TRUE)
[1] "logical"
typeof('banana')
[1] "character"
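To see that default in action, you can compare a plain 1 with 1L. The is.* functions, such as is.numeric() and is.integer(), answer yes/no questions about type. identical() compares type as well as value. A short sketch:

```r
# A plain number literal is stored as a double...
typeof(1)          # "double"
# ...while the L suffix makes it an integer
typeof(1L)         # "integer"

# Both count as numeric, but only one is an integer
is.numeric(1)      # TRUE
is.numeric(1L)     # TRUE
is.integer(1)      # FALSE

# They are equal in value, even though their types differ
1 == 1L            # TRUE
identical(1, 1L)   # FALSE: identical() also checks the type
```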

No matter how complicated our analyses become, all values in R are interpreted as one of these basic data types. This strictness has some really important consequences, and can be the cause of some confusing errors.

Dates

Dates and times are another special kind of data that it’s good to be aware of in R. Working with dates and times is its own detailed topic, but we’ll cover them very briefly here so you’re aware of some of the options.

The lubridate package makes working with dates and times easier (it is also hands down the best package name out there).

To get the current date or date-time:

library(lubridate) # load the package first; install it once with install.packages("lubridate")
today()
[1] "2024-03-12"
now()
[1] "2024-03-12 11:30:11 AEDT"

To turn a string into a date, pick the function whose name matches the order of the year, month and day in the string:

ymd("2019-02-13") #year month day
[1] "2019-02-13"
mdy("February 2nd, 2019") #month day year
[1] "2019-02-02"
dmy("13-Feb-2019") #day month year
[1] "2019-02-13"
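Dates are not a sixth basic type. lubridate’s ymd() returns the same kind of Date object that base R’s as.Date() creates, so this sketch uses as.Date(), which needs no package. It shows that underneath, a date is a double with a "Date" class attached:

```r
# Base R's as.Date() parses an ISO "year-month-day" string
d <- as.Date("2019-02-13")

class(d)        # "Date": the label R uses to print it as a date
typeof(d)       # "double": the basic type underneath
as.numeric(d)   # 17940: days since 1970-01-01

# Because a Date is a number underneath, arithmetic works in days
d + 1           # "2019-02-14"
```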

Date-times can be created the same way, by adding the time components to the function name:

ymd_hms("2019-02-13 20:11:23") #year month day hour minute second
[1] "2019-02-13 20:11:23 UTC"
mdy_hm("02-13-2018 08:02") #month day year hour minute
[1] "2018-02-13 08:02:00 UTC"
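Date-times follow the same pattern. By default ymd_hms() returns a POSIXct value in UTC. This base R sketch uses as.POSIXct() with the time zone pinned to UTC to show that a date-time is also a double, counting seconds rather than days:

```r
# Parse a date-time in base R, pinning the time zone to UTC
dt <- as.POSIXct("2019-02-13 20:11:23", tz = "UTC")

class(dt)        # "POSIXct" "POSIXt"
typeof(dt)       # "double"
as.numeric(dt)   # 1550088683: seconds since 1970-01-01 00:00:00 UTC

# Arithmetic on date-times works in seconds
dt + 60          # "2019-02-13 20:12:23 UTC"
```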

Collections

So far, we’ve been creating and working with values in isolation (a <- 5). But this is very rarely how we work with data. More typically, values exist in relation to other values in a group. And those groups often relate to other groups.

R provides structures for managing these groups, or collections of data. The two basic types we will work with are vectors and lists.

Vectors

A vector is a collection of values in a particular order. A critical distinguishing feature of the values in a vector is that they must be of the same type.

We can create a vector with the vector function:

my_vector <- vector(length = 3)
my_vector
[1] FALSE FALSE FALSE

To emphasise, everything in a vector must be the same basic data type. If you don’t choose the data type, it defaults to logical; alternatively, you can use the mode argument to declare an empty vector of whatever type you like.

another_vector <- vector(mode='character', length=3)
another_vector
[1] "" "" ""
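Each mode gets its own type-appropriate "empty" value. Base R also provides shorthand constructors such as numeric(), character() and logical() that do the same job. A quick sketch:

```r
# vector() fills each mode with a type-appropriate empty value
vector(mode = "logical", length = 2)    # FALSE FALSE
vector(mode = "integer", length = 2)    # 0 0
vector(mode = "numeric", length = 2)    # 0 0  ("numeric" gives doubles)
vector(mode = "character", length = 2)  # "" ""

# Shorthand functions for the same thing
numeric(3)     # 0 0 0
character(1)   # ""

typeof(vector(mode = "numeric", length = 2))  # "double"
```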

You can also make vectors with explicit contents with the combine function (c):

combine_vector <- c(2,6,3)
combine_vector
[1] 2 6 3
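c() takes its type from the values you give it. A quick typeof() check shows that plain numbers produce a double vector and that the L suffix carries through. This is a sketch; the name integer_vector is only for illustration.

```r
combine_vector <- c(2, 6, 3)
typeof(combine_vector)   # "double": plain numbers default to double

integer_vector <- c(2L, 6L, 3L)
typeof(integer_vector)   # "integer"

length(combine_vector)   # 3
```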

Given what we’ve learned so far, what do you think the following will produce?

quiz_vector <- c(2,6,'3')

R turns every element into a character string, giving "2" "6" "3". This is called type coercion. It is the source of many surprises, and it is the reason we need to be aware of the basic data types and how R will interpret them. When R has to combine a mix of types (in this case, numeric and character) into a single vector, it forces them all to be the same type. Consider:

coercion_vector <- c('a', TRUE)
coercion_vector
[1] "a"    "TRUE"
another_coercion_vector <- c(0, TRUE)
another_coercion_vector
[1] 0 1
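You can walk through the coercion hierarchy one step at a time by combining a value with one from the next type up and checking the result with typeof(). A short sketch:

```r
# Each pair is promoted to the "higher" of its two types
typeof(c(TRUE, 2L))     # "integer"   (TRUE becomes 1L)
typeof(c(2L, 2.5))      # "double"
typeof(c(2.5, 1i))      # "complex"
typeof(c(1i, "a"))      # "character"

# Logical values become 1 and 0 when mixed with numbers
c(TRUE, FALSE, 5)       # 1 0 5
```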

The coercion rules go: logical -> integer -> numeric -> complex -> character, where -> can be read as “are transformed into”. You can try to force coercion against this flow using the as.* functions, such as as.numeric() and as.logical():

character_vector_example <- c('0','2','4')
character_vector_example
[1] "0" "2" "4"
character_coerced_to_numeric <- as.numeric(character_vector_example)
character_coerced_to_numeric
[1] 0 2 4
numeric_coerced_to_logical <- as.logical(character_coerced_to_numeric)
numeric_coerced_to_logical
[1] FALSE  TRUE  TRUE

As you can see, some surprising things can happen when R forces one basic data type into another! Nitty-gritty of type coercion aside, the point is: if your data doesn’t look like what you thought it was going to look like, type coercion may well be to blame.
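One common surprise is converting strings that are not numbers. In that case as.numeric() returns NA for the offending value and issues the warning "NAs introduced by coercion". This sketch wraps the call in suppressWarnings() only to keep the output tidy, then counts the failures:

```r
# Strings that look like numbers convert cleanly...
as.numeric(c("1.5", "2"))   # 1.5 2

# ...but a string that isn't a number becomes NA
mixed <- c("1.5", "banana")
converted <- suppressWarnings(as.numeric(mixed))
converted                   # 1.5 NA

# Count how many values failed to convert
sum(is.na(converted))       # 1
```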

The combine function, c(), will also append things to an existing vector:

ab_vector <- c('a', 'b')
ab_vector
[1] "a" "b"
combine_example <- c(ab_vector, 'SWC')
combine_example
[1] "a"   "b"   "SWC"
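c() never changes a vector in place. It returns a new one, so to keep the result you assign it back to a name. Values can go at the front as well as the back. Base R’s append() can also insert after a given position. A sketch:

```r
ab_vector <- c('a', 'b')

# Assign the result of c() back to "grow" the vector
ab_vector <- c(ab_vector, 'SWC')
ab_vector                               # "a" "b" "SWC"

# Values can go at the front as well as the back
ab_vector <- c('start', ab_vector)
ab_vector                               # "start" "a" "b" "SWC"

# append() inserts after a given position
append(ab_vector, 'middle', after = 2)  # "start" "a" "middle" "b" "SWC"
```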

You can also make sequences of numbers:

my_series <- 1:10
my_series
 [1]  1  2  3  4  5  6  7  8  9 10
seq(10)
 [1]  1  2  3  4  5  6  7  8  9 10
seq(1,10, by=0.1)
 [1]  1.0  1.1  1.2  1.3  1.4  1.5  1.6  1.7  1.8  1.9  2.0  2.1  2.2  2.3  2.4
[16]  2.5  2.6  2.7  2.8  2.9  3.0  3.1  3.2  3.3  3.4  3.5  3.6  3.7  3.8  3.9
[31]  4.0  4.1  4.2  4.3  4.4  4.5  4.6  4.7  4.8  4.9  5.0  5.1  5.2  5.3  5.4
[46]  5.5  5.6  5.7  5.8  5.9  6.0  6.1  6.2  6.3  6.4  6.5  6.6  6.7  6.8  6.9
[61]  7.0  7.1  7.2  7.3  7.4  7.5  7.6  7.7  7.8  7.9  8.0  8.1  8.2  8.3  8.4
[76]  8.5  8.6  8.7  8.8  8.9  9.0  9.1  9.2  9.3  9.4  9.5  9.6  9.7  9.8  9.9
[91] 10.0
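seq() has a few other useful forms. You can use a different step size, ask for a fixed number of evenly spaced values with length.out, or count downwards. Note also that 1:10 gives integers, while a fractional step gives doubles. A sketch:

```r
# Count by a step other than 1
seq(2, 10, by = 2)           # 2 4 6 8 10

# A fixed number of evenly spaced values instead of a step
seq(0, 1, length.out = 5)    # 0.00 0.25 0.50 0.75 1.00

# Count downwards
10:7                         # 10 9 8 7
seq(10, 1, by = -3)          # 10 7 4 1

# The colon operator gives integers; fractional steps give doubles
typeof(1:10)                 # "integer"
typeof(seq(1, 10, by = 0.5)) # "double"
```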

We can ask a few questions about vectors:

sequence_example <- seq(10)
head(sequence_example, n=2)
[1] 1 2
tail(sequence_example, n=4)
[1]  7  8  9 10
length(sequence_example)
[1] 10
class(sequence_example)
[1] "integer"
typeof(sequence_example)
[1] "integer"
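class() and typeof() agree for the integer sequence above, but not always. For decimal numbers, class() reports "numeric" while typeof() reports "double". This sketch also shows a few other summaries you can ask of a vector:

```r
# For decimals, class and type use different words for the same thing
class(3.14)          # "numeric"
typeof(3.14)         # "double"

# Other helpful summaries of a vector
decimals <- seq(1, 2, by = 0.25)
decimals             # 1.00 1.25 1.50 1.75 2.00
length(decimals)     # 5
sum(decimals)        # 7.5
range(decimals)      # 1 2
```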

Finally, you can give names to elements in your vector:

my_example <- 5:8
names(my_example) <- c("a", "b", "c", "d")
my_example
a b c d 
5 6 7 8 
names(my_example)
[1] "a" "b" "c" "d"
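Once elements have names, you can use those names to pull values out. Single brackets keep the name; double brackets return just the value. unname() strips the names off again. A sketch of name-based access:

```r
my_example <- 5:8
names(my_example) <- c("a", "b", "c", "d")

# Pull out elements by name instead of by position
my_example["b"]            # b 6 (the name is kept)
my_example[c("a", "d")]    # a 5, d 8

# Double brackets return just the value, without its name
my_example[["c"]]          # 7

# unname() drops all names, leaving a plain vector
unname(my_example)         # 5 6 7 8
```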

Challenge 1

Start by making a vector with the numbers 1 through 26. Multiply the vector by 2, and give the resulting vector names A through Z (hint: there is a built-in vector called LETTERS).

Solution to Challenge 1

x <- 1:26
x <- x * 2
names(x) <- LETTERS
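An alternative way to write the same solution, sketched here, uses setNames() from the stats package (attached by default). It returns its first argument with the names attached, so the whole challenge fits in one expression:

```r
# Build, double and name the vector in a single step
x <- setNames((1:26) * 2, LETTERS)
head(x, n = 3)    # A 2, B 4, C 6
x[["Z"]]          # 52
```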

Lists

Another basic way of grouping values is the list. In some ways a list is simpler than a vector, because you can put anything you want in it:

list_example <- list(1, "a", TRUE, 1+4i)
list_example
[[1]]
[1] 1

[[2]]
[1] "a"

[[3]]
[1] TRUE

[[4]]
[1] 1+4i
another_list <- list(title = "Numbers", numbers = 1:10, data = TRUE)
another_list
$title
[1] "Numbers"

$numbers
 [1]  1  2  3  4  5  6  7  8  9 10

$data
[1] TRUE
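Named list elements can be reached with $ or with double brackets, both of which return the element itself. Single brackets return a smaller list instead. Each element also keeps its own type, which you can check across the whole list with sapply(). A sketch:

```r
another_list <- list(title = "Numbers", numbers = 1:10, data = TRUE)

# $ and [[ ]] return the element itself
another_list$title              # "Numbers"
another_list[["numbers"]]       # 1 2 3 4 5 6 7 8 9 10

# Single brackets return a smaller list, not the element
class(another_list["title"])    # "list"
class(another_list[["title"]])  # "character"

# Each element keeps its own type
sapply(another_list, typeof)    # title: "character", numbers: "integer", data: "logical"
```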

Lists can even contain other lists:

nested_list <- list(list_example, another_list)
nested_list
[[1]]
[[1]][[1]]
[1] 1

[[1]][[2]]
[1] "a"

[[1]][[3]]
[1] TRUE

[[1]][[4]]
[1] 1+4i


[[2]]
[[2]]$title
[1] "Numbers"

[[2]]$numbers
 [1]  1  2  3  4  5  6  7  8  9 10

[[2]]$data
[1] TRUE

There is no limit to how deeply nested such structures can be.
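To reach something inside a nested list, chain [[ ]] and $ together, moving down one level at a time. This sketch rebuilds the nested_list from above so it runs on its own:

```r
list_example <- list(1, "a", TRUE, 1+4i)
another_list <- list(title = "Numbers", numbers = 1:10, data = TRUE)
nested_list <- list(list_example, another_list)

# Chain [[ ]] and $ to walk down one level at a time
nested_list[[1]][[2]]         # "a"
nested_list[[2]]$title        # "Numbers"
nested_list[[2]]$numbers[3]   # 3

# length() only counts the top level
length(nested_list)           # 2
```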

Because they are so flexible, lists are incredibly powerful, but can be a bit difficult to work with depending on how complex their structure is.

Soon we’ll see the workhorse of R, the data.frame, which combines the strictness of vectors with the flexibility of lists.

Challenge 2

Make a list that contains:

  • Today’s date
  • A character vector of length two containing your name and your favourite colour
  • Another list containing the integer 5

Solution to Challenge 2

solution_list <- list(today(), c("My name", "Puce"),
                        list(5))
solution_list
[[1]]
[1] "2024-03-12"

[[2]]
[1] "My name" "Puce"   

[[3]]
[[3]][[1]]
[1] 5

Key Points

  • The basic data types in R are double, integer, complex, logical, and character.

  • Vectors are an ordered collection of data of the same type.

  • Create vectors with c().

  • Lists are an ordered collection of data that can be any type.