Analysis of solutions for the governance and prevention of marine debris

Theresa Stoll

Oceans & Atmosphere

Introduction

As a PhD student affiliated with CSIRO’s Oceans and Atmosphere Business Unit, I conduct research on marine debris, more specifically on solutions that can be utilised to prevent marine debris and improve the governance of this problem. Having just recently started my PhD candidature and having never coded before, the participation in the Data School FOCUS is an incredible opportunity for me. The skillsets taught during the Data School FOCUS are directly relevant for my PhD project.

My Project

As part of my PhD research, I will use R to assess different data provided by the nonprofit advocacy group Ocean Conservancy to examine the effectiveness of different policies and practices to reduce marine debris entering the oceans. The datasets that I use throughout the Data School FOCUS consists of about 50,000 entries in different excel files. Each entry is a marine debris survey undertaken by volunteers who submitted their data to Ocean Conservancy (from the years 2011 until 2018). I had not yet done any analysis with the data prior to my participation in Data School FOCUS because knowledge of ‘R’ is essential for this analysis. That is why the participation in the Data School FOCUS is of immense benefit for my research.

Preliminary results

I spent the majority of the time during the Data School FOCUS on tidying and manipulating my data. Due to the size and untidy nature of the data, tidying it was a mammoth effort! Over the past few weeks, I managed to tidy the data for the years 2011 until 2015 and combined these different datasets. I will continue to work on my data and aim to tidy the datasets for the remaining years sometime soon so that I can eventually combine them all. As I mainly focused on data tidying, wrangling and transformation during the past few weeks, I did not have a lot of time for conducting statistical analyses and creating plots with ggplot. The below is a plot that shows how many surveys were collected in which continent for the different years from 2011 until 2016. The below table is a subset of the combined dataset 2011-2015 that was used to create the plot. Unfortunately, working with the entire dataset 2011-2015 was very difficult as the file is so big that I am for example unable to save it as a csv file in R.

Tables

Table 1: Tidy data, selected columns for plot below
X Country year continent
1 United States 2011 Americas
2 United States 2011 Americas
3 United States 2011 Americas
4 United States 2011 Americas
5 United States 2011 Americas

Plots from R

Frequency of surveys in different continents

Figure 1: Frequency of surveys in different continents

My Digital Toolbox

As I had not yet had the chance to formally learn about R, acquiring skills in all the different R software packages introduced as part of the Data School FOCUS was an unique and well-timed opportunity for me.

My time went …

The tidying of my data took much longer time and effort than expected. Applying the skills I learned in Data School FOCUS on best-practice data analysis, visualisation and management will help me to save time and manage my data more effectively in the future.

Next steps

Over the next few weeks, I will solidify my knowledge of R and will continue to apply it to my own data. I would be interested in learning more about other R packages such as gganimate.

My Data School Experience

The possibility to establish fundamental programming and data analysis skills through the use of R is immensely beneficial for my PhD research and professional development. Having just recently started my PhD candidature and having had no experience in programming or using R, the timing of the Data School FOCUS could not have been better. I started exploring R before the Data School FOCUS. My participation in the Data School FOCUS helped me to better understand apply ‘R’ and the associated R software packages which will help me to address my research questions. Following my participation in Data School FOCUS, I will apply the skills I learned for my data analysis which will take my PhD research to the next level. These skills will enable me to kick-start my PhD research and the associated analytical skills that I need to succeed as a research scientist.