I have a background in taxonomic identification of marine invertebrates and plankton. This requires working with datasets of species data and associated environmental data, spatial details and imagery. I have been involved in fieldwork using SCUBA diving to survey for introduced marine pests in Australian ports and to conduct underwater visual counts for species monitoring. I have used Excel, Access and SQL databases to manage data, and have some experience coding in Oracle to manage data and in R to produce graphs.
The spotted handfish, Brachionichthys hirsutus, is a small marine fish that “walks on its hands” on the bottom sediments rather than swimming. Its distribution has been heavily impacted by human activities such as scallop dredging, and also by an introduced seastar, Asterias amurensis. It is now restricted to a small area in southern Tasmania. The project presented here examines the changes in the distribution of the handfish since monitoring of the population began at CSIRO in the late 1990s.
Initially, from 1996 to 2009, divers swam 100 metre underwater transects and counted the number of fish seen. This historic dataset was entered into Excel. More recently, divers have towed a GPS float and marked the position of each fish while swimming a variable-length transect. This recent dataset was entered into Access.
Using the skills I have learnt in Data School, I have been able to “tidy” the two datasets using the tidyverse so they could be joined together. To do this I used rename to remove the spaces from my column names and to make the headings consistent between the two data files; mutate to change the count column from characters to numbers to enable statistical analyses; filter to remove unnecessary data rows; and select to remove unnecessary data columns. I also used group_by, arrange and summarise to rearrange my tidy datasets.
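The tidying steps above can be sketched roughly as follows. This is only an illustration: the raw column names, the "Test site" rows and all values are invented stand-ins for the Excel and Access exports, and it assumes that "joined together" means stacking the rows of the two tables with bind_rows.

```r
library(dplyr)

# Hypothetical stand-ins for the Excel and Access exports (all values invented)
historic_raw <- tibble(
  `Site Name`  = c("Opossum Bay", "Opossum Bay", "Test site"),
  `Fish Count` = c("1", "2", "0"),
  Notes        = c("", "", "training dive")
)
recent_raw <- tibble(
  location = c("Opossum Bay", "Mary-Ann Bay"),
  count    = c(3, 1),
  diver    = c("A", "B")
)

tidy_historic <- historic_raw %>%
  rename(Location = `Site Name`, Total_fish = `Fish Count`) %>%  # remove spaces from names
  mutate(Total_fish = as.numeric(Total_fish)) %>%                # characters to numbers
  filter(Location != "Test site") %>%                            # drop unnecessary rows
  select(Location, Total_fish)                                   # drop unnecessary columns

tidy_recent <- recent_raw %>%
  rename(Location = location, Total_fish = count) %>%            # match the historic headings
  select(Location, Total_fish)

# Stack the two tidy tables and record which dataset each row came from
all_counts <- bind_rows(historic = tidy_historic, recent = tidy_recent, .id = "Dataset")
all_counts
```

Once both tables share the same column names and types, the combined table can be passed straight to group_by and summarise.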
I learnt to use regular expressions (regex) with str_replace in R to correct location names that were spelt differently in the two datasets. See the example code below:
```r
library(tidyverse)

# The correct name for this location is Mary-Ann Bay; it is misspelt in the historic dataset
TidyHistoric <- TidyHistoric %>%
  mutate(Location = str_replace(Location, "Mary-Anne Bay", "Mary-Ann Bay"))
```
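Where a name has several variant spellings, a single regex pattern can cover them all. The sketch below is illustrative only: "Mary Ann Bay" is a hypothetical variant, and only "Mary-Anne Bay" is known to occur in the historic dataset.

```r
library(stringr)

# Hypothetical spellings; only "Mary-Anne Bay" is confirmed in the historic data
locations <- c("Mary-Anne Bay", "Mary Ann Bay", "Mary-Ann Bay", "Opossum Bay")

# "(-| )" matches a hyphen or a space; "e?" makes the trailing "e" optional;
# "^" and "$" anchor the pattern so only whole names are replaced
fixed <- str_replace(locations, "^Mary(-| )Anne? Bay$", "Mary-Ann Bay")
fixed
```

Names that do not match the pattern, such as Opossum Bay, pass through unchanged.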
Another handy thing I learnt was how to add an index column of row numbers. The sample code below was used to produce Table 1:
```r
# Add a row-number column after sorting by sample date
Historic_bytransect <- Historic_bytransect %>%
  ungroup() %>%                   # drop any grouping so rows are numbered across the whole table
  arrange(Sample_date) %>%        # sort first so Row_ID follows date order
  mutate(Row_ID = row_number())   # 1, 2, 3, ... in the sorted order
```
Location | Loc_abbr | Sample_date | Transect_no | Swath_Area | Total_fish | Row_ID |
---|---|---|---|---|---|---|
Opossum Bay | Opos | 1997-05-01 | A122 | 124 | 1 | 1 |
Opossum Bay | Opos | 1997-05-01 | A244 | 158 | 1 | 2 |
Opossum Bay | Opos | 1997-05-01 | A277 | 160 | 1 | 3 |
Opossum Bay | Opos | 1997-05-01 | A40 | 70 | 1 | 4 |
Opossum Bay | Opos | 1997-05-01 | A92 | 96 | 1 | 5 |
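
As a small sketch of the group_by and summarise step, the five rows shown in Table 1 can be collapsed to one row per location and sample date. The output names (Historic_bysurvey, Transects, Fish_per_area) are made up for this example, and the units of Swath_Area are not stated in the table, so the density is simply fish per unit of swath area.

```r
library(dplyr)

# The five rows shown in Table 1
Historic_bytransect <- tibble(
  Location    = "Opossum Bay",
  Sample_date = as.Date("1997-05-01"),
  Transect_no = c("A122", "A244", "A277", "A40", "A92"),
  Swath_Area  = c(124, 158, 160, 70, 96),
  Total_fish  = c(1, 1, 1, 1, 1)
)

Historic_bysurvey <- Historic_bytransect %>%
  group_by(Location, Sample_date) %>%
  summarise(
    Transects  = n(),               # number of transects in the survey
    Total_fish = sum(Total_fish),   # fish counted across all transects
    Swath_Area = sum(Swath_Area),   # total area searched
    .groups = "drop"
  ) %>%
  mutate(Fish_per_area = Total_fish / Swath_Area) %>%
  arrange(Sample_date)

Historic_bysurvey
```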
Hopefully, having the data in this tidy form will enable me to conduct some time-series analysis using GLMs.
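One possible starting point for such an analysis is a Poisson GLM of fish counts against year, with the log of the area searched as an offset. This is a sketch under assumptions, not the analysis used in the project: the data below are simulated, and the choice of a Poisson family is a guess that would need checking (for example for overdispersion) against the real counts.

```r
# Simulated counts for illustration only; these are not survey results
set.seed(1)
toy <- data.frame(
  Year       = rep(1997:2006, each = 4),
  Swath_Area = runif(40, min = 70, max = 160)
)
toy$Total_fish <- rpois(40, lambda = toy$Swath_Area * 0.01)

# Poisson GLM: the offset log(Swath_Area) means the model describes
# fish per unit area rather than raw counts per transect
fit <- glm(Total_fish ~ Year + offset(log(Swath_Area)),
           family = poisson, data = toy)

summary(fit)$coefficients
```

The coefficient for Year is on the log scale, so exp(coef(fit)["Year"]) gives the estimated multiplicative change in density per year.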
Plots from R
Here is a ggplot showing the density of fish, using facet_wrap to show the separate sampling sites.
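A plot of this kind can be built along the following lines. The values are invented for illustration, the column names follow Table 1, and Density is assumed here to mean fish counted divided by swath area.

```r
library(ggplot2)

# Invented values for illustration; column names follow Table 1
toy <- data.frame(
  Location    = rep(c("Opossum Bay", "Mary-Ann Bay"), each = 3),
  Sample_date = rep(as.Date(c("1997-05-01", "1998-05-01", "1999-05-01")), times = 2),
  Total_fish  = c(5, 3, 4, 2, 2, 1),
  Swath_Area  = c(600, 550, 620, 500, 480, 510)
)
toy$Density <- toy$Total_fish / toy$Swath_Area

density_plot <- ggplot(toy, aes(x = Sample_date, y = Density)) +
  geom_point() +
  geom_line() +
  facet_wrap(~ Location) +   # one panel per sampling site
  labs(x = "Sample date", y = "Fish per unit swath area")

density_plot
```

By default facet_wrap shares the axes across panels, which makes densities directly comparable between sites; scales = "free_y" can be added if one site dominates.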
Learning to use the tidyverse, and all the new functions it makes available, has been a great help to my programming in R.
ggplot2, its add-on gganimate and its integration with plotly are going to be useful in the future, once I have time to play with them.
Much of the project involved tidying and merging datasets compiled in several different formats over the decades.
I am keen to spend more time investigating the concepts and techniques I have learnt in Data School on my projects going forward.
Working through examples in Data School in class, in small groups and as “homework” has helped me consolidate the techniques and made it easier to remember how to do things. I also now know where to go for help, thanks to the many sources and links provided during the course.