Preliminary investigations into the population genetics of sea spurge (Euphorbia paralias) in Australia

Gavin Hunter

CSIRO Health and Biosecurity

Introduction

I am a research scientist in the Temperate Weeds research group of the Managing Invasive Species and Diseases (MISD) research program of Health and Biosecurity. I am a plant pathologist and use classical and DNA-based molecular techniques to investigate the interactions between fungal plant pathogens and their plant hosts. My daily work involves experiments undertaken in laboratory, glasshouse and quarantine facilities. I had no programming skills before Data School but had always been very interested in learning to program to broaden my analytical skill set.

My Project

Sea spurge (Euphorbia paralias), a native of Mediterranean Europe, is an invasive plant in Australian dune and foredune ecosystems. The weed forms dense infestations that reduce the amenity of public beaches, outcompetes native dune-associated plants and can also restrict the nesting habits of Australian birds. Colleagues and collaborators have collected samples of sea spurge from 48 populations across its distribution along the southern coastline of Australia. Our aim is to understand the genetic diversity of sea spurge in Australia and to elucidate its invasion pathways across the southern States (WA, SA, VIC, TAS, NSW). To do this, 751 Single Nucleotide Polymorphism (SNP) markers were identified for 364 samples across the five States. These data were used to determine various population genetic indices, including Observed Heterozygosity (Ho), Gene Diversity (Hs), Inbreeding coefficients (Fis), Allelic richness (Ar) and the number of private alleles (Pa).
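
To make the first three indices concrete, here is a minimal sketch of how Ho, Hs and Fis can be estimated for a single population from biallelic SNP genotypes. The analyses in this project were run in R; the sketch uses plain Python only for illustration, and it rests on assumptions: genotypes are coded as alternate-allele counts (0, 1, 2, or None for missing), Hs uses Nei's (1987) sample-size-corrected estimator, and Fis is taken as 1 - Ho/Hs on the locus-averaged values. The function names and the toy genotypes are hypothetical, and values produced by other software may differ because estimators and averaging schemes vary.

```python
from statistics import mean


def locus_stats(genotypes):
    """Return (Ho, Hs) for one biallelic locus in one population.

    genotypes: alternate-allele counts per individual (0, 1 or 2),
    with None for a missing call. Assumed coding, not the author's.
    """
    calls = [g for g in genotypes if g is not None]
    n = len(calls)
    if n < 2:
        return None  # too few individuals for the corrected estimator
    ho = sum(1 for g in calls if g == 1) / n  # proportion of heterozygotes
    p = sum(calls) / (2 * n)                  # alternate-allele frequency
    q = 1 - p
    # Nei (1987) within-population gene diversity with sample-size correction
    hs = n / (n - 1) * (1 - p * p - q * q - ho / (2 * n))
    return ho, hs


def population_stats(matrix):
    """matrix: one row per individual, one column per locus."""
    per_locus = [locus_stats(list(column)) for column in zip(*matrix)]
    per_locus = [s for s in per_locus if s is not None]
    ho = mean(s[0] for s in per_locus)
    hs = mean(s[1] for s in per_locus)
    fis = 1 - ho / hs if hs > 0 else float("nan")
    return {"Ho": ho, "Hs": hs, "Fis": fis}


# Toy data (hypothetical): 4 individuals scored at 2 loci
toy_population = [
    [0, 1],
    [1, 1],
    [2, 1],
    [1, 2],
]
stats = population_stats(toy_population)
print(stats)
```

A negative Fis, as in this toy example, indicates more heterozygotes than expected under random mating, while positive values indicate a heterozygote deficit.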

Preliminary results

Preliminary analyses of the 751 SNP loci across the 364 sea spurge samples indicate that heterozygosity and gene diversity are low to moderate in the 48 sampled populations across Australia (Figure 1). Interestingly, mean heterozygosity was lower for populations from New South Wales than for populations from all other States, and sea spurge populations in Western Australia and South Australia appear to have slightly higher mean heterozygosity values (Figure 2). This is not entirely surprising, as it has previously been hypothesised that WA and SA were the States where sea spurge was first introduced in the early 20th century. Further evidence for this, particularly for South Australia as an initial introduction site, is the very large number of private alleles in sea spurge populations in South Australia (Figure 3). This value is all the more striking because only seven individuals from one South Australian population account for this exceedingly high number of private alleles.
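
For readers unfamiliar with the term, a private allele is one observed in a single population and in no other. The sketch below counts private alleles per population from biallelic SNP genotypes. It is an illustration in plain Python, not the code used for this project; the genotype coding (alternate-allele counts 0, 1, 2, or None for missing), the function name and the population labels "A", "B" and "C" are all assumptions made for the example.

```python
def private_allele_counts(pop_genotypes):
    """Count alleles found in exactly one population.

    pop_genotypes: dict mapping a population label to a list of individuals;
    each individual is a list of alternate-allele counts per locus
    (0, 1 or 2, with None for a missing call). Assumed coding.
    """
    # Record which alleles, as (locus index, "ref" or "alt"), occur in each population
    observed = {}
    for pop, individuals in pop_genotypes.items():
        alleles = set()
        for individual in individuals:
            for locus, g in enumerate(individual):
                if g is None:
                    continue
                if g < 2:
                    alleles.add((locus, "ref"))  # at least one reference copy
                if g > 0:
                    alleles.add((locus, "alt"))  # at least one alternate copy
        observed[pop] = alleles

    # An allele is private to a population if no other population carries it
    counts = {}
    for pop, alleles in observed.items():
        others = set().union(
            *(a for other, a in observed.items() if other != pop)
        )
        counts[pop] = len(alleles - others)
    return counts


# Toy data (hypothetical): three populations scored at three loci
toy = {
    "A": [[0, 0, 1], [0, 0, 0]],
    "B": [[0, 1, 0], [0, 0, 0]],
    "C": [[0, 0, 0]],
}
pa = private_allele_counts(toy)
print(pa)
```

Because a single carrier is enough to make an allele count as present, a small number of individuals can account for many private alleles, which is why counts of this kind are sensitive to sample size and to genotyping errors.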

Table 1: Basic population diversity indices for sea spurge populations collected across five Australian States
| pop_no | state | bioregion | N | Ho | Hs | Ar | Fis | Pa |
|---|---|---|---|---|---|---|---|---|
| 13 | SA | Eyre Yorke Block | 10 | 0.1557035 | 0.2748483 | 1.268575 | 0.2590759 | 0 |
| 1 | NSW | South East Corner | 3 | 0.0372807 | 0.2585522 | 1.213937 | 0.3297827 | 0 |
| 33 | VIC | South East Coastal Plain | 6 | 0.1149663 | 0.2620506 | 1.248668 | 0.3251583 | 0 |
| 2 | NSW | Sydney Basin | 6 | 0.0082116 | 0.0048827 | 1.005185 | -0.0066578 | 0 |
| 34 | VIC | South East Corner | 10 | 0.0328450 | 0.2163831 | 1.206701 | 0.4427905 | 0 |
| 3 | NSW | Sydney Basin | 9 | 0.0820096 | 0.1638298 | 1.158951 | 0.1703337 | 0 |
| 23 | TAS | Tasmanian Northern Slopes | 2 | 0.1071904 | 0.1071904 | 1.107190 | -0.0039947 | 0 |
| 24 | TAS | Tasmanian West | 9 | 0.1204634 | 0.1992534 | 1.194604 | 0.2199467 | 0 |
| 41 | WA | Esperance Plains | 10 | 0.1907071 | 0.2816362 | 1.276183 | 0.1977686 | 0 |
| 4 | NSW | Sydney Basin | 3 | 0.0088766 | 0.2545614 | 1.205415 | 0.3679539 | 0 |

N: Number of samples
Ho: Observed heterozygosity
Hs: Gene diversity
Ar: Allelic richness
Fis: Inbreeding coefficient
Pa: Number of private alleles

Figures 1-4: Plots of observed heterozygosity, gene diversity, private alleles and inbreeding coefficients for populations of sea spurge based on biallelic SNP data.

My Digital Toolbox

I have been using several R packages, together with the tidyverse, to read in, manipulate, analyse and graph the genetic data in this project.

I don’t expect everything to be completed within R, as certain analyses can only be undertaken in other standalone software packages. However, the fundamental skills learned through Data School have already helped me to increase the number of analyses that I can undertake in R.

Favourite tool

Since I began working in R, I have particularly enjoyed two packages: ggplot2 and adegenet. ggplot2 is versatile and can generate a great variety of plots, while adegenet offers useful analyses that are specific to population genetics.

My time went …

Most of the time I spend on my project goes to cleaning, filtering and re-formatting data so that they can be used by other R packages. As I am relatively new to R programming, I also spend some time reading package manuals and working through their vignettes and examples to better understand how each package operates.

Next steps

My next steps are to continue learning the tidyverse and other R packages associated with population genetics and genomics, so that I can apply them more widely in my typical analysis workflows.

My Data School Experience

I had a great time in Data School! I found that the instructors were fantastic at teaching and clearly explaining the course content. They were also very encouraging and inclusive, making all the participants feel comfortable and confident to participate, debate and learn. I am confident that the skills and techniques I have learnt in Data School will have a very positive impact on my research and open other data analysis pathways.