I joined CSIRO over twenty years ago and have worked on many different projects, mostly in wheat and barley. I love combining laboratory work with field and glasshouse work. My work covers various aspects of seed biology, including seed dormancy, seed composition and environmental stresses. Before Data School I would enter data in either FieldPrime or Excel, and all my data analysis and visualisation was done in Excel. I had no experience writing code at all.
This project analyses the effect of high temperatures during the grain-filling stages of wheat plants on the dormancy of their seeds. Most genotypes grown in Australia have low dormancy, which can lead to pre-harvest sprouting (PHS) if there is rainfall at or near harvest time. For instance, in 2010 economic losses due to PHS were estimated at $500 million in SA, NSW and QLD. High temperatures during the growing season can increase the incidence of PHS. In this experiment I grew 230 different wheat varieties selected from the OzWheat panel and 10 landraces that were used in previous experiments. We have phenotypic and genetic data for all lines. Plants were grown under regular glasshouse conditions, scored for flowering date and, at a set developmental stage, exposed to a hot spell of 7 days. At plant maturity (node collapse), heads were harvested and stored in the freezer to maintain dormancy levels. Seeds were scanned to measure different parameters of seed size, and germination tests were performed for each genotype. We want to identify genotypes that are highly sensitive or highly insensitive to heat treatment, and to determine the genetic basis of the heat response of dormancy, in order to future-proof Australian wheat varieties.
glasshouse | block | line | line_id | pot_num | germination | trt | area_mm2 | majellipse_mm | minellipse_mm | perimeter_mm |
---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | GLUYAS_EARLY | 269 | 1 | 0.0 | control | 21.7 | 7.8 | 3.5 | 24.2 |
1 | 1 | LandRace4 | 9532 | 2 | 94.1 | control | 15.1 | 6.2 | 2.9 | 22.2 |
1 | 1 | RANEE | 11 | 3 | 40.0 | control | 20.3 | 7.3 | 3.5 | 23.0 |
1 | 1 | NABAWA | 7 | 4 | 35.0 | control | 23.1 | 7.8 | 3.8 | 24.4 |
1 | 1 | MOLINEUX | 83 | 5 | 11.1 | control | 18.6 | 6.9 | 3.4 | 22.0 |
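
With data in this shape, one simple way to rank lines by heat sensitivity is to compare mean germination between treatments for each line. The sketch below is only an illustration and not necessarily the analysis used in this project. The line names (`line_A`, `line_B`), the germination values and the `"heat"` treatment label are all made up for the example, since the rows shown above only include `control`.

```r
library(dplyr)
library(tidyr)

# Made-up demo data: two hypothetical lines, each with a control and a heat pot.
# The "heat" label and all germination values are assumptions for illustration.
demo <- tibble(
  line        = rep(c("line_A", "line_B"), each = 2),
  trt         = rep(c("control", "heat"), times = 2),
  germination = c(10, 60, 40, 45)
)

sensitivity <- demo %>%
  group_by(line, trt) %>%
  summarise(mean_germ = mean(germination), .groups = "drop") %>%
  # One row per line, one column per treatment
  pivot_wider(names_from = trt, values_from = mean_germ) %>%
  # Positive values mean more germination (less dormancy) after heat
  mutate(heat_response = heat - control) %>%
  # Most responsive lines first, regardless of direction
  arrange(desc(abs(heat_response)))

print(sensitivity)
```

Lines at the top of `sensitivity` respond most strongly to heat, and lines near the bottom are the least responsive. A real analysis would also need to account for glasshouse and block effects.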
I used the tidyverse to combine data from many separate data files into one ‘tidy’ tibble! I spent a lot of time with ggplot2 looking at the data and finding outliers, trends and so on. All of R was totally new to me.
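
As a rough sketch of what combining separate files can look like, the example below joins a germination file and a seed-scan file on pot number. The file names, the file layout and the choice of `pot_num` as the join key are assumptions for illustration. The values are the first three rows of the table above, written to temporary files so the example is self-contained.

```r
library(readr)
library(dplyr)

# Hypothetical file names, written to a temp folder for this sketch only.
germ_path <- file.path(tempdir(), "germination.csv")
scan_path <- file.path(tempdir(), "seed_scan.csv")

write_csv(
  tibble(pot_num = 1:3,
         line = c("GLUYAS_EARLY", "LandRace4", "RANEE"),
         trt = "control",
         germination = c(0.0, 94.1, 40.0)),
  germ_path
)
write_csv(
  tibble(pot_num = 1:3,
         area_mm2 = c(21.7, 15.1, 20.3),
         perimeter_mm = c(24.2, 22.2, 23.0)),
  scan_path
)

# Read each file, then join on the shared key so every pot is one row.
germ <- read_csv(germ_path, show_col_types = FALSE)
scan <- read_csv(scan_path, show_col_types = FALSE)

combined <- left_join(germ, scan, by = "pot_num")
print(combined)
```

`left_join()` keeps every pot from the germination file even if a scan is missing, which makes gaps show up as `NA` rather than silently dropping rows.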
I really like the ggplot2 package: looking at data in many different ways gives much better insight into what might be going on. What a difference from bar graphs in Excel! And of course the tidyverse, what would we do without it?
Most of my time went into cleaning and tidying up data, and combining data from many files into one file that I and others can successfully use now and in the future. Many challenges had to be overcome along the way. With no coding experience at all, a lot of time went into trying to remember particular terminology.
Next I want to link the results from the heat experiment to all the genetic data we have for the OzWheat lines. I will need to further my knowledge of the particular packages necessary to achieve that.
Taking time to do Data School is the best thing that has happened to me for a while, especially with the timing of it all. As well as giving me a start in learning to code and using R, it has been wonderful to work on improving my data skills whilst having to work from home in Covid-19 times. I really enjoyed learning to use ggplot2 and seeing all the different options for visualising the data. I am very impressed with the people in the Data School community, and with their willingness to share knowledge, take time for new students to learn, and support them. I am looking forward to using my new skills in my day-to-day work.