Introduction

Hi, I’m Yuzi, a PhD student in the cereal quality group of Agriculture & Food. I had no programming experience before, so Data School was a whole new world for me. It turned out to be very interesting, and I’m keen to learn more.

My Project

The project is about modelling the degradation of wheat starch. Hundreds of starches from the MAGIC (Multi-parent Advanced Generation Inter-Cross) population are being measured for degradability and for several structural properties. The end goal is to build a model that predicts the degradability of wheat starch from its structural properties.

Preliminary results

The current dataset combines my experimental hydrolysis results for more than 200 wheat starches with earlier results from other people on the structural properties. The hydrolysis assay was done in microplates: an enzymatic reaction that ran for 30 hours, during which samples were taken 9 times.

Tables

Table 1: Structural and functional properties of wheat starch
Sample	ID	Time	Hydro_extent	Amylose_content	D1	D5	D9	mean_Peak	mean_Trough	mean_Final	low_dp	medium_dp	high_dp
129	cav4081295	0	2.783424	26.72911	2.1685	6.753	30.1415	237.585	119.42	242.54	30.33146	50.17443	5.579565
129	cav4081295	20	4.465143	26.72911	2.1685	6.753	30.1415	237.585	119.42	242.54	30.33146	50.17443	5.579565
129	cav4081295	60	14.453536	26.72911	2.1685	6.753	30.1415	237.585	119.42	242.54	30.33146	50.17443	5.579565
129	cav4081295	120	23.881355	26.72911	2.1685	6.753	30.1415	237.585	119.42	242.54	30.33146	50.17443	5.579565
129	cav4081295	180	30.404386	26.72911	2.1685	6.753	30.1415	237.585	119.42	242.54	30.33146	50.17443	5.579565

My Digital Toolbox

Tidyverse (dplyr, ggplot2…)
Knitr

Favourite tool

ggplot2
can’t wait to learn Shiny

Figure 1: Spatial variability across the plates

The heatmap explores the spatial variability across the plates at different time points and helps to find potential outliers. For example, the figure above shows the hydrolysis extent of the first six plates at 360 minutes. The white blocks are empty samples, missing values and a very small number of outliers. The colours are randomly distributed with no visible pattern, which is good. Plates 3 and 6 tend to have higher intensity than the others; whether this is due to variation in the experimental conditions (temperature, enzymatic activity…) or to differences between the samples needs to be checked later.
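As a rough illustration of the data arrangement behind such a heatmap, the sketch below places per-well readings on a plate grid and computes a plate mean, which is one simple way to compare plates such as 3 and 6 with the rest. It is written in Python for illustration only. The 96-well layout (8 rows by 12 columns), the well labels and the example readings are all assumptions, not details of the actual assay.

```python
import math

ROWS = "ABCDEFGH"  # assumed 96-well layout: 8 rows (A-H) x 12 columns
N_COLS = 12


def well_to_index(well):
    """Convert a well label such as 'B7' to a zero-based (row, column) pair."""
    row = ROWS.index(well[0].upper())  # raises ValueError for an unknown row letter
    col = int(well[1:]) - 1
    if not 0 <= col < N_COLS:
        raise ValueError(f"column out of range in well label {well!r}")
    return row, col


def plate_grid(readings):
    """Arrange {well: value} readings on an 8 x 12 grid; absent wells stay NaN."""
    grid = [[math.nan] * N_COLS for _ in ROWS]
    for well, value in readings.items():
        r, c = well_to_index(well)
        grid[r][c] = value
    return grid


def plate_mean(grid):
    """Mean of all non-missing wells on a plate (NaN if the plate is empty)."""
    values = [v for row in grid for v in row if not math.isnan(v)]
    return sum(values) / len(values) if values else math.nan


# Hypothetical readings for three wells; real plates would have many more.
example = {"A1": 41.0, "A2": 43.0, "H12": 45.0}
grid = plate_grid(example)
```

Wells left as NaN correspond to the white blocks in the heatmap, and comparing `plate_mean` across plates gives a quick numeric check on plate-to-plate differences.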

My time went …

tidying the data (the most time-consuming part)
checking the spatial variability across the plates (heatmap)
plotting the experimental results (fig. 2)
curve fitting
exporting all the estimated parameters from the model
plotting the fitted values (fig. 3)
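The curve-fitting and fitted-value steps can be sketched as follows. The poster does not say which equation was fitted, so this example assumes a first-order kinetic model, extent = c_inf * (1 - exp(-k * t)), purely for illustration, and it is written in Python rather than the R tools listed above. The data are the five rows for sample 129 in Table 1, with Time assumed to be in minutes.

```python
import math


def fit_first_order(times, extents, k_min=1e-5, k_max=1.0, n_grid=2000):
    """Least-squares fit of extent = c_inf * (1 - exp(-k * t)).

    For each trial k the best c_inf has a closed form, so only k is searched,
    here over a log-spaced grid. Returns (c_inf, k, sse) for the best trial.
    """
    best = None
    for i in range(n_grid):
        k = k_min * (k_max / k_min) ** (i / (n_grid - 1))
        f = [1.0 - math.exp(-k * t) for t in times]
        denom = sum(v * v for v in f)
        if denom == 0.0:
            continue  # happens only if every time point is zero
        c_inf = sum(y * v for y, v in zip(extents, f)) / denom
        sse = sum((y - c_inf * v) ** 2 for y, v in zip(extents, f))
        if best is None or sse < best[2]:
            best = (c_inf, k, sse)
    return best


# Sample 129 from Table 1: time (assumed minutes) and hydrolysis extent.
times = [0, 20, 60, 120, 180]
extents = [2.783424, 4.465143, 14.453536, 23.881355, 30.404386]

c_inf, k, sse = fit_first_order(times, extents)  # estimated parameters
fitted = [c_inf * (1.0 - math.exp(-k * t)) for t in times]  # fitted values
```

Running the same function over every sample would give one (c_inf, k) pair per starch, which is the kind of parameter table that can then be exported and plotted.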

Figure 2: Experimental results of the starch degradability

Figure 3: Fitted results of the starch degradability

Next steps

Looking for potential outliers
Trying other mathematical equations for the curve fitting and comparing their goodness of fit
Establishing a predictive model using partial least squares (PLS) regression
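One possible way to compare the goodness of fit of different equations is to compute a few summary statistics from the observed and fitted values. The sketch below (Python, for illustration only) returns RMSE, R² and the least-squares form of AIC, which is defined up to an additive constant. The choice of these three measures is an assumption, and the numbers in the example are hypothetical.

```python
import math


def goodness_of_fit(observed, fitted, n_params):
    """Return RMSE, R^2 and least-squares AIC for one fitted curve.

    AIC is computed as n * ln(SSE / n) + 2 * n_params, so lower is better and
    only differences between models fitted to the same data are meaningful.
    """
    n = len(observed)
    sse = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    mean_obs = sum(observed) / n
    sst = sum((o - mean_obs) ** 2 for o in observed)
    aic = n * math.log(sse / n) + 2 * n_params if sse > 0 else -math.inf
    return {
        "rmse": math.sqrt(sse / n),
        "r2": 1.0 - sse / sst,
        "aic": aic,
    }


# Hypothetical observed and fitted values for a two-parameter model.
stats = goodness_of_fit([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8], n_params=2)
```

Calling the function once per candidate equation on the same sample gives directly comparable values; the AIC term penalises equations with more parameters.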

My Data School Experience

It’s been an awesome learning experience, and the nice pace made it easy to follow. In the past I thought I would never understand anything about programming, but thanks to Data School I finally do. I’ve gained a lot of knowledge and skills in data visualisation, data analysis, statistics and data management, which I’ve already applied to my daily work. It’s always exciting to learn and explore more R code that helps us solve various problems.

Modelling the degradability of wheat starch

Yuzi Wang

CSIRO Agriculture & Food