Introduce Myself
I am a virologist and a postdoc at the Australian Animal Health Laboratory (AAHL), CSIRO. My research projects at AAHL focus on identifying potential targets for anti-influenza drugs and therapeutics. To do this, we use CRISPR/Cas9 gene screening to find candidates that inhibit virus replication. As a result, I need to deal with big datasets when the screening results come back from next-generation sequencing. That is what drove me to sign up for the Data School training, and I believe this training will enable me to better analyse and visualise big datasets.
The data I used for my Data School Focus training was published a few years ago, so no confidentiality issues are involved. The dataset comes from a project that measured microRNA raw counts in horses infected or mock infected with Hendra virus. Information about the published data can be found here: info of data
The project I used for the Data School training contains two raw data files. One is metadata describing the horses. The other is a CSV file with raw counts of 889 microRNAs from mock-infected and Hendra virus-infected field horses.
Tables
Horse metadata:

horse_id | condition |
---|---|
CD1 | mock |
CD2 | mock |
CD3 | mock |
CDA | infected |
CDC | infected |
CDD | infected |

Raw counts in wide format (six of the 889 microRNAs shown):

gene | CD1 | CD2 | CD3 | CDA | CDC | CDD |
---|---|---|---|---|---|---|
eca-miR-486-5p | 8720190 | 6915893 | 5762274 | 4519245 | 5445002 | 6055352 |
eca-miR-451 | 144928 | 358127 | 1302088 | 835465 | 1970831 | 2014491 |
eca-miR-22 | 60372 | 33604 | 140474 | 71381 | 112475 | 128153 |
eca-miR-191 | 51851 | 87233 | 98228 | 105711 | 120170 | 152413 |
eca-miR-423-5p | 47771 | 24544 | 35254 | 31048 | 42440 | 34589 |
eca-miR-142-5p | 42237 | 51289 | 111810 | 91740 | 110281 | 107092 |

The same counts in long (tidy) format, joined with the metadata (rows for horse CD1 shown):

gene | horse_id | counts | infection |
---|---|---|---|
eca-miR-486-5p | CD1 | 8720190 | mock |
eca-miR-451 | CD1 | 144928 | mock |
eca-miR-22 | CD1 | 60372 | mock |
eca-miR-191 | CD1 | 51851 | mock |
eca-miR-423-5p | CD1 | 47771 | mock |
eca-miR-142-5p | CD1 | 42237 | mock |
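
The step from the wide counts table to the long table above can be sketched as follows. This is only an illustration in plain Python, not the code used in the project (which was done in R with tidyverse, where tidyr's `pivot_longer()` followed by a join performs the same kind of reshape). The function name `to_long` and the variable names are hypothetical, and only two microRNAs are included to keep the example short.

```python
# Wide table: one row per microRNA, one column per horse.
# Values are copied from the first two rows of the wide table above.
wide = [
    {"gene": "eca-miR-486-5p", "CD1": 8720190, "CD2": 6915893, "CD3": 5762274,
     "CDA": 4519245, "CDC": 5445002, "CDD": 6055352},
    {"gene": "eca-miR-451", "CD1": 144928, "CD2": 358127, "CD3": 1302088,
     "CDA": 835465, "CDC": 1970831, "CDD": 2014491},
]

# Metadata table: which horse was mock infected and which was infected.
condition = {
    "CD1": "mock", "CD2": "mock", "CD3": "mock",
    "CDA": "infected", "CDC": "infected", "CDD": "infected",
}


def to_long(wide_rows, condition_by_horse):
    """Reshape wide counts into one row per (gene, horse) and attach the condition."""
    long_rows = []
    for horse_id, infection in condition_by_horse.items():
        for row in wide_rows:
            long_rows.append({
                "gene": row["gene"],
                "horse_id": horse_id,
                "counts": row[horse_id],
                "infection": infection,
            })
    return long_rows


long_rows = to_long(wide, condition)
for row in long_rows[:2]:
    print(row)
```

Each wide row becomes six long rows, one per horse, which is the shape that plotting by group (mock versus infected) usually expects.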
Figure 1: Overview of microRNA profile of horses (raw counts of microRNAs in mock and Hendra-infected horses)
Figure 2: MicroRNA profile with more than 10,000 counts
Figure 3: PCA of the microRNA profile
I have been using tidyverse and ggplot2 to tidy up and visualise my data. I have also tried some statistical analysis on the dataset, such as Student's t-test and PCA.
My favorite tools are tidyverse and ggplot2. I can tidy up my data with tidyverse and then visualise it with ggplot2.
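
For readers unfamiliar with Student's t-test, here is a minimal sketch of the two-sample statistic (pooled variance, equal variances assumed), written in plain Python rather than the R used in the project. The function name `student_t` is hypothetical, and the counts are the eca-miR-191 values from the wide table. Running the test on unnormalised raw counts is for illustration only; a real analysis would likely normalise or transform the counts first. The sketch returns only the t statistic and the degrees of freedom; getting a p-value needs a t-distribution, for example via `scipy.stats.ttest_ind`.

```python
from math import sqrt
from statistics import mean, variance


def student_t(group_a, group_b):
    """Two-sample Student's t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Assumes equal variances in both groups.
    """
    n_a, n_b = len(group_a), len(group_b)
    dof = n_a + n_b - 2
    # Pooled sample variance, weighting each group by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / dof
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, dof


# Raw counts of eca-miR-191, copied from the wide table.
mock = [51851, 87233, 98228]          # CD1, CD2, CD3
infected = [105711, 120170, 152413]   # CDA, CDC, CDD

t_stat, dof = student_t(infected, mock)
print(f"t = {t_stat:.3f} with {dof} degrees of freedom")
```

A positive t here means the mean count is higher in the infected group than in the mock group; with only three horses per group, any such result should be read with caution.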
I spent quite a long time struggling with my AAHL computer, as I needed to ask for administrator rights every time I wanted to install a package or library. Even more frustrating, I had to install packages and libraries again every time I opened RStudio. In the end, I gave up on the AAHL computer and joined the Data School training from home whenever I could.
When I was analysing my data, I spent a lot of time tidying it up and trying to graph it in different ways. I would like to invest more time in learning how to deal with big datasets such as microRNA sequencing and next-generation sequencing results, how to graph them in a more reasonable way, and eventually how to perform sound statistical analysis.
My next step is to learn how to analyse RNA-seq results following established pipelines, generate graphs that make sense, and then analyse the data in a more scientific way.
I was hesitant to sign up for the Data School training at the beginning, since it requires a lot of time and commitment. But now I feel lucky that I joined. The friendly atmosphere, the professional Kerensa & Stephen, the helpful mentors & helpers, and my lovely colleagues were all essential to the success of this training. I enjoyed it a lot!
I now have a basic understanding of R and RStudio, such as how to tidy up my data and visualise it with ggplot2, but I still need to invest more time and effort in R to become a little more professional! Hopefully, I will apply all the skills I gained from Data School in my future research projects!
I totally recommend the Data School training to everyone; you will get more than you expect!