I am an oceanographer with experience in biotechnology, mass spectrometry and aquaculture. Before Data School I did not code at all, and my data processing consisted of using vendor-specific software (with expensive licenses) to visualise and process data. I also did lots of clicking and data sorting in Excel. It was not enjoyable.
The goal of my project(s) is to identify peptide markers derived from proteins that are expressed in different prawn tissues in response to different functional diets. Proteomics is the main tool. The long-term goal is to achieve sustainability in the aquaculture sector through the use of diets manufactured from renewable sources.
In this study, 252 protein peptides were measured in prawn hepatopancreas two hours post-feeding. We used Sequential Window Acquisition of all Theoretical mass spectra (SWATH-MS) to detect and identify these protein peptides.
Tables
diet | replicate | peptide | concentration |
---|---|---|---|
Fasting | 1 | ADSFDPEANLSHYSDGGK_G1AP69 | 2098.75 |
Fishmeal | 1 | ADSFDPEANLSHYSDGGK_G1AP69 | 591538.29 |
Krillmeal | 1 | ADSFDPEANLSHYSDGGK_G1AP69 | 445057.79 |
Novacq | 1 | ADSFDPEANLSHYSDGGK_G1AP69 | 573059.70 |
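The table above is in the long ("tidy") layout: one row per diet, replicate and peptide. As a minimal sketch of how such a layout can be produced, assuming (hypothetically) that the raw export had one column per diet, the Python snippet below reshapes one wide row into the four rows shown. The project itself used R and the tidyverse, where tidyr's `pivot_longer()` does this kind of reshaping; `wide_rows` and `to_long` are illustrative names, not part of the original analysis.

```python
# Hypothetical wide layout: one row per peptide and replicate, one column per diet.
# The four concentrations are the values shown in the table above.
wide_rows = [
    {
        "peptide": "ADSFDPEANLSHYSDGGK_G1AP69",
        "replicate": 1,
        "Fasting": 2098.75,
        "Fishmeal": 591538.29,
        "Krillmeal": 445057.79,
        "Novacq": 573059.70,
    },
]

ID_COLUMNS = ("peptide", "replicate")


def to_long(rows, id_columns=ID_COLUMNS, names_to="diet", values_to="concentration"):
    """Turn every non-id column into its own row (one observation per row)."""
    long_rows = []
    for row in rows:
        # Columns that identify the observation are copied onto every new row.
        ids = {key: row[key] for key in id_columns}
        for column, value in row.items():
            if column not in id_columns:
                long_rows.append({names_to: column, **ids, values_to: value})
    return long_rows


long_rows = to_long(wide_rows)
for r in long_rows:
    print(r["diet"], r["replicate"], r["peptide"], r["concentration"])
```

Once every observation sits on its own row, grouping, filtering and plotting by diet become one-line operations, which is the main reason for preferring this layout.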
Figure 1: Protein expression in hepatopancreas of prawns fed different diets
Welch Two Sample t-test
data: mean_Concentration by Diet
t = -0.68083, df = 114.48, p-value = 0.4974
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval: -1228733.0 to 600152.9
sample estimates: mean in group Fishmeal = 1045628, mean in group Novacq = 1359918
Welch Two Sample t-test
data: mean_Concentration by Diet
t = -1.4118, df = 106.67, p-value = 0.1609
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval: -1536256.4 to 258288.2
sample estimates: mean in group Krillmeal = 720934, mean in group Novacq = 1359918
Welch Two Sample t-test
data: mean_Concentration by Diet
t = 0.88402, df = 106.77, p-value = 0.3787
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval: -1749426 to 4565366
sample estimates: mean in group Fasting = 2767888, mean in group Novacq = 1359918
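The fractional degrees of freedom in the outputs above (for example df = 114.48) come from the Welch-Satterthwaite approximation that Welch's test uses when the two groups are not assumed to share a variance. The sketch below shows how the t statistic and df are computed, in Python with the standard library rather than the R used in the project. The two samples are hypothetical toy numbers, not data from this study, and `welch_t` is an illustrative name.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's two-sample t-test (unequal variances)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each group: s^2 / n
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation; usually not a whole number
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical toy samples, NOT data from this study
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 4.0, 6.0, 8.0, 10.0]

t_stat, df = welch_t(group_a, group_b)
print(f"t = {t_stat:.4f}, df = {df:.2f}")  # prints: t = -1.8974, df = 5.88
```

The p-value is then read from a t distribution with that df. In all three comparisons reported above it is well above 0.05, so none of the differences in mean concentration is significant at that level.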
The tool that I have used the most in my projects is the tidyverse. I currently use it to generate lists of proteins identified using proteomics and to filter out protein redundancy. I am also focusing on learning more about ggplot2.
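As a rough illustration of what a redundancy check can look like, the sketch below is written in Python rather than the tidyverse workflow described above. It rests on two assumptions that are not confirmed by the project: that each identifier has the form `SEQUENCE_ACCESSION` (as the IDs in the table appear to), and that "redundancy" means the same peptide sequence being reported under more than one protein accession. Only the first identifier is real; `X00001` and `X00002` are made-up placeholders.

```python
from collections import defaultdict

# First ID is taken from the table in this post; the others use
# made-up placeholder accessions (X00001, X00002) for illustration only.
peptide_ids = [
    "ADSFDPEANLSHYSDGGK_G1AP69",
    "ADSFDPEANLSHYSDGGK_X00001",
    "LLVEELGK_X00002",
]


def accessions_per_peptide(ids):
    """Map each peptide sequence to the sorted accessions it is reported under."""
    found = defaultdict(set)
    for identifier in ids:
        # Assumed format: everything after the last underscore is the accession.
        sequence, accession = identifier.rsplit("_", 1)
        found[sequence].add(accession)
    return {seq: sorted(acc) for seq, acc in found.items()}


redundancy = accessions_per_peptide(peptide_ids)
for sequence, accessions in redundancy.items():
    # A count above 1 flags a peptide shared between protein entries.
    print(sequence, len(accessions), ", ".join(accessions))
```

In the tidyverse the same idea would typically be a split of the identifier column followed by a grouped count.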
I spent a significant amount of time understanding the underlying logic of making data frames “tidy”. I was not surprised that Data School was a challenge, but at the same time I was very excited to start learning a coding language. I solved some of my challenges by rewatching the Webex recordings and reading forums on the internet.
I will focus on mastering R for now, but I am sure that I am not going back to Excel. In the future I would like to become a bioinformatician/statistician to complement my current skills. I would also like to learn Bash, SQL, SAS and Python.
I wish I had attended week 1 of Data School to meet everyone in person. Aside from that, my experience in Data School has been rich. I have been able to use my newly acquired Data School skills in my daily work. A specific example is producing an output (a protein list) that tells me the degree of protein redundancy in my proteomics work. This was a task that would take me between 3 and 4 hours in Excel. Now it takes me 45 seconds. Very embarrassing. Never again. At a personal level, being more efficient with my time makes me feel more at ease and happy.