This lesson is being piloted (Beta version)

Exercise 2 - Data queries

Try to answer each of the following by writing a single SELECT statement per question.

1. Can you list all details about all samples?

2. Can you list just the name and species for all samples?

3. Can you list details of the first 10 sequences?

4. How many loaded sequences are there?

5. How many sequences are of the type “genomeAssembly”?

6. How many loaded sequences are there of each different type?

7. How many sequences have a length greater than 1000?

8. How many sequences have a length greater than 1000 and are from sample “D36-s2-tr”?

9. What is the average length of all sequences?

10. What is the average length of sequences of type “genomeAssembly”?

11. What is the average length of sequences of type “genomeAssembly” AND what is the average length of sequences of type “protein”?

12. How many annotations (alignedannot.annotation) contain the phrase “hypothetical”?

13. Can you list the details of the sequence named “D18-gDNA-s1638”, replacing the foreign keys with the values they refer to (e.g. replace the ‘isSample’ id with the actual sample name)?

14. Do any other sequences align onto the sequence named “D18-gDNA-s1638” (if so, it will appear in seqRelation.parentSeq)? List the name(s) of any such sequence(s).

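Hint for Q14: You’ll need to make use of the ‘AS’ keyword to give tables an alias, which is what lets the same table appear more than once in a single query.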
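As a rough illustration of what ‘AS’ does, here is a minimal sketch that runs against a throwaway in-memory SQLite database using Python's built-in sqlite3 module. The tables `item` and `link` and all of their columns are hypothetical and are not part of the lesson's schema; the only point is that one table is joined twice under two different aliases (`p` and `c`).

```python
import sqlite3

# Hypothetical toy schema for illustration only; NOT the lesson's database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE link (parent INTEGER, child INTEGER);
    INSERT INTO item VALUES (1, 'root'), (2, 'leaf-a'), (3, 'leaf-b'), (4, 'other');
    INSERT INTO link VALUES (1, 2), (1, 3), (4, 1);
""")

# 'item' appears twice: once as the parent (alias p), once as the child (alias c).
# Without the aliases, the two uses of 'item' could not be told apart.
query = """
    SELECT c.name
    FROM item AS p
    JOIN link AS l ON l.parent = p.id
    JOIN item AS c ON c.id = l.child
    WHERE p.name = ?
    ORDER BY c.name
"""
children = [row[0] for row in conn.execute(query, ("root",))]
print(children)  # ['leaf-a', 'leaf-b']
```

The aliases exist only for the duration of the query, so the same pattern should carry over to any pair of tables where one table refers back to another through a relation table.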