Build a phylogenetic tree

For this assignment, you will use an amino acid sequence from a specific protein found in a wide variety of plants. Proteins are composed of smaller molecules called amino acids. The amino acids are arranged in a specific sequence, which determines the shape and function of the protein. The sequence of amino acids in a particular protein can vary slightly among different organisms because of small mutations that have occurred over evolutionary time. The more evolutionary time has passed, the greater the number of mutations that could have occurred to change the amino acid sequence. If two organisms recently shared a common ancestor, then the amino acid sequences of their proteins should be similar. If two organisms last shared a common ancestor much farther in the past, then their amino acid sequences will have greater differences.
The protein you will use for this assignment is called Bet v 1. Proteins are often named after the organism where they were first found. In this case, the Bet v 1 protein was found in birch trees, which have the scientific name Betula verrucosa (hence Bet v 1 protein). The Bet v 1 protein helps plants defend against certain types of pathogens and is found in many different types of plants, including peaches, cherries, celery, tomatoes, potatoes and, of course, birch trees.
Here is a list of 15 plant species you will include in your analysis of Bet v 1.
celery
carrot
parsley
kiwi
cherry
peach
pear
strawberry
raspberry
apple
apricot
birch
soybean
potato
tomato
Compare these 15 types of plants to each other. Which are, in your opinion, more similar to each other and which are less similar? Why? Try to think about whether these similarities are required for function or if they are similar for another reason that isn’t apparent to you.
For example, you may think that apples and pears both have stems because the stem is required to hang that fruit. You may therefore conclude that stems are necessary for this particular function. You may also recognize that raspberries and strawberries both have their seeds on the outside but this similarity is not necessary for function because seeds on the inside work just as well.
Record your comparisons (at least three or four) on a piece of notebook paper. Write your name on the paper because you’ll turn this in later.
You have just developed a hypothesis about which structural similarities are homologies and which are not. You could use the homologies to build a phylogenetic tree by grouping together organisms with shared homologies. If you hypothesize that two organisms share a homology, then you are also hypothesizing that those two organisms share a common ancestor. Using homologies to develop phylogenetic trees is a very important tool in evolutionary biology.
Another tool is to use genetic information, such as DNA and proteins. DNA and proteins can vary among organisms due to mutations that change the nucleotide or amino acid sequence. Because genetic mutations accumulate over time, closely related organisms will be genetically similar. More distantly related organisms will be genetically less similar. You will use the amino acid sequence for Bet v 1 from the 15 plant species to obtain a phylogenetic tree. Below is the amino acid sequence for each species. Some sequences, such as raspberry, are a bit shorter but this will not be a problem.
>Celery
MGVQTHVLELTSSVSAEKIFQGFVIDVDTVLPKAAPGAYKSVEIKGDGGPGTLKIITLP
>Carrot
MGVQKHEQEITSSVPAEKMGHGLILDIDNILPKAAPGAYKNVEIKGDGGVGTIKHITLP
>Parsley
MGAVTTDVEVASSVPAQTIYKGFLLDMDNIIPKVLPQAIKSIEIISGDGGAGTIKKVTLG
>Kiwi
MGAITYDMEIPSSISAEKMFKAFVLDGDTIIPKALPHAITGVQTLEGDGGVGTIKLTTFG
>Cherry
MGVFTYESEFTSEIPPPRLFKAFVLDADNLVPKIAPQAIKHSEILEGDGGPGTIKKITFG
>Peach
MGVGTYESEFTSEIPPPRLFKAFVLDADNLVPKIAPQAIKHSEILEGDGGPGTIKKITFG
>Pear
MGLYTFENEFTSEIPPPRLFKAFVLDADNLIPKIAPQAIKHAEILEGNGGPGTIKKITFG
>Strawberry
MGVFTYESEFTSVIPPPKLFKAFVLDADNLIPKIAPQAVKSAEIIEGDGGVGTIKKIHLG
>Raspberry
YTSVIPPPKLFKAFVLDADNLIPKIAPQAVKSVEIIEGDGGVGTVKKIHLG
>Apple
MGVFNYETEFTSVIPPARLFNAFVLDADNLIPKIAPQAVKSAEILEGDGGVGTIKKINFG
>Apricot
MGVFTYETEFTSVIPPEKLFKAFILDADVLIPKVAPTAVKGTEILEGDGGVGTIKKVTFG
>Birch
MGVGNYETETTSVIPAARLFKAFILDGDNLFPKVAPQAISSVENIEGNGGPGTIKKISFP
>Soybean
MGVFTSESEHVSPVSAAKLYKAIVLDASNVPPKALPNFIKSVETIEGDGGPGTIKKLTLA
>Potato
MGVTSYTLETTTPVAPTRLFKALVVDSDNLIPKLMPQVKNIEAEGDGSIKKMTFV
>Tomato
MGVTTYTHEDTSTVSPNRLFKALVIDGDNLIPKLMPNVKNVETEGDGSIKKINFV
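The listing above follows the FASTA convention: a line starting with ">" names a record, and the line or lines after it hold the sequence. As a minimal sketch (the function name `parse_fasta` is made up for illustration and is not part of the assignment), this is roughly how a program could read such text. Only two of the 15 records are included to keep it short.

```python
def parse_fasta(text):
    """Return {name: sequence} from FASTA-style text.

    A line starting with '>' opens a new record; every following
    non-blank line is appended to that record's sequence.
    """
    records = {}
    name = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


# Two records copied from the listing above.
sample = """\
>Strawberry
MGVFTYESEFTSVIPPPKLFKAFVLDADNLIPKIAPQAVKSAEIIEGDGGVGTIKKIHLG
>Raspberry
YTSVIPPPKLFKAFVLDADNLIPKIAPQAVKSVEIIEGDGGVGTVKKIHLG
"""

records = parse_fasta(sample)
for species, sequence in records.items():
    print(species, len(sequence))
```

Printing the lengths makes it easy to see that the raspberry sequence is shorter than the strawberry one.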
Each letter indicates 1 of 20 possible amino acids. For example, nearly all of the sequences begin with M, which represents Methionine. If you compare the first three amino acids of the sequences, most begin with MGV, MGA or MGL. The third amino acid varies among the different plants, which is due to mutational differences that have occurred over time. You might hypothesize that species with the V amino acid at the third position are more closely related to each other than they are to those with an A or L at the third position. Comparing all of the sequences manually to develop a phylogenetic tree would be a very difficult task. Fortunately, there are computer programs that do this for us.
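The third-position comparison described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how tree-building programs work; the helper name `group_by_position` is an assumption introduced here, and the three-letter prefixes are copied from the start of each sequence in the listing.

```python
from collections import defaultdict

# First three amino acids of each sequence, copied from the listing.
first_three = {
    "Celery": "MGV", "Carrot": "MGV", "Parsley": "MGA", "Kiwi": "MGA",
    "Cherry": "MGV", "Peach": "MGV", "Pear": "MGL", "Strawberry": "MGV",
    "Raspberry": "YTS", "Apple": "MGV", "Apricot": "MGV", "Birch": "MGV",
    "Soybean": "MGV", "Potato": "MGV", "Tomato": "MGV",
}


def group_by_position(prefixes, position):
    """Group species by the amino acid at a 1-based position."""
    groups = defaultdict(list)
    for species, residues in prefixes.items():
        groups[residues[position - 1]].append(species)
    return dict(groups)


groups = group_by_position(first_three, 3)
for residue, species in groups.items():
    print(residue, ", ".join(species))
```

Raspberry lands in its own group with an S, but only because its sequence is missing the first few amino acids, so its third letter is not really the third position of the protein. A single position is also far too little evidence for a tree, which is why the full sequences are compared by software.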
Follow the instructions below to develop a phylogenetic tree for the 15 plant species using Bet v 1 amino acid sequences.
1. Go to the course website in Moodle. Scroll down to the link that says “Bet v 1 Protein Sequences.” Follow the link to find the Bet v 1 sequences.
2. Copy all 15 sequences from the website, starting at “>Celery” and ending on the line below “>Tomato”. You must include both the lines beginning with the greater-than sign and the sequences themselves. Do not copy the “start here” and “end here” lines.
3. Navigate to the website http://phylogeny.lirmm.fr/phylo_cgi/index.cgi. There is a link on the course website. You’ll probably have to resize the new window that opens.
4. Scroll down a bit and click on “One click” under the heading “Phylogeny analysis.”
5. Paste what you have previously copied into the giant white box in the center of the page, located directly under “Or paste it here”. You do not need to provide a name or your e-mail address.
6. Do not check any other boxes; just click on submit. The entire process should take no more than about two minutes. The website automatically does two things for you:
a. Align the sequences. You probably noticed the sequences are not the same length. For example, the raspberry sequence is incomplete and therefore shorter than the other sequences. The sequences have to be properly aligned to get a reasonable analysis.
b. Analyze the aligned sequences. The computer uses a maximum likelihood analysis technique that is mathematically complex but very robust. Many analyses take hours or days to run, but our data set is very small, so the analysis will be quick.
7. After the analysis is complete, you should see a phylogenetic tree. Click on the PDF button under the heading “Download the tree”, which is right below the tree.
8. Print this PDF file and attach the scrap piece of paper that you jotted down the similarities on earlier.
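To get a feel for what the alignment in step 6a accomplishes, here is a deliberately simplified sketch. It is not the method the website uses: real alignment programs can insert gaps anywhere in a sequence, while this toy version only slides the shorter raspberry sequence along the strawberry sequence and keeps the offset with the most identical positions. The function name `best_offset` is an assumption made for this example.

```python
def best_offset(short, long):
    """Slide `short` along `long` without internal gaps.

    Returns (offset, matches) for the offset that gives the most
    identical positions; ties go to the smallest offset.
    """
    best = (0, -1)
    for offset in range(len(long) - len(short) + 1):
        matches = sum(a == b for a, b in zip(short, long[offset:]))
        if matches > best[1]:
            best = (offset, matches)
    return best


# Sequences copied from the listing in this handout.
strawberry = "MGVFTYESEFTSVIPPPKLFKAFVLDADNLIPKIAPQAVKSAEIIEGDGGVGTIKKIHLG"
raspberry = "YTSVIPPPKLFKAFVLDADNLIPKIAPQAVKSVEIIEGDGGVGTVKKIHLG"

offset, matches = best_offset(raspberry, strawberry)
aligned_raspberry = "-" * offset + raspberry  # dashes mark missing data

print(strawberry)
print(aligned_raspberry)
print(f"offset={offset}, identical positions={matches} of {len(raspberry)}")
```

Once the raspberry sequence is padded with leading dashes, the two sequences line up column by column and can be compared fairly.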
The phylogenetic tree that you obtained is a real hypothesis about the evolutionary relationships among these 15 plant species, based on the Bet v 1 amino acid sequence. The numbers above the tree branches indicate how well the data support that particular branching (they are not probabilities, though; typically a value of 70 or higher is considered good).
Study the tree. Does it make sense to you? In other words, are plants that you thought might be closely related actually closely related?
On your scrap piece of paper that you attached to the PDF file, list some relationships that surprised you. If you can’t find any, list some that you predicted. But seriously, did you really predict kiwis are more closely related to celery and carrots than to strawberries?
Feel free to play around with the buttons and experiment with new types of views for your phylogeny. If you click on the “Alignment” tab above the tree, you’ll see the final alignment of your amino acid sequences. Notice the dashes that were added to the start of the raspberry sequence. The dashes indicate missing information. Notice that most of the amino acids vary but that some never change. For example, all of the plants have a “PK” near the middle of the sequence, and all have a K in the 6th to last position (the last green shaded column). These unvarying amino acids might be critical to the function of the Bet v 1 protein.
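The search for unvarying amino acids can also be sketched in code. The example below is a rough illustration under a stated assumption: it uses four of the 60-letter sequences from this handout and assumes they already line up column by column without gaps, which may differ from the alignment the website produces. The function name `conserved_columns` is hypothetical.

```python
# Four 60-residue sequences copied from the listing in this handout.
# Assumption: equal length means they line up without gaps.
aligned = {
    "Cherry": "MGVFTYESEFTSEIPPPRLFKAFVLDADNLVPKIAPQAIKHSEILEGDGGPGTIKKITFG",
    "Pear": "MGLYTFENEFTSEIPPPRLFKAFVLDADNLIPKIAPQAIKHAEILEGNGGPGTIKKITFG",
    "Strawberry": "MGVFTYESEFTSVIPPPKLFKAFVLDADNLIPKIAPQAVKSAEIIEGDGGVGTIKKIHLG",
    "Birch": "MGVGNYETETTSVIPAARLFKAFILDGDNLFPKVAPQAISSVENIEGNGGPGTIKKISFP",
}


def conserved_columns(sequences):
    """Return the 1-based columns where every sequence has the same residue."""
    conserved = []
    for number, column in enumerate(zip(*sequences), start=1):
        # A column of gap characters does not count as conserved.
        if len(set(column)) == 1 and column[0] != "-":
            conserved.append(number)
    return conserved


conserved = conserved_columns(list(aligned.values()))
conserved_set = set(conserved)
reference = aligned["Cherry"]

# Show conserved residues; replace variable columns with a dot.
mask = "".join(
    residue if number in conserved_set else "."
    for number, residue in enumerate(reference, start=1)
)
print(mask)
print(f"{len(conserved)} of {len(reference)} columns are identical")
```

In this small subset, the “PK” pair falls in columns 32 and 33, and the K in the 6th to last position is column 55. With all 15 aligned sequences, fewer columns would remain identical.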
