Discussion about future direction for this project

Because the project deadline is approaching, there is no time left for further or deeper analysis of the assembled genome of the ETEC p7 bacteriophage. This post summarizes the additional analyses and approaches we recommend for further characterization of this genome.

The genes that were predicted with Glimmer and annotated through homology searches can be characterized further, for example by using BLASTP to compare their protein sequences with those of other phages. This would give a sense of how similar or novel the ETEC p7 proteins are relative to other phage species.

The genome itself also needs further characterization. We made a preliminary prediction of the positions of the ORFs and genes, but Glimmer and ORFfinder made somewhat different predictions, so the exact position of every gene and ORF remains to be determined. One way of doing this would be to locate the promoters, Shine-Dalgarno sequences and transcriptional terminators. Finding these elements should make it possible to assess whether the predicted gene positions are correct or need to be adjusted.
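Before looking for regulatory elements, it can help to list exactly where the two gene finders disagree. The following is a minimal sketch, not the method we used: it assumes the predictions from each tool have already been parsed into (start, stop, strand) records, and it pairs predictions that share a stop codon and strand, since two tools that find the same ORF usually differ only in the chosen start codon. The `Gene` type, the function name and the coordinates are hypothetical and are not ETEC p7 data.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    start: int   # first base of the start codon (assumed 1-based)
    stop: int    # last base of the stop codon (assumed 1-based)
    strand: str  # "+" or "-"


def compare_predictions(a, b):
    """Classify predictions from two tools by their (stop, strand) key."""
    by_stop_b = {(g.stop, g.strand): g for g in b}
    agree, start_differs, only_a = [], [], []
    for g in a:
        other = by_stop_b.pop((g.stop, g.strand), None)
        if other is None:
            only_a.append(g)                  # no counterpart in the second tool
        elif other.start == g.start:
            agree.append(g)                   # identical start and stop
        else:
            start_differs.append((g, other))  # same ORF, different start codon
    only_b = list(by_stop_b.values())         # left over from the second tool
    return agree, start_differs, only_a, only_b


# Made-up coordinates, for illustration only.
glimmer = [Gene(100, 400, "+"), Gene(450, 900, "+"), Gene(1500, 1000, "-")]
orffinder = [Gene(100, 400, "+"), Gene(480, 900, "+"), Gene(2000, 2300, "+")]

agree, start_differs, only_glimmer, only_orffinder = compare_predictions(glimmer, orffinder)
for g, other in start_differs:
    print(f"stop {g.stop} ({g.strand}): start {g.start} vs {other.start}")
```

The pairs in `start_differs` are the genes where a nearby Shine-Dalgarno sequence or promoter could help decide which start codon is the more plausible one.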

We attempted to search for promoters in the genome, but three different programs gave three completely different results. It turned out that searching for promoters is difficult and can be time consuming. Phage promoters are often identical or very similar to the host promoters (so that the host RNA polymerase will bind to them), but they can also be specific to the phage. One approach is thus to search the phage genome for the promoter sequences of the bacterial host, allowing mismatches if no exact match is found. If the promoters are specific to the phage they become very difficult to find, but one approach is to look in the untranslated regions (UTRs) of the genome.
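The idea of searching for host promoter sequences while allowing mismatches can be sketched as follows. This is only an illustration under stated assumptions: it uses the E. coli sigma-70 consensus boxes (TTGACA at -35 and TATAAT at -10), assumes a spacer of 15 to 19 bp between them, scans the forward strand only, and reports 0-based positions. The function name and the toy sequence are hypothetical. The dedicated promoter prediction programs we tried use more elaborate models than this.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_promoter_candidates(seq, box35="TTGACA", box10="TATAAT",
                             spacers=range(15, 20), max_mismatches=1):
    """Return (pos35, pos10, spacer, total_mismatches) for each candidate.

    Positions are 0-based indices into seq (forward strand only).
    max_mismatches applies to each box separately.
    """
    seq = seq.upper()
    n35, n10 = len(box35), len(box10)
    hits = []
    for i in range(len(seq) - n35 + 1):
        mm35 = hamming(seq[i:i + n35], box35)
        if mm35 > max_mismatches:
            continue
        for spacer in spacers:
            j = i + n35 + spacer          # start of the putative -10 box
            if j + n10 > len(seq):
                break
            mm10 = hamming(seq[j:j + n10], box10)
            if mm10 <= max_mismatches:
                hits.append((i, j, spacer, mm35 + mm10))
    return hits


# Toy sequence with a perfect consensus promoter and a 17 bp spacer.
toy = "GGGG" + "TTGACA" + "A" * 17 + "TATAAT" + "GGGG"
print(find_promoter_candidates(toy, max_mismatches=0))
```

To cover the reverse strand, the same function can be run on the reverse complement of the genome. Raising `max_mismatches` quickly increases the number of false positives, which is one reason the different programs can disagree so much.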

We also attempted to find the transcriptional terminators, but the number of predicted terminators was not even close to the number of predicted promoters. Thus, a lot more time and effort is needed to elucidate the promoter, Shine-Dalgarno and terminator sequences of this genome.

The larger intergenic gaps should be investigated further for ORFs that Glimmer might have missed, for example through homology searches with BLASTX or searches against databases of unfinished microbial genomes. There is a large gap from about position 16300 to about 17600 that could potentially hold more ORFs.
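Listing the larger intergenic gaps is easy to automate once the gene coordinates are available. The sketch below assumes 1-based inclusive coordinates, treats the genome as linear, and uses a hypothetical minimum gap length of 300 bp. The function name and the gene coordinates are made up for illustration and are not the actual ETEC p7 annotation.

```python
def intergenic_gaps(genes, genome_length, min_length=300):
    """Return (first, last) coordinates of uncovered regions >= min_length.

    genes is an iterable of (start, stop) pairs; genes on the minus strand
    may be given with start > stop. Coordinates are 1-based and inclusive.
    """
    gaps = []
    covered_to = 0  # rightmost coordinate covered by any gene so far
    for left, right in sorted((min(s, e), max(s, e)) for s, e in genes):
        if left - covered_to - 1 >= min_length:
            gaps.append((covered_to + 1, left - 1))
        covered_to = max(covered_to, right)
    if genome_length - covered_to >= min_length:
        gaps.append((covered_to + 1, genome_length))
    return gaps


# Made-up coordinates, for illustration only.
example_genes = [(1, 500), (450, 1200), (2600, 1300), (4000, 4500)]
print(intergenic_gaps(example_genes, genome_length=5000))
```

Each reported region could then be extracted and used as a BLASTX query. Overlapping genes are handled by tracking only the rightmost covered coordinate.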

As ETEC p7 has a genome consisting of double-stranded DNA, it most likely belongs to the order Caudovirales, but we have not been able to obtain any definitive information about which family it belongs to. Our best guess at the moment is the family Podoviridae, since some of the apparently closest relatives of ETEC p7, such as SU10 and phiEco32, are podoviruses. For the same reason we suspect that ETEC p7 has C3 morphology. However, because bacteriophages evolve fast and can acquire DNA horizontally from both other phages and their hosts, phage genomes are mosaics, and it is not possible to rely only on close relationships inferred from homology searches. To get a definitive answer, the structural proteins of the virion need to be studied with different types of electron microscopy, so that visual assessments can be made. Furthermore, predicting the secondary structure of the scaffolding proteins can also give clues to the morphology of the bacteriophage, as described in the paper by Mirzaei et al. Secondary structure can be predicted from protein sequences with, for example, PSIPRED and JPred.

Lastly, a phylogenetic analysis needs to be conducted. For this it is necessary to know which features scientists use to build phylogenetic trees of phages. From our very basic knowledge of this, the most important features to consider seem to be the scaffolding proteins and head proteins. This means that a study is needed in which these structural proteins of ETEC p7 are compared with the corresponding proteins of other bacteriophages.

A clarification about the bacteriophage species

There has been some confusion about the species of the bacteriophage genome we are working on that needs to be clarified. The genome we were assigned was supposed to belong to an enterotoxigenic Escherichia coli (ETEC) p7 bacteriophage. This made me believe that the phage was bacteriophage P7, which belongs to the Myoviridae.

When the assembled genome was searched with BLAST against the NCBI database, phage vB_EcoP_SU10 (SU10) showed the closest identity (90%) to our assigned phage genome. SU10 is closely related to the Podoviridae (according to Professor Nilsson). PhageTerm also classified the genome as belonging to a bacteriophage T7, which is a podovirus. This was the source of the confusion: I believed our assigned phage genome should belong to the Myoviridae, but all the analyses showed that it was closer to, or even belonged to, the Podoviridae.

After clarification from Professor Nilsson, it turns out that ETEC p7 is not the same phage as bacteriophage P7. ETEC p7 has a distant relationship to the Podoviridae (according to Professor Nilsson). This should explain why the programs placed ETEC p7 close to the Podoviridae. So it seems we are on the right track.

The next step is to try to predict the coding sequences of the ETEC p7 genome with Glimmer.