Research and background
The P7 bacteriophage belongs to the order Caudovirales, whose members carry a single linear double-stranded DNA (dsDNA) genome and have a tail. This order has three known families: Siphoviridae, Myoviridae and Podoviridae. The families differ in the type of tail they have. The P7 bacteriophage belongs to the Myoviridae, which have a complex contractile tail. The mechanisms for DNA replication and packaging into the procapsid can differ between species of Caudovirales. Analyzing and determining the nature of the chromosome ends can therefore shed light on the replication strategy of the bacteriophage.
Caudovirales have six known types of termini. Phages use these different termini to recognize their own DNA rather than the DNA of their host. Most phages of this order package DNA into a procapsid from concatemeric DNA molecules (several genome copies joined head to tail), which are frequently the result of rolling-circle replication. For P7 bacteriophages (which belong to the same species as the P1 bacteriophages) the packaging mechanism is headful packaging, which uses a pac site. The pac site is where the terminase initiates packaging. This produces phages whose chromosomes are terminally redundant and circularly permuted. An analysis of the termini should confirm this.
After some research, there appear to be two approaches to characterizing the termini of phages. The first, which was also recommended by Professor Nilsson, is to use the software Geneious to look for regions of higher coverage. Since the termini are repeats, these regions are expected to have higher coverage than the rest of the genome. This should be combined with a comparison of the phage genome against a similar bacteriophage that has already been characterized, in order to pinpoint the terminal repeats.
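The coverage-based idea can be illustrated with a minimal sketch. It assumes that per-base read depth has already been exported as a list of numbers, and it flags stretches whose depth is well above the genome-wide median. The function name high_coverage_regions, the factor of 1.8 and the toy depths are all assumptions made for illustration; this is not how Geneious itself works.

```python
from statistics import median


def high_coverage_regions(coverage, factor=1.8, min_length=1):
    """Return 1-based inclusive (start, end) intervals where the
    per-base depth is at least `factor` times the median depth.

    `factor` and `min_length` are illustrative defaults, not values
    taken from Geneious or PhageTerm.
    """
    threshold = factor * median(coverage)
    regions = []
    start = None
    for position, depth in enumerate(coverage, start=1):
        if depth >= threshold:
            if start is None:
                start = position  # open a new candidate region
        elif start is not None:
            # Region ended at the previous position
            if position - start >= min_length:
                regions.append((start, position - 1))
            start = None
    # Close a region that runs to the end of the sequence
    if start is not None and len(coverage) - start + 1 >= min_length:
        regions.append((start, len(coverage)))
    return regions


# Toy depths, made up for illustration (not the P7 data)
toy = [100] * 10 + [210] * 5 + [100] * 10
print(high_coverage_regions(toy, min_length=3))  # [(11, 15)]
```

A candidate region found this way would still need to be checked against a characterized reference phage, as described above, since high coverage alone can have other causes.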
The second approach is to use the software PhageTerm. This software is freely available and uses the same principle as described above: it looks for regions of the data with a significantly higher number of reads than the rest of the genome. The advantage is that, unlike Geneious, which requires experience to determine the termini, PhageTerm uses a theoretical and statistical framework to determine the terminal repeats. Other advantages of PhageTerm are that it has been specifically investigated with Illumina technologies, tested with a range of de novo assembled bacteriophages, and developed for dsDNA bacteriophages.
PhageTerm was developed by researchers at the Pasteur Institute, and the institute also hosts PhageTerm as a Galaxy wrapper. This instance of PhageTerm was used to analyze the termini of the assembled phage genome. The paired-end data and the assembled genome were given as inputs, with the default settings (seed length = 20, peak surrounding region = 20, limit coverage = 250). This resulted in a report [PDF] that put the start of the terminal repeats at position 13344 and the end at position 13592, which makes the terminal repeats 248 bp long. PhageTerm also classifies the ends as redundant and non-permuted. If I understand the report correctly, it identifies the genome as belonging to a T7 bacteriophage, but this needs to be discussed with Professor Nilsson, since the information we were given was that the genome should belong to a P7 bacteriophage. The difference between P1/P7 bacteriophages and T7 bacteriophages is that the chromosome ends of P1/P7 phages are permuted, whereas those of T7 phages are not.
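As a small check of the repeat length, the sketch below computes the span between the two reported positions in two ways. Whether the PhageTerm report treats both positions as part of the repeat is not stated here, so both readings are shown; the helper name span_lengths is hypothetical.

```python
def span_lengths(start, end):
    """Return (difference, inclusive_count) for two 1-based positions.

    difference      = end - start      (distance between the positions)
    inclusive_count = end - start + 1  (bases if both ends are included)
    """
    return end - start, end - start + 1


# Positions taken from the PhageTerm report described above
difference, inclusive_count = span_lengths(13344, 13592)
print(difference)       # 248
print(inclusive_count)  # 249
```

The 248 bp figure matches the plain difference. If both reported positions belong to the repeat, the length would be 249 bp instead, which is worth confirming against the report.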
PhageTerm also generates a file containing the phage genome sequence reorganized according to the termini positions. It is unclear whether we should proceed with this reorganized genome file or continue with the genome as assembled with SPAdes. This also needs to be discussed with Professor Nilsson.
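To make clearer what a reorganized sequence could look like, here is a minimal sketch that rotates an assembled sequence so that it begins at a chosen 1-based position. This is only an assumption about what reorganizing according to termini positions means. It is not PhageTerm's actual procedure, and the function name reorient and the toy sequence are made up for illustration.

```python
def reorient(sequence, start):
    """Rotate `sequence` so that the base at 1-based position `start`
    becomes the first base. The sequence content is unchanged; only
    the starting point of the linear representation moves.
    """
    if not 1 <= start <= len(sequence):
        raise ValueError("start must lie within the sequence")
    cut = start - 1  # convert to a 0-based index
    return sequence[cut:] + sequence[:cut]


# Toy sequence, not the P7 assembly
print(reorient("AACCGGTT", 3))  # CCGGTTAA
```

Under this assumption the reorganized file and the SPAdes assembly would contain the same bases in a different order, so gene coordinates would shift between the two files while the genes themselves would not change.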