Homology search algorithm



Principle of the method for identification to a phylogenetic group:
The program compares the query nucleotide sequence(s) from the panC gene to the sequences in a database, in multiple steps, and calculates the % homology and the statistical significance of the matches against all database sequences. The phylogenetic group containing the database sequence with the highest % homology is selected.

Requirements:
• One of the two initial sequences, AACAAAC or AACAGAC, must be present in the query sequence(s), because the program needs it to initiate the alignment. The program then extends the alignment from this initial sequence to generate the final alignment.
• The query sequence(s) must be longer than 250 bases.
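
The two requirements above can be checked before a query is submitted. The following is only a minimal sketch, not the program's own code: the function name check_query is hypothetical, and it assumes that the 250-base limit applies to the whole query and that the initial sequence may sit anywhere in it.

```python
# Initial sequences and minimum length as stated in the requirements.
SEEDS = ("AACAAAC", "AACAGAC")
MIN_LENGTH = 250  # the query must be longer than this many bases


def check_query(query):
    """Hypothetical helper: return (ok, seed_position) for one query.

    Assumption: the length limit applies to the full query sequence,
    and the earliest occurrence of either initial sequence is used.
    """
    seq = query.upper()
    positions = [p for p in (seq.find(s) for s in SEEDS) if p != -1]
    seed_pos = min(positions) if positions else None
    ok = len(seq) > MIN_LENGTH and seed_pos is not None
    return ok, seed_pos
```

A query that fails either test returns ok as False; seed_position is None when neither initial sequence is found.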

Mathematical formula for % homology calculation and significance of matches:
The program calculates the percentage of homology from the initial sequence (AACAAAC or AACAGAC) to the end of the alignment (the end of the database sequence, or the end of the query sequence if the query is shorter than the database sequence).

* number of bases covering the alignment
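
As an illustration of the calculation described above, here is a minimal sketch under stated assumptions. It treats % homology as the usual percent identity (identical bases divided by the number of bases covering the alignment, times 100), compares the two sequences without gaps, and starts both at their initial sequence. The program's actual formula and alignment procedure may differ, and the function name percent_homology is hypothetical.

```python
def percent_homology(query, reference, seeds=("AACAAAC", "AACAGAC")):
    """Hypothetical sketch: percent identity from the initial sequence
    to the end of the shorter sequence (ungapped comparison, assumed)."""

    def seed_start(seq):
        # Earliest position of either initial sequence.
        hits = [p for p in (seq.find(s) for s in seeds) if p != -1]
        if not hits:
            raise ValueError("no initial sequence found")
        return min(hits)

    q = query.upper()
    r = reference.upper()
    q = q[seed_start(q):]
    r = r[seed_start(r):]
    # The alignment ends where the shorter of the two sequences ends.
    n = min(len(q), len(r))
    matches = sum(1 for a, b in zip(q, r) if a == b)
    return 100.0 * matches / n
```

For example, two sequences that share the initial sequence and differ at one of eleven compared bases give 100 × 10 / 11, about 90.9%.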

When the highest % homology is < 90%, identification to a phylogenetic group is considered not significant and is rejected.
When the highest % homology is < 75%, identification to the B. cereus Group is considered not significant and is rejected.
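
The selection rule and the two cut-offs can be combined as in the sketch below. It is an illustration only: the function name identify and the group labels used in the example are hypothetical, and it assumes that a best match between 75% and 90% means the query belongs to the B. cereus Group but is not assigned to a phylogenetic group.

```python
GROUP_CUTOFF = 90.0   # below this, no phylogenetic group is assigned
CEREUS_CUTOFF = 75.0  # below this, B. cereus Group membership is rejected


def identify(scores):
    """Hypothetical helper.

    scores: list of (phylogenetic_group, percent_homology) pairs,
            one per database sequence.
    Returns (in_b_cereus_group, assigned_group_or_None, best_percent).
    """
    # Keep the database sequence with the highest % homology.
    best_group, best_percent = max(scores, key=lambda item: item[1])
    if best_percent < CEREUS_CUTOFF:
        return False, None, best_percent
    if best_percent < GROUP_CUTOFF:
        # Assumed reading of the two rules for the 75% to 90% range.
        return True, None, best_percent
    return True, best_group, best_percent
```

For example, identify([("II", 95.0), ("III", 91.2)]) returns (True, "II", 95.0), while a best match of 80.0 returns (True, None, 80.0).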