Wednesday, March 28, 2012

Distance Types

Distance between a given pair of sequences is calculated depending on the alignment resulting from running an algorithm like Smith-Waterman, Needleman-Wunsch, or Blast. In general the alignment of two sequences may appear as shown below.

Each square represents a character and dashes indicate gaps. Characters and gaps existing outside the aligned region is possible only with a global alignment algorithm like Needleman-Wunsch. Local alignments resulting from both Smith-Waterman and Blast will have aligned region being identical to the entire alignment. Also, note that with a local alignment the starting pair and ending pair of characters from the two sequences will not include a gap character. 

If the two aligned sequences are S1 and S2 then aligned region is defined from StartIndex to EndIndex inclusively defined as below.

StartIndex = Max (S1.FirstNonGapIndex, S2.FirstNonGapIndex)
EndIndex = Min(S1.LastNonGapIndex, S2.LastNonGapIndex)

The length of the aligned region, AlignedLenth = EndIndex - StartIndex + 1

  • Percent Identity (PID) Distance
    • Let NumOfIdenticalPairs be the number of identical pairs within the aligned region. For example in the above picture there are five such pairs (2 greens, 1 purple, 1 blue, 1 red)
    • PID = NumOfIdenticalPairs / AlignedLength
    • Convert percent identity as a distance by taking 1 - PID
  • Score
    • Each pair of aligned characters is assigned a score using the substitution matrix and gap penalties. 
    • The summation of all such scores is called the score of the alignment. The alignment algorithm always tries to maximize this value.
    • See [1] for more details.
  • Normalized Score
    • Compute the score for aligning S1 with S2, S1 with S1, and S2 with S2. Let these values be named as S1S2, S1S1, and S2S2 respectively.
    • NormalizedScore = 2 * S1S2 / (S1S1 + S2S2) 
  • BitScore
    • Blast alignment has a value called BitScore, which is a log scaled version of the Score.
    • See [1,2] for more details. 
  • Normalized BitScore
    • Similar to NormalizedScore, compute BitScore for aligning S1 with S2S1 with S1, and S2 with S2.
    • Used the same formula as in NormalizedScore to compute NormalizedBitScore
References:

No comments:

Post a Comment