PS_LG_ALIGN - Profile/Sequence Local Gap Alignment
PS_LG_ALIGN is a draft implementation of some of the string
alignment algorithms described in the reference [Chao]. These
algorithms carry out the computation in linear space, and compute
not just the optimal alignment score, but also the corresponding optimal
alignment. The alignments considered are local, that is, only
contiguous portion of the sequence is mapped to a portion of the profile.
Gaps in the alignment are assigned an affine gap penalty.
It's important to be able to compute alignments using "linear space",
that is, just a few vectors whose length N is equal to that
of a typical string. A quadratic algorithm would require a two
dimensional array of total dimension N*N. Realistic alignment
problems can involve strings of N=100,000 elements or more,
so a quadratic algorithm would be expensive or impossible to use.
The optimal alignment score is computed without explicitly constructing
the corresponding alignment. So a major feature of the algorithms is
how to backtrack from the score to retrieve the alignment. It is a matter
of some difficulty to recover the matching, particularly if the best
score was calculated with a linear space algorithm, which discards a
great deal of intermediate information. However, the linear space
algorithm implemented here can also compute the optimal matching, based
on the idea of a recursive subdivision of the problem.
This set of algorithms does not actually match a pair of sequences,
but rather matches a sequence to a "profile". A profile is constructed
based on information from many sequences, and can be thought of as
a "generalized sequence", or a set of indices, where for each index
we specify the likelihood that each possible nucleic acid will occur.
These likelihoods can then be used to score the alignments we consider
with the new candidate sequence.
This set of routines assumes that an insertion or deletion of length
K is penalized using an "affine gap penalty formula" of the form:
Penalty = Gap_Open + K * Gap_Extend
This choice of penalty function has a major effect on the form
of the matching algorithms, particularly in the linear space case.
For the profile problems covered by these algorithms, the gap penalties
are further adjusted using profile-position percentages specified by
the user.
Routines that use quadratic space are included as well, so the algorithms
can be compared for storage, speed, and correctness.
-
Reference 1:
-
Kun-Mao Chao, Ross Hardison, Webb Miller,
Recent Developments in Linear-Space Alignment Methods: A Survey,
Journal of Computational Biology,
Volume 1, Number 4, 1994, pages 271-291.
-
Reference 2:
-
Eugene Myers and Webb Miller,
Optimal Alignments in Linear Space,
CABIOS, volume 4, number 1, 1988, pages 11-17.
-
Reference 3:
-
Michael Waterman,
Introduction to Computational Biology,
Chapman and Hall, 1995.
Files you may copy include:
The list of routines includes:
-
A_INDEX sets up a reverse index for the amino acid codes.
-
A_TO_I returns the index of an alphabetic character.
-
C_CAP capitalizes a single character.
-
I_SWAP switches two integer values.
-
I_TO_A returns the I-th alphabetic character.
-
IVEC_REVERSE reverses the elements of an integer vector.
-
IVEC2_COMPARE compares pairs of integers stored in two vectors.
-
IVEC2_PRINT prints a pair of integer vectors.
-
IVEC2_SORT_A ascending sorts a vector of pairs of integers.
-
PROFILE_SCORE_PRINT prints profile scoring data.
-
PROFILE_SCORE_READ reads profile scoring data from a file.
-
PROFILE_SCORE_READ2 returns a small amount of information from a profile.
-
PS_GG_BSL determines a global gap backward alignment score in linear space.
-
PS_GG_FSL determines a global gap forward alignment score in linear space.
-
PS_LG_BPQ determines a local gap backward alignment path in quadratic space.
-
PS_LG_BSL determines a local gap backward alignment score in linear space.
-
PS_LG_BSQ determines a local gap backward alignment score in quadratic space.
-
PS_LG_CORNERS determines the "corners" of an optimal local alignment.
-
PS_LG_FPQ determines a local gap forward alignment path in quadratic space.
-
PS_LG_FSL determines a local gap forward alignment score in linear space.
-
PS_LG_FSQ determines a local gap forward alignment score in quadratic space.
-
PS_LG_MATCH_PRINT prints a local gap alignment.
-
PS_LG_RPL determines a local gap recursive alignment path in linear space.
-
PS_LG_RPL_POP pops the data describing a subproblem off of the stack.
-
PS_LG_RPL_PUSH pushes the data describing a subproblem onto the stack.
-
RMAT_IMAX returns the location of the maximum of a real M by N matrix.
-
RVEC2_SUM_IMAX returns the index of the maximum sum of two real vectors.
-
S_EQI is a case insensitive comparison of two strings for equality.
-
S_TO_CVEC converts a string to a character vector.
-
S_TO_I reads an integer value from a string.
-
SORT_HEAP_EXTERNAL externally sorts a list of items into linear order.
-
WORD_LAST_READ returns the last word from a string.
-
WORD_NEXT_READ "reads" words from a string, one at a time.
Return to the biomedical software page.
Last revised on 09 May 2002.