SS_GD_ALIGN - Sequence/sequence Global Distance Alignment
SS_GD_ALIGN is a draft implementation of some of the string
matching algorithms described in the reference [Chao]. A global
alignment is attempted between the strings, and similarity is measured by
a sort of distance. These algorithms carry out the computation in
linear space, and compute not just the optimal alignment
score, but the corresponding optimal alignment.
It's important to be able to compute alignments using "linear space",
that is, just a few vectors whose length N is equal to that
of a typical string. A quadratic algorithm would require a two
dimensional array of total dimension N*N. Realistic alignment
problems can involve strings of N=100,000 elements or more,
so a quadratic algorithm would be expensive or impossible to use.
The score for the actual best matching is determined without
constructing the best matching, and in fact it is a matter of some
difficulty to recover the matching, particularly if the algorithm is a
linear space one, which discards a great deal of intermediate information.
However, a linear space algorithm included here can also compute the
optimal matching, based on the idea of a recursive subdivision of the
problem.
Routines that use quadratic space are included as well, so the algorithms
can be compared for storage, speed, and correctness.
-
Reference 1:
-
Kun-Mao Chao, Ross Hardison, Webb Miller,
Recent Developments in Linear-Space Alignment Methods: A Survey,
Journal of Computational Biology,
Volume 1, Number 4, 1994, pages 271-291.
-
Reference 2:
-
Eugene Myers and Webb Miller,
Optimal Alignments in Linear Space,
CABIOS, volume 4, number 1, 1988, pages 11-17.
-
Reference 3:
-
Michael Waterman,
Introduction to Computational Biology,
Chapman and Hall, 1995.
Files you may copy include:
The list of routines includes:
-
A_INDEX sets up a reverse index for the amino acid codes.
-
A_TO_I returns the index of an alphabetic character.
-
C_CAP capitalizes a single character.
-
CVEC2_PRINT prints two vectors of characters.
-
I_SWAP switches two integer values.
-
I_TO_A returns the I-th alphabetic character.
-
IVEC2_COMPARE compares pairs of integers stored in two vectors.
-
IVEC2_PRINT prints a pair of integer vectors, with an optional title.
-
IVEC2_SORT_A ascending sorts a vector of pairs of integers.
-
IVEC_REVERSE reverses the elements of an integer vector.
-
PAM120 returns the PAM 120 substitution matrix.
-
PAM120_SCORE computes a single entry sequence/sequence matching score.
-
PAM200 returns the PAM 200 substitution matrix.
-
PAM200_SCORE computes a single entry sequence/sequence matching score.
-
RVEC2_SUM_IMAX returns the index of the maximum sum of two real vectors.
-
S_EQI is a case insensitive comparison of two strings for equality.
-
S_TO_CVEC converts a string to a character vector.
-
S_TO_I reads an integer value from a string.
-
SIMPLE_SCORE computes a single entry sequence/sequence matching score.
-
SORT_HEAP_EXTERNAL externally sorts a list of items into linear order.
-
SS_GD_BPQ does a global distance backward alignment path in quadratic space.
-
SS_GD_BSL does a global distance backward alignment score in linear space.
-
SS_GD_BSQ does a global distance backward alignment score in quadratic space.
-
SS_GD_FPQ does a global distance forward alignment path in quadratic space.
-
SS_GD_FSL does a global distance forward alignment score in linear space.
-
SS_GD_FSQ does a global distance forward alignment score in quadratic space.
-
SS_GD_MATCH_PRINT prints a global distance alignment.
-
SS_GD_RPL does a global distance recursive alignment path in linear space.
-
WORD_LAST_READ returns the last word from a string.
-
WORD_NEXT_READ "reads" words from a string, one at a time.
Return to the biomedical software page.
Last revised on 13 March 2001.