Computation on molecular sequence
data (strings) is at the heart of computational molecular
biology. Existing and emerging algorithms for string computation
provide a significant intersection between computer science and
molecular biology. This subject started to flourish on the basis
of the following two assumptions.
The first is that biologically meaningful results
can be obtained by treating DNA as a one-dimensional character
string, abstracting away the reality of DNA as a flexible
three-dimensional molecule, interacting in a dynamic environment
with protein and RNA, and repeating a life-cycle in which even
the classic linear chromosome exists for only a fraction of the
time. The second assumption concerns proteins: it holds that
all the information needed for correct three-dimensional folding
is contained in the protein sequence itself, essentially
independent of the biological environment the protein lives in.
A variety of biologically important problems are defined
primarily on sequences or, in the computer science
vernacular, on strings: reconstructing long strings of
DNA from overlapping string fragments; determining physical and
genetic maps from probe data under various experimental
protocols; storing, retrieving, and comparing DNA strings;
comparing two or more strings to find similarities; searching
biological databases for homologies; defining and exploring
different notions of string relationships; looking for new or
ill-defined as well as conserved patterns occurring frequently
in DNA; looking for structural patterns in DNA and protein sequences; determining
secondary structure of RNA; and more. Statements such as the following express this view:
"The digital information that underlies biochemistry, cell
biology, and development can be represented by a simple string
of A's, C's, G's and T's"; "Molecular biology is all about
sequences and it tries to reduce complex biochemical phenomena
to interactions between defined sequences"; and "The ultimate
rationale behind all purposeful structures and behavior of
living things is embodied in the sequence of residues of nascent
polypeptide chains."