Computing science is in the middle of a major paradigm shift, driven by molecular biology. Adleman, in his breathtaking paper, announced the arrival of computers based on biochemical operations and showed that a
large class of computationally hard problems may be best solved not by pushing electrons through wires in a computing laboratory, but by mixing
solutions in test tubes in a molecular biology laboratory. As computationally hard problems are the stumbling blocks for contemporary von Neumann computers, DNA-based computation is poised to play a greater role in computing. This article discusses this novel idea of DNA-based computation.
1. Introduction
Today's computers are millions of times more powerful than their crude
ancestors of the 1940s and 1950s. Almost every two years, computers have become
twice as fast while their components have shrunk to half their size.
However, it has also been realized that integrated-circuit technology is
running against its limits, and it has been a hotly debated question whether
computers of an entirely new kind, quantum-mechanical computers or
computers based on molecular biology, are in the offing.
One of the recently introduced unconventional paradigms that promises to
have a tremendous influence on the theoretical and practical progress of
computer science is DNA computing, which under some circumstances might be an
elegant alternative to the classical Turing/von Neumann notion of computing.
The union of two of science's most fertile fields, molecular biology and
computer science, is sure to produce some remarkable offspring.
In 1994, Adleman invented a method for solving a small
instance of a Directed Hamiltonian Path (DHP) Problem by an
in vitro DNA-recombination assay which he performed
experimentally using hybridization, several agarose-gel
separations, and PCR by handling DNA sequences in a test
tube. Before discussing this experiment, here is an
overview of DNA molecules, which pave the way for this
sort of innovative computing model.
2. The Structure and manipulation of DNA
DNA (deoxyribonucleic acid) encodes the genetic
information of cellular organisms. It consists of
polymer chains, commonly referred to as DNA
strands. Each strand may be viewed as a chain of nucleotides, or bases. An n-letter sequence of consecutive bases is known as an n-mer or an oligonucleotide of length n. The four DNA nucleotides are adenine, guanine, cytosine and thymine, commonly abbreviated to A,G,C and T respectively.
Each strand has, according to chemical convention, a 5' and a 3'
end, thus any single strand has a natural orientation. The
classical double helix of DNA is formed when two separate
strands bond together. Bonding occurs by the pairwise attraction of
bases; A bonds with T and G bonds with C. The pairs (A,T) and
(G,C) are therefore known as Watson-Crick complementary base pairs.
Thus a hypothetical double-stranded DNA molecule consists of two such complementary strands bonded together, for example 5'-AATGCC-3' paired with 3'-TTACGG-5'.
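The pairing rules above are easily sketched in code. The following is an illustrative sketch (not part of the original article); the function names are mine:

```python
# Watson-Crick base-pairing rules: A <-> T, G <-> C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand):
    """Position-wise Watson-Crick complement of a strand."""
    return "".join(COMPLEMENT[b] for b in strand)

def reverse_complement(strand):
    """The annealing partner read in its own 5' -> 3' direction
    (the two strands of a double helix run antiparallel)."""
    return complement(strand)[::-1]

print(complement("AATGCC"))          # -> TTACGG
print(reverse_complement("AATGCC"))  # -> GGCATT
```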
3. Operations on DNA sequences
The following operations can be performed on DNA sequences in a test
tube to program the DNA computer.
- Synthesis: synthesis of a desired strand
- Separation: separation of strands by length
- Merging: pour two test tubes into one to perform union
- Extraction: extract those strands containing a given pattern
- Melting/Annealing: break/bond two single strand DNA molecules with complementary sequences.
- Amplification: use PCR to make copies of DNA strands
- Cutting: cut DNA with restriction enzymes
- Ligation: ligate DNA strands with complementary sticky ends using ligase
- Detection: confirm the presence or absence of DNA in a given test tube
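Several of the operations listed above can be mimicked in software. In the following hypothetical sketch (the representation and names are mine, not a standard library), a "tube" is simply a list, i.e. a multiset, of strands:

```python
# In-silico stand-ins for a few test-tube operations.

def merge(tube1, tube2):
    """Merging: pour two tubes into one (multiset union)."""
    return tube1 + tube2

def separate_by_length(tube, n):
    """Separation: keep only strands of length n (gel electrophoresis)."""
    return [s for s in tube if len(s) == n]

def extract(tube, pattern):
    """Extraction: keep only strands containing the given subsequence."""
    return [s for s in tube if pattern in s]

def detect(tube):
    """Detection: is there any DNA in the tube at all?"""
    return len(tube) > 0

tube = ["AATGCCGG", "TTTAAGCC", "AATGCCGGTTTAAGCC"]
print(extract(separate_by_length(tube, 16), "TTTA"))  # -> ['AATGCCGGTTTAAGCC']
```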
The Hamiltonian Path Problem (HPP) may be phrased as follows: given a set of n cities connected by one-way and two-way roads, does there exist a path through this network, starting at the first city and ending at the last city, that visits each city once and only once?
The problem is deceptively easy to describe, but in fact belongs to the notoriously intractable class of NP-complete problems, the hardest problems in the class NP of problems solvable in nondeterministic polynomial time. Typically, these problems involve a search in which, at each point, there is an exponential increase in the number of possibilities to be searched through, but where each individual possibility can be checked in polynomial time.
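This asymmetry, exponentially many candidates but each one cheap to check, can be illustrated with a small verifier: confirming that a single candidate route is a Hamiltonian path takes only linear time. A minimal sketch (the toy road network is mine):

```python
def is_hamiltonian_path(path, edges, n):
    """Check one candidate: it visits each of the n cities exactly once,
    and every consecutive pair of cities is joined by a road."""
    if sorted(path) != list(range(n)):
        return False
    return all((a, b) in edges for a, b in zip(path, path[1:]))

# A toy 4-city network; one-way roads as directed edges.
edges = {(0, 1), (1, 2), (2, 3), (0, 2)}
print(is_hamiltonian_path([0, 1, 2, 3], edges, 4))  # True
print(is_hamiltonian_path([0, 2, 1, 3], edges, 4))  # False: no road 2 -> 1
```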
Consider a map with five cities linked by one-way and two-way roads. Adleman's approach was to encode each city and each route between two cities in DNA strands, and put them into a test tube.
For example, the strands coding for cities 1 and 2 could
be AATGCCGG and TTTAAGCC respectively. A road from city 1
to city 2 is encoded in such a way that its first part is the
complement of the second half of the strand for
city 1, and its second part is the complement
of the first half of the strand for city 2, i.e. GGCCAAAT: GGCC is the complement
of the last four bases of city 1, and AAAT is the complement
of the first four bases of city 2. Thus the edge
joining cities 1 and 2 is encoded as follows.
AATGCCGGTTTAAGCC    (codes for cities 1 and 2 joined)
    GGCCAAAT        (edge strand bridging the junction)
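The encoding rule just described can be expressed as a short function. This is an illustrative sketch, not Adleman's laboratory protocol, and the function names are mine:

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def comp(s):
    """Position-wise Watson-Crick complement."""
    return "".join(COMPLEMENT[b] for b in s)

def edge_strand(city_from, city_to):
    """Edge code: complement of the second half of `city_from`
    followed by complement of the first half of `city_to`."""
    h1, h2 = len(city_from) // 2, len(city_to) // 2
    return comp(city_from[h1:]) + comp(city_to[:h2])

print(edge_strand("AATGCCGG", "TTTAAGCC"))  # -> GGCCAAAT, as in the text
```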
Similarly, DNA strands can be formed in the test tube for all the nodes and edges, representing all possible routes in the directed graph. The first stage of Adleman's computation was to extract those long strands which start with city 1 and store them in a separate test tube. The second stage was to extract those strands whose length signified exactly 5 cities being passed through: if each city is represented by 8 DNA bases, all strands of 40 bases would be extracted and stored in a separate test tube.
The third stage is to extract all those strands containing the DNA sequence for city 1, then those containing the DNA sequence for city 2, and so on. If there is a solution to this route problem, it will be found among the strands extracted for the last city, city 5.
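The three filtering stages can be mimicked in software on symbolic paths rather than molecules. The following sketch uses a hypothetical 5-city road map of my own choosing; a real tube would hold vastly many randomly formed strands rather than an exhaustive enumeration:

```python
from itertools import product

# Directed roads of a hypothetical 5-city map (cities numbered 1..5).
roads = {(1, 2), (2, 3), (3, 4), (4, 5), (1, 3), (3, 5), (2, 4)}

# Hybridization stage: form every 5-city walk allowed by the roads.
walks = [p for p in product(range(1, 6), repeat=5)
         if all((a, b) in roads for a, b in zip(p, p[1:]))]

stage1 = [w for w in walks if w[0] == 1]      # starts at city 1
stage2 = [w for w in stage1 if len(w) == 5]   # passes through exactly 5 cities
stage3 = [w for w in stage2                   # contains every city
          if set(w) == {1, 2, 3, 4, 5}]
print(stage3)  # the surviving strands encode Hamiltonian paths from city 1
```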
4. The case for DNA computing
The possible advantages of DNA-based computer architecture became immediately apparent:
Computing with DNA offers the advantage of massive degrees of miniaturization and parallelism over conventional silicon-based machines. For example, a square centimeter of silicon can currently support around a million transistors, whereas current manipulation techniques can handle on the order of 10^20 strands of DNA.
- Size: the information density could go up to 1 bit/nm^3.
- High parallelism: every molecule could act as a small processor on the nano-scale, and the number of such processors per unit volume would be potentially enormous. In an in vitro assay we could easily handle about 10^18 processors working in parallel.
- Speed: although the elementary operations (electrophoretic separation, ligation, PCR amplification) would be slow compared to electronic computers, their parallelism would strongly prevail, so that in certain models the number of operations per second could be of the order of 10^18, at least 100,000 times faster than the fastest supercomputers existing today.
- Energy efficiency: 10^19 operations per joule. This is about a billion times more energy efficient than today's electronic devices.
The research in this field has both experimental and theoretical aspects. The experiments that have actually been carried out are not numerous so far. P. Kaplan replicated Adleman's experiment; a Wisconsin team of computer scientists and biochemists made partial progress in solving a 5-variable instance of the SAT problem, another NP-complete problem, by using a surface-based approach; and F. Guarnieri and his team have used a horizontal chain reaction for DNA-based addition.
Also, various aspects of the implementability of DNA
computing have been experimentally investigated: the
effect of good encodings on the solutions of Adleman's
problem was addressed; the complications raised by
using the Polymerase Chain Reaction (PCR) were studied; the usability of the self-assembly of DNA was studied; the experimental gap between the design and assembly of unusual DNA structures was pointed out; joining and rotating data with molecules was reported; concatenation with PCR was studied; evaluation of simple Boolean formulas was begun; and ligation experiments in computing with DNA were conducted.
The theoretical work on DNA computing consists, on
one side, of designing potential experiments for solving
various problems by means of DNA
manipulation. Descriptions of such experiments include
the Satisfiability Problem, breaking the Data Encryption
Standard (DES), expansions of symbolic determinants, matrix multiplication, graph connectivity and knapsack problem using dynamic programming, road coloring problem, exascale computer algebra problems, and simple Horn clause computation.
On the other side, the theoretical research on DNA computing comprises attempts to model the process in general, and to give it a mathematical foundation. To this aim, models of DNA computing have been proposed and studied from the point of view of their computational power and their in-vitro feasibility. Some of them are given below.
5. Models and Formats of DNA Computation
In the two years that followed, a great deal of theoretical work was done on generalizing Adleman's approach in order to define a general-purpose DNA-based molecular computer that could also be implemented by an in vitro system.
Lipton generalized Adleman's model and showed how his model can encompass solutions to other NP-complete problems.
Another model is based on the splicing operation proposed by Head and
vigorously pursued by many researchers using formal language theory. It has been shown that the generative power of finite extended splicing systems is equal to that of Turing machines.
Afterwards, Paun and others introduced the so-called sticker model. Unlike previous models, the sticker model has a memory that can be both read and written to, and it employs reusable DNA.
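A rough in-silico sketch of the sticker model's memory, following its usual presentation but with my own naming: a memory complex is a tuple of bits, where 1 means a sticker is annealed to that region, and a tube is a list of such complexes.

```python
# Assumed sticker-model operations, sketched for illustration only.

def separate(tube, i):
    """Split a tube on bit i: (complexes with bit i on, those with it off)."""
    on = [m for m in tube if m[i] == 1]
    off = [m for m in tube if m[i] == 0]
    return on, off

def set_bit(tube, i):
    """Anneal a sticker to region i of every complex in the tube."""
    return [m[:i] + (1,) + m[i + 1:] for m in tube]

def combine(tube1, tube2):
    """Pour two tubes together."""
    return tube1 + tube2

tube = [(0, 0), (0, 1), (1, 0)]
on, off = separate(tube, 0)
result = combine(set_bit(off, 0), on)
print(result)  # every complex now has bit 0 set
```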
There is also a proposal to use the tendency of DNA structures to self-assemble as a computational tool. It has been shown that the self-assembly of complex branched structures known as double crossovers into two-dimensional sheets or three-dimensional solids is a computationally powerful model.
However, there are some impediments to effective computation by these models.
It is a common feature of all the proposed implementations that the biological operations to be used are assumed to be error-free. An operation central to and frequently employed in most models is the extraction of DNA strands containing a certain sequence (known as removal by DNA hybridization). The most important problem with this method is that extraction is not 100% efficient and may at times inadvertently remove strands that do not contain the specified sequence. Especially for a large problem, the number of extractions required may run into hundreds or even thousands, resulting in a high probability of incorrect hybridization.
Thus, a novel error-resistant model of DNA computation has been proposed by Alan Gibbons and his team that obviates the need for hybridization extraction within the main body of the computation.
Like previous models, this model is particularly effective for algorithmic description. It is sufficiently strong to solve any problem in the class NC, and the authors have given DNA algorithms for the 3-vertex-colourability problem, the permutation problem, the Hamiltonian path problem, the subgraph isomorphism problem, and the maximum clique and maximum independent set problems.
There are two general formats in which complex combinatorial sets of DNA molecules may be manipulated.
- in solution (solution-phase format)
- attached to a surface (solid-phase format)
The solid-phase format possesses many important advantages over the solution-phase format.
- Facilitated sample handling.
With the DNA molecules attached to a support, the experimental manipulations are very simple: addition of a solution to the support, and removal (washing) of a solution from the support. These steps are readily automated.
- Decreased losses during sample handling
- Reduction of interference between oligonucleotides
- Solid-phase chemistry permits facile purification of the DNA molecules at every step of the process.
6. Pitfalls of DNA Computing
The idea of using DNA to solve computational problems is
certainly intriguing and elegant, and DNA does provide massive
parallelism far beyond what is available on existing
silicon-based computers. However, there are many technological
hurdles to overcome. We describe below one huge fundamental
problem that must be solved to attain the goal of designing a universally
programmable molecular computer.
The fundamental problem is that the function 2^n is exponential
whether it counts time or molecules. It has been estimated that
Adleman's Hamiltonian path problem, if scaled up to 50 or 100 cities,
would require tons of DNA. Lipton's SAT method needs a few grams of
DNA molecules for 70 variables; if this is increased to 100 variables,
the minimum DNA requirement rises to millions of kilograms.
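The scaling claim can be checked with a back-of-envelope calculation. The figures below are my own assumptions, not from the article: one strand per truth assignment, strand length roughly equal to the number of variables, and an average nucleotide mass of about 330 g/mol.

```python
AVOGADRO = 6.022e23          # molecules per mole
NT_MASS_G_PER_MOL = 330.0    # approximate average mass of one nucleotide

def brute_force_dna_mass_g(n_vars, strand_len=None):
    """Mass of DNA (grams) if one strand encodes each of the 2^n_vars
    truth assignments; assumes strand_len nucleotides per strand."""
    strand_len = strand_len or n_vars
    strands = 2 ** n_vars
    return strands * strand_len * NT_MASS_G_PER_MOL / AVOGADRO

print(f"{brute_force_dna_mass_g(70):.0f} g")          # tens of grams
print(f"{brute_force_dna_mass_g(100) / 1000:.1e} kg")  # tens of millions of kg
```

Under these assumptions the jump from 70 to 100 variables multiplies the required mass by 2^30, about a billion, which is the crux of the scaling problem.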
Thus the raw idea of brute-force enumeration is not going to work
beyond modest problem sizes, and it is imperative to bring forth
revolutionary new ideas to make this notion of DNA-based computing
work realistically. Only time and investment will tell where the
initial ideas for DNA computing from those experts will lead. Many
enhancements have been published, but all of them suffer from this
fundamental problem. Hopefully, future molecular computation methods
will bring forth the new ideas needed to overcome this fundamental
and significant hurdle.
7. The future of DNA Computing
The significance of this research is two-fold: it is the first demonstrable use
of DNA molecules for representing information, and the first attempt to deal with an
NP-complete problem by molecular means. But still much more work needs to be done
to develop error-resistant and scalable laboratory
computations. Designing experiments that are likely to be
successful in the laboratory and algorithms that proceed through
polynomial-sized volumes of DNA is the need of the hour. It is
unlikely that DNA computers will be used for tasks like word
processing, but they may ultimately find a niche market for
solving large-scale intractable combinatorial problems. The goal
of automating, miniaturizing and integrating them into a
general-purpose desktop DNA computer may take much longer to reach.