Erik Sandelin
A Monte Carlo Approach to Sequence Assembly

Assembling shotgun sequencing data from repetitive DNA sequences is a non-trivial task. In existing sequence assembly methods, repeats are resolved either by using statistical analyses to identify and separate fragments corresponding to repeats, or by using extra information not contained in the fragments. In this paper we take a different approach. Using the simulated-tempering Monte Carlo method, we resolve repeats by performing an extensive search of the solution space.

The method is tested on two highly repetitive sequences with a two-copy and a three-copy repeat, respectively. We find that the method is able to correctly assemble these two sequences, except for a twofold degeneracy for the three-copy repeat sequence. The alternative solution obtained in this case is related by a simple symmetry to the correct one. The performance of the method is compared with that of simulated annealing. We find that simulated tempering is a competitive alternative to simulated annealing.
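
As a rough illustration of the general simulated-tempering idea, and not the implementation used in this work, the sketch below treats the inverse temperature as a dynamical variable that is updated alongside the configuration. Everything in it is an assumption made for illustration: the function names, the toy energy (a permutation-ordering cost standing in for an assembly score), the swap move, the temperature ladder `betas`, and the weights `weights`, which are simply set to zero here although in practice they would need tuning so that all temperatures are visited.

```python
import math
import random


def simulated_tempering(energy, propose, x0, betas, weights, n_steps, seed=0):
    """Generic simulated-tempering sketch (hypothetical helper).

    Samples the joint distribution P(x, k) ~ exp(-betas[k] * E(x) + weights[k])
    by alternating a configuration update and a temperature-index update.
    """
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    k = 0                      # current index into the temperature ladder
    best_x, best_e = x, e      # lowest-energy configuration seen so far
    visits = [0] * len(betas)  # how often each temperature was occupied

    for _ in range(n_steps):
        # Metropolis update of the configuration at fixed temperature.
        y = propose(x, rng)
        ey = energy(y)
        if ey <= e or rng.random() < math.exp(-betas[k] * (ey - e)):
            x, e = y, ey
            if e < best_e:
                best_x, best_e = x, e

        # Metropolis update of the temperature index at fixed configuration.
        # Proposals that fall outside the ladder are rejected.
        j = k + rng.choice((-1, 1))
        if 0 <= j < len(betas):
            log_acc = -(betas[j] - betas[k]) * e + (weights[j] - weights[k])
            if log_acc >= 0 or rng.random() < math.exp(log_acc):
                k = j
        visits[k] += 1

    return best_x, best_e, visits


def toy_energy(perm):
    # Toy cost: total jump size between neighbours in the ordering.
    return sum(abs(a - b) for a, b in zip(perm, perm[1:]))


def swap_two(perm, rng):
    # Toy move: exchange two randomly chosen positions.
    i, j = rng.sample(range(len(perm)), 2)
    new = list(perm)
    new[i], new[j] = new[j], new[i]
    return tuple(new)


start = (3, 7, 0, 5, 2, 6, 1, 4)
betas = [0.2, 0.5, 1.0, 2.0]      # assumed ladder, high to low temperature
weights = [0.0, 0.0, 0.0, 0.0]    # untuned placeholder weights
best, best_e, visits = simulated_tempering(
    toy_energy, swap_two, start, betas, weights, n_steps=5000
)
print(best, best_e, visits)
```

In this toy problem the ascending and descending orderings have the same minimal cost, so a search can end in either one. Simulated annealing differs in that the temperature follows a fixed, externally imposed cooling schedule rather than being sampled as part of the Markov chain.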

LU TP 00-32