Finding Hidden Messages in DNA (Bioinformatics I)
Details
About the Course
A genome may look like an incomprehensible string of the letters A, C, G, and T. Yet hidden in the three billion nucleotides of your genome is a secret language. This course offers an introduction to how we can start to understand this language by using algorithms to find hidden messages in DNA.
What do these hidden messages say? In the first chapter of the course, hidden DNA messages indicate where a bacterium starts replicating its genome, a problem with applications in bioengineering and beyond. In the second chapter, hidden DNA messages tell us how organisms know whether it is day or night as well as how the bacterium causing tuberculosis is able to hide from antibiotics. We will see how randomized algorithms, which toss coins and roll dice, can be used to find these messages.
Each of the two central topics in the course builds the algorithmic knowledge required to address this challenge. Along the way, coding challenges and exercises, many of which ask you to apply your skills to real genetic data, will be directly integrated into the text at the exact moment they are needed.
Outline
Where in the Genome Does Replication Begin? (Algorithmic Warmup):
- Introduction to DNA replication
- Hidden messages in the replication origin
- Some hidden messages are more surprising than others
- An explosion of hidden messages
- The simplest way to replicate DNA
- Asymmetry of replication
- Peculiar statistics of the forward and reverse half-strands
- Some hidden messages are more elusive than others
- A final attempt at finding DnaA boxes in E. coli
- Epilogue: Complications in oriC predictions
Which DNA Patterns Play the Role of Molecular Clocks? (Randomized Algorithms):
- Do we have a "clock" gene?
- Motif finding is more difficult than you think
- Scoring motifs
- From motif finding to finding a median string
- Greedy motif search
- Motif finding meets Oliver Cromwell
- Randomized motif search
- How can a randomized algorithm perform so well?
- Gibbs sampling
- Gibbs sampling in action
- Complications in motif finding
- Epilogue: How does Tuberculosis hibernate to hide from antibiotics?
Speaker/s
Professor
Department of Computer Science and Engineering
University of California, San Diego
Pavel Pevzner (http://cseweb.ucsd.edu/~ppevzner/) is Professor of Computer Science and Engineering at the University of California San Diego (UCSD), where he holds the Ronald R. Taylor Chair and has taught a Bioinformatics Algorithms course for the last 12 years. In 2006, he was named a Howard Hughes Medical Institute Professor. In 2011, he founded the Algorithmic Biology Laboratory in St. Petersburg, Russia, which develops the online bioinformatics platform Rosalind (http://rosalind.info). His research concerns the creation of bioinformatics algorithms for analyzing genome rearrangements, DNA sequencing, and computational proteomics. He authored Computational Molecular Biology (The MIT Press, 2000), co-authored (with Neil Jones) An Introduction to Bioinformatics Algorithms (The MIT Press, 2004), and co-edited (with Ron Shamir) Bioinformatics for Biologists (Cambridge University Press, 2011). For his research, he has been named a Fellow of both the Association for Computing Machinery (ACM) and the International Society for Computational Biology (ISCB).
Phillip E. C. Compeau
Computer Science & Engineering, UC San Diego
University of California, San Diego