Course Overview

Course Description

Extraordinary advances in sequencing technology in the past decade have revolutionized biology and medicine. Many assays based on high-throughput sequencing have been designed to make various biological measurements of interest. This course explores the computational and data science problems that arise from processing, managing, and performing predictive analytics on high-throughput sequencing data. Specific problems we will study include genome assembly, haplotype phasing, RNA-Seq assembly, RNA-Seq quantification, single-cell RNA-Seq analysis, multi-omics analysis, and genome compression. We attack these problems through a combination of tools from information theory, combinatorial algorithms, machine learning, and signal processing. Through this course, students will also become familiar with various software tools developed for the analysis of real sequencing data. The target audience for the course includes:

  • students specializing in information theory/algorithms/signal processing/machine learning who want to learn about applications in biology and get exposure to real data

  • students specializing in computational biology who want to strengthen their knowledge of basic information theory/signal processing/machine learning

Lecture Times

Monday, Wednesday 3:00 PM - 4:20 PM at McCullough 115
Lab hour: Friday (exact time and location TBA)

Course Staff


Instructor:

David Tse (dntse _at_

Teaching assistants:

Govinda Kamath (gkamath _at_
Jesse Zhang (jessez _at_

Office hours:

4:20pm-5:05pm MW for instructor at Packard 260
1:45pm-2:45pm M for teaching assistants at Packard 264


Each assignment and set of lecture notes will have its own page, and students are encouraged to ask and answer questions by leaving or replying to comments on these pages. We can also be reached at ee372-spr1516-staff _at_


Prerequisites

  • Undergraduate-level probability

  • Some programming experience. We will be using Python.

  • Some undergraduate background in algorithms would be beneficial

  • No prior background in biology will be assumed

Course Grading

The grading for the course will be broken down as follows:

  • Attendance 10%

  • Scribing 10%

  • Assignments 40%

  • Project 40%
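As a rough illustration of how the weights above combine, here is a minimal Python sketch. The function name `course_score`, the dictionary keys, and the 0-100 scale for each component are assumptions made for this example; the course does not specify how individual components are scored.

```python
# Weights taken from the grading breakdown above (they sum to 100%).
WEIGHTS = {
    "attendance": 0.10,
    "scribing": 0.10,
    "assignments": 0.40,
    "project": 0.40,
}


def course_score(scores):
    """Weighted sum of component scores (each assumed to be on a 0-100 scale)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical component scores, for illustration only.
example = {"attendance": 100, "scribing": 90, "assignments": 80, "project": 85}
print(round(course_score(example), 2))
```

Because assignments and the project each carry 40%, together they determine most of the final score in this sketch.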


Attendance

Students are encouraged to participate in class, either during lecture or by leaving comments on material posted on the course website.


Scribing

Each student will be responsible for scribing a lecture. So that the notes are available to students currently in the course, scribed notes are due within 72 hours after the lecture (no late submissions accepted). Please reserve lectures using this Google doc. The scribed notes should both cover the lecture completely and be understandable by someone who did not attend it. Organize the notes into sections that follow how Prof. Tse presented the material, place figures where they belong, and include enough detail to convey all the information. The text does not need to be verbatim as long as it captures the main ideas. Scribes will receive the figures used in class by email. Scribes can also use figures from the internet (with proper attribution) or figures they make themselves. Do not worry too much about the formatting: email the text and figures to the course staff and they will take care of the rest. We prefer notes written in Markdown (*.md format). You can find an example file here and some quick tips for writing in Markdown here. We encourage scribes to insert relevant links into their notes.


Assignments

There will be 4 assignments. Each assignment will have a theory component and a programming component. The programming component aims to expose students to the messiness of real data and to various tools used in practice. The programming assignments will include:

  • experiments demonstrating different kinds of bias in various types of data,

  • implementing simple algorithms for assembly, alignment, and quantification,

  • using popular software packages to perform simple experiments.


Project

Projects can be theoretical or practical in nature (ideally a mix of the two). Additional details and a list of possible projects will be posted shortly. Students can also propose their own project topics in consultation with the teaching staff.

Course overview as a pdf.