Extraordinary advances in sequencing technology in the past decade have revolutionized biology and medicine. Many assays based on high-throughput sequencing have been designed to make various biological measurements of interest. This course explores the computational and data science problems that arise from processing, managing, and performing predictive analytics on high-throughput sequencing data. Specific problems we will study include genome assembly, haplotype phasing, RNA-Seq assembly, RNA-Seq quantification, single-cell RNA-Seq analysis, multi-omics analysis, and genome compression. We attack these problems through a combination of tools from information theory, combinatorial algorithms, machine learning, and signal processing. Through this course, students will also become familiar with various software tools developed for the analysis of real sequencing data. The target audience for the course includes:
students specializing in information theory/algorithms/signal processing/machine learning who want to learn about applications in biology and gain exposure to real data
students specializing in computational biology, who want to strengthen their knowledge of basic information theory/signal processing/machine learning
Monday, Wednesday 3:00 PM - 4:20 PM at McCullough 115
Lab hour: Friday (exact time and location TBA)
David Tse (dntse _at_ stanford.edu)
4:20pm-5:05pm MW for instructor at Packard 260
1:45pm-2:45pm M for teaching assistants at Packard 264
Each assignment and set of lecture notes will have its own page, and students are encouraged to ask and answer questions by leaving or replying to comments on these pages. We can also be reached at ee372-spr1516-staff _at_ lists.stanford.edu.
Undergraduate level probability
Some programming experience. We will be using Python.
Some undergraduate background in algorithms would be beneficial
No prior background in biology will be assumed
The grading for the course will be broken down as follows:
Students are encouraged to participate in class either during lecture or by leaving comments on material posted at the course website.
Each student will be responsible for scribing a lecture. To ensure that the notes are available to students currently in the course, scribed notes are due within 72 hours after the lecture (no late submissions accepted). Please reserve lectures using this Google doc. The scribed notes should cover the lecture completely and be understandable by someone who was not at the lecture. Organize the notes into sections consistent with how Prof. Tse presented the material, with appropriate figures in the right places, and make them detailed enough to convey all the information. The text does not need to be verbatim as long as it captures the main ideas. Scribes will receive the figures used in class via email. Scribes can also use figures from the internet (with proper attribution) or figures they make on their own. Do not worry too much about formatting. Email the text and figures to the course staff and they will take care of the rest. We prefer that the notes be written in Markdown (*.md format). You can find an example file here and some quick tips for writing in Markdown here. We encourage scribes to insert relevant links into their notes.
There will be 4 assignments. Each assignment will involve a theory component and a programming component. The programming component aims to expose students to the messiness of real data and to the various tools used in practice. The programming assignments will include:
experiments demonstrating biases of different types in various types of data,
implementing simple algorithms for assembly, alignment, and quantification,
using popular software packages to perform simple experiments.
Projects can be theoretical or practical in nature (ideally a mix of the two). Additional details and a list of possible projects will be posted shortly. Students can also propose their own project topics in consultation with the teaching staff.