This website accompanies the course EE 372: Data Science for High-Throughput Sequencing.
For questions, comments, or typos in the course notes, please leave a comment in the notes, submit a pull request directly to our Git repo, or email us. Click here for the previous offering's course website.
  • 19 March 2018: All assignment solutions now posted and accessible from the assignment pages.
  • 14 March 2018: Project abstracts posted.
  • 12 March 2018: Assignment 3 deadline extended to Friday 16 March 2018 at 11:59pm.
  • 28 February 2018: Assignment 3 released. Due on Wednesday 14 March 2018 at 11:59pm. Submission through Gradescope.
  • 6 February 2018: Assignment 2 released. Due on Tuesday 20 February 2018 at 11:59pm. Submission through Gradescope.
  • 29 January 2018: Project guidelines handout posted. Please sign up for an office hour slot here.
  • 17 January 2018: David's office hour on Thursday 18 January has been moved to Friday 19 January, 3:00-4:00pm. Govinda and Jesse will hold office hours on Friday 19 January, 4:00-5:00pm, on the 3rd floor of Packard (kitchen area).
  • 17 January 2018: Please fill out this Google doc for final project groups.
  • 17 January 2018: Assignment 1 released. Due on Friday 26 January 2018 at 11:59pm. Submission through Gradescope (entry code: M5V4JJ).
  • 9 January 2018: Course Description handout posted.
  • 8 January 2018: Course Outline posted.
Course Description
Extraordinary advances in sequencing technology in the past decade have revolutionized biology and medicine. Many assays based on high-throughput sequencing have been designed to make various biological measurements of interest. This course explores the computational and statistical problems that arise from processing high-throughput sequencing data. Specific problems we will study include genome assembly, haplotype phasing, RNA-Seq quantification, and single-cell RNA-Seq analysis. Specific techniques we will learn to solve these problems include spectral algorithms, dynamic programming, the EM algorithm, PCA, and FDR. Through this course, students will also become familiar with various software tools developed for the analysis of real sequencing data.
Course Staff
Instructor: David Tse (dntse _at_
Teaching assistants: Govinda Kamath (gkamath _at_ , Jesse Zhang (jessez _at_
Office hours: Mon 3:00-4:00pm and Thurs 3:15-4:15pm at Packard 264 for the instructor; Mon 11:00am-12:00pm at Packard 104 for the teaching assistants
Lecture times
Tuesday, Thursday 1:30-2:50pm at 540-108
Grading
  • Class participation: 10%
  • Scribing: 10%
  • Problem sets (3-4): 30%
  • Project: 50%