This website accompanies the course Data Science for High-Throughput Sequencing (EE 372 at Stanford).
For questions, comments, or typos in the course notes, please leave a comment in the notes, submit a pull request directly to our Git repo, or email us at ee372-spr1516-staff _at_
  • 2 June 2016: The poster session will take place from 3:30pm-5:30pm on June 6 in the Packard atrium. Pins and easels will be provided.
  • 21 May 2016: Assignment 3 released. Due on 1 June 2016 at midnight. This will be the last assignment.
  • 2 May 2016: Assignment 2 released. Due on 9 May 2016 at midnight.
  • 21 April 2016: Project list and guidelines have been posted. Please access this Google Doc to sign up for a 10-minute slot with the TAs during the 27 April 2016 lecture.
  • 8 April 2016: Tutorials on working in the shell and IPython are posted.
  • 8 April 2016: Assignment 1 released. Due on 15 April 2016 at midnight.
  • 30 March 2016: Additional scribing instructions posted under Course Logistics and Overview.
  • 30 March 2016: Stephen Turner, Co-founder and CTO of PacBio, will be giving a guest lecture on 13 April 2016.
  • 30 March 2016: Bikash Sabata, VP of software at Genia, will be giving a guest lecture on 6 April 2016.
  • 29 March 2016: Please access this Google Doc to sign up for scribing a lecture.
Course Description
Extraordinary advances in sequencing technology in the past decade have revolutionized biology and medicine. Many assays based on high-throughput sequencing have been designed to make various biological measurements of interest. This course explores the computational and data science problems that arise from processing, managing, and performing predictive analytics on high-throughput sequencing data. Specific problems we will study include genome assembly, haplotype phasing, RNA-Seq assembly, RNA-Seq quantification, single-cell RNA-Seq analysis, multi-omics analysis, and genome compression. We attack these problems through a combination of tools from information theory, combinatorial algorithms, machine learning, and signal processing. Through this course, students will also become familiar with various software tools developed for the analysis of real sequencing data.
Lecture times
Monday, Wednesday 3:00 PM - 4:20 PM at McCullough 115
Lab hour: Friday (exact time and location TBA)
Course Staff
Instructor: David Tse (dntse _at_
Teaching assistants: Govinda Kamath (gkamath _at_ , Jesse Zhang (jessez _at_
Office hours: 4:20pm-5:05pm MW at Packard 264 for the instructor; 1:45pm-2:45pm M at Packard 260 for the teaching assistants
Section Materials