Computational Biology and Bioinformatics

300 George Street, Suite 501, 203.737.6029
M.S., Ph.D.

Directors of Graduate Studies

Mark Gerstein (Bass 432A, 203.432.6105)
Hongyu Zhao (300 George St., Suite 503, 203.785.3613)

Professors Marcus Bosenberg (Dermatology; Pathology), Cynthia Brandt (Emergency Medicine; Anesthesiology), Kei-Hoi Cheung (Emergency Medicine), Ronald Coifman (Mathematics; Computer Science), Stephen Dellaporta (Molecular, Cellular, & Developmental Biology), Richard Flavell (Immunobiology), Joel Gelernter (Genetics; Neuroscience), Mark Gerstein (Biomedical Informatics; Molecular Biophysics & Biochemistry; Computer Science), Antonio Giraldez (Genetics), Murat Gunel (Neurosurgery; Genetics), Jonathon Howard (Molecular Biophysics & Biochemistry; Physics), Amy Justice (Internal Medicine; Public Health), Naftali Kaminski (Internal Medicine), Douglas Kankel (Molecular, Cellular, & Developmental Biology), Yuval Kluger (Pathology), Harlan Krumholz (Internal Medicine; Investigative Medicine; Public Health), Haifan Lin (Cell Biology; Genetics), Shuangge (Steven) Ma (Public Health), Arya Mani (Internal Medicine; Genetics), Pramod Mistry (Internal Medicine; Pediatrics), Ruth Montgomery (Internal Medicine/Rheumatology), Corey O’Hern (Mechanical Engineering & Materials Science; Applied Physics; Physics), Lajos Pusztai (Internal Medicine), Anna Pyle (Molecular Biophysics & Biochemistry), David Stern (Pathology), Jeffrey Townsend (Public Health; Ecology & Evolutionary Biology), Günter Wagner (Ecology & Evolutionary Biology), Hongyu Zhao (Public Health; Genetics), Steven Zucker (Computer Science; Electrical Engineering; Biomedical Engineering)

Associate Professors Murat Acar (Molecular, Cellular, & Developmental Biology), Damon Clark (Molecular, Cellular, & Developmental Biology), Chris Cotsapas (Neurology), Forrest Crawford (Public Health), Thierry Emonet (Molecular, Cellular, & Developmental Biology), Farren Isaacs (Molecular, Cellular, & Developmental Biology), Steven Kleinstein (Pathology), Jun Lu (Genetics), Kathryn Miller-Jensen (Engineering & Applied Science), James Noonan (Genetics), Kevin O’Connor (Neurology), Zuoheng (Anita) Wang (Public Health)

Assistant Professors Julien Berro (Molecular Biophysics & Biochemistry), Monika Jadi (Psychiatry; Neuroscience), Smita Krishnaswamy (Genetics), Monkol Lek (Genetics), Morgan Levine (Pathology), Benjamin Machta (Physics), Edward Melnick (Emergency Medicine), John Murray (Psychiatry; Neuroscience; Physics), Andrew Taylor (Emergency Medicine)

Fields of Study

Computational biology and bioinformatics (CB&B) is a rapidly developing multidisciplinary field. The systematic acquisition of data made possible by genomics and proteomics technologies has created a tremendous gap between the available data and their biological interpretation. Given the rate of data generation, it is well recognized that this gap will not be closed by direct individual experimentation alone. Computational and theoretical approaches to understanding biological systems provide an essential means of closing this gap. These activities include computational modeling of biological processes, computational management of large-scale projects, database development and data mining, algorithm development, and high-performance computing, as well as statistical and mathematical analyses.

Special Admissions Requirements

Applicants are expected (1) to have a strong foundation in the basic sciences, such as biology, chemistry, and mathematics, and (2) to have training in computing/informatics, including significant computer programming experience. The Graduate Record Examination (GRE) General Test is required. Alternatively, the Medical College Admission Test (MCAT) may be substituted for the GRE. Applicants whose native language is not English are required to submit results from the Test of English as a Foreign Language (TOEFL).

To enter the Ph.D. program, students apply to an interest-based track within the interdepartmental graduate program in Biological and Biomedical Sciences (BBS).

Integrated Graduate Program in Physical and Engineering Biology (PEB)

Students applying to one of the interest-based tracks of the Biological and Biomedical Sciences program may simultaneously apply to be part of the PEB program. See the description under Non-Degree-Granting Programs, Councils, and Research Institutes for course requirements, program benefits, and application instructions.

Special Requirements for the Ph.D. Degree

With the help of a faculty advisory committee, each student plans a program that includes courses, seminars, laboratory rotations, and independent reading. Students are expected to gain competence in three core areas: (1) computational biology and bioinformatics, (2) biological sciences, and (3) informatics (including computer science, statistics, and applied mathematics). While the courses taken to satisfy the core areas of competency may vary considerably, all students are required to take the following courses: CB&B 562 or CB&B 750, CB&B 740 or CB&B 561, and CB&B 752. A typical program will include ten courses. Completion of the core curriculum will typically take three to four terms, depending in part on the prior training of the student. With approval of the CB&B director of graduate studies (DGS), students may take one or two undergraduate courses to satisfy areas of minimum expected competency. Students will typically take two to three courses each term and three research rotations (CB&B 711, CB&B 712, CB&B 713) during the first year. After the first year, students will start working in the laboratory of their Ph.D. thesis supervisor. Students must pass a qualifying examination normally given at the end of the second year or the beginning of the third year. There is no language requirement. Students will serve as teaching assistants in two term courses. In addition to all other requirements, students must successfully complete CB&B 601, Fundamentals of Research: Responsible Conduct of Research (or another course that covers the material) prior to the end of their first year of study. In their fourth year of study, all students must successfully complete B&BS 503, RCR Refresher for Senior BBS Students.

M.D./Ph.D. Students

Students pursuing the joint M.D./Ph.D. degrees must satisfy the course requirements listed above for Ph.D. students. With approval of the DGS, some courses taken toward the M.D. degree can be counted toward the ten required courses. Such courses must have a graduate course number, and the student must register for them as graduate courses (in which grades are received). Laboratory rotations are available but not required. One teaching assistantship is required.

Master’s Degree

M.S. (en route to the Ph.D.) To qualify for the awarding of the M.S. degree a student must (1) complete two years (four terms) of study in the Ph.D. program, with ten required courses taken at Yale, (2) complete the required course work for the Ph.D. program with an average grade of High Pass or higher, (3) successfully complete three research rotations, and (4) meet the Graduate School’s Honors requirement.

Terminal Master’s Degree Program The CB&B terminal master’s program has limited availability and is intended primarily for postdoctoral fellows supported by training grants and for students with sponsored funding, e.g., from industry. The curriculum requirements are the same as in the CB&B Ph.D. program, except that there are no requirements for fulfilling laboratory research rotations, serving as a teaching assistant, or completing a Ph.D. dissertation. Terminal M.S. students will be expected to complete an M.S. project, including a project report. Completion of the terminal M.S. degree will typically take four terms of full-time study. Applicants should contact the CB&B registrar before submitting an M.S. application.


Additional courses focused on the biological sciences and on areas of informatics are selected by the student in consultation with CB&B faculty.

CB&B 523b / ENAS 541b / MB&B 523b / PHYS 523b, Biological Physics. Benjamin Machta

The course has two aims: (1) to introduce students to the physics of biological systems and (2) to introduce students to the basics of scientific computing. Using computational tools and methods, the course studies a broad range of biophysical phenomena, including diffusion, polymer statistics, protein folding, macromolecular crowding, cell motion, and tissue development. Intensive MATLAB tutorials are provided, covering basic syntax, arrays, for-loops, conditional statements, functions, plotting, and importing and exporting data.
TTh 1pm-2:15pm

CB&B 555a / AMTH 553a / CPSC 553a / GENE 555a, Unsupervised Learning for Big Data. Smita Krishnaswamy

This course focuses on machine-learning methods well suited to analyzing high-dimensional, high-throughput, noisy data, including manifold learning, graph signal processing, nonlinear dimensionality reduction, clustering, and information theory. Although the class covers some biomedical applications, these methods can be applied in any field. Prerequisites: knowledge of linear algebra and Python programming.
TTh 11:35am-12:50pm

CB&B 561a, Modeling Biological Systems I. Thierry Emonet and Kathryn Miller-Jensen

Study of the analytic and computational skills needed to model genetic networks and protein signaling pathways. Review of basic biochemical concepts, including chemical reactions, ligand binding to receptors, cooperativity, and Michaelis-Menten enzyme kinetics. In-depth exploration of biological systems, including kinetics of RNA and protein synthesis and degradation; transcription activators and repressors; the lysogeny/lysis switch of lambda phage and the roles of cooperativity and feedback; network motifs such as feed-forward networks and how they shape response dynamics; cell signaling, MAP kinase networks, and cell fate decisions; bacterial chemotaxis; and noise in gene expression and phenotypic variability. Students learn to build models in MATLAB through a series of in-class hackathons that illustrate biological examples discussed in lectures. Prerequisite: CB&B students may enroll only with permission of the instructor.
TTh 2:30pm-3:45pm

CB&B 562b / AMTH 765b / ENAS 561b / INP 562b / MB&B 562b / MCDB 562b / PHYS 562b, Modeling Biological Systems II. Damon Clark, Thierry Emonet, and Jonathon Howard

This course covers advanced topics in computational biology. How do cells compute, how do they count and tell time, how do they oscillate and generate spatial patterns? Topics include time-dependent dynamics in regulatory, signal-transduction, and neuronal networks; fluctuations, growth, and form; mechanics of cell shape and motion; spatially heterogeneous processes; diffusion. This year, the course spends roughly half its time on mechanical systems at the cellular and tissue level, and half on models of neurons and neural systems in computational neuroscience. Prerequisite: a 200-level biology course or permission of the instructor.
TTh 2:30pm-3:45pm

CB&B 567b / MB&B 567 / S&DS 567b, Topics in Deep Learning: Methods and Biomedical Applications. Mark Gerstein

This course provides an introduction to recent developments in deep learning, covering topics ranging from basic backpropagation, to optimization, to the latest developments in deep generative models and network robustness. Applications in natural language processing and computer vision are used as running examples. Several case studies in biomedical applications are covered in detail. Prerequisite: S&DS 565 or permission of the instructor. Enrollment limited.
M 9am-11:15am

CB&B 601b, Fundamentals of Research: Responsible Conduct of Research. Staff

A weekly seminar presented by faculty trainers on topics relating to proper conduct of research. Required of first-year CB&B students, first-year Immunobiology students, and training grant-funded postdocs. Pass/Fail.

CB&B 634a, Computational Methods for Informatics. Robert McDougal

This course introduces the key computational methods and concepts necessary for taking an informatics project from start to finish: using APIs to query online resources, reading and writing common biomedical data formats, choosing appropriate data structures for storing and manipulating data, implementing computationally efficient and parallelizable algorithms for analyzing data, and developing appropriate visualizations for communicating health information. The course also discusses the FAIR data-sharing guidelines and current issues in big health data, including successful applications as well as privacy and bias concerns. This course has a significant programming component, and familiarity with programming is assumed. Prerequisite: CPSC 223 or equivalent, or permission of the instructor.
W 11am-11:50am, TTh 3pm-4:20pm

CB&B 647b / GENE 645b, Statistical Methods in Human Genetics. Hongyu Zhao

Probability modeling and statistical methodology for the analysis of human genetics data are presented. Topics include population genetics, single locus and polygenic inheritance, linkage analysis, quantitative trait analysis, association analysis, haplotype analysis, population structure, whole genome genotyping platforms, copy number variation, pathway analysis, and genetic risk prediction models. Offered every other year. Prerequisites: genetics; BIS 505; S&DS 541 or equivalent; or permission of the instructor.
Th 1pm-2:50pm

CB&B 663b / AMTH 552 / CPSC 663b, Deep Learning Theory and Applications. Smita Krishnaswamy

Deep neural networks have gained immense popularity in the past decade due to their outstanding success in many important machine-learning tasks such as image recognition, speech recognition, and natural language processing. This course provides a principled and hands-on approach to deep learning with neural networks. Students master the principles and practices underlying neural networks, including modern methods of deep learning, and apply deep learning methods to real-world problems including image recognition, natural language processing, and biomedical applications. Course work includes homework and a final project (group or individual, depending on the total number enrolled) with both a written and an oral (presentation) component.

CB&B 711a and CB&B 712b and CB&B 713b, Lab Rotations. Staff

Three research rotations of 2.5 to 3 months each in faculty laboratories are required during the first year of graduate study. Students arrange these rotations with individual faculty members.

CB&B 740a, Clinical and Translational Informatics. Richard Taylor

The course provides an introduction to clinical and translational informatics. Topics include (1) overview of biomedical informatics, (2) design, function, and evaluation of clinical information systems, (3) clinical decision making and practice guidelines, (4) clinical decision support systems, (5) informatics support of clinical research, (6) privacy and confidentiality of clinical data, (7) standards, and (8) topics in translational bioinformatics. Permission of the instructor required.
TTh 10:30am-11:45am

CB&B 750b, Core Topics in Biomedical Informatics. Samah Jarad

The course introduces common unifying themes that serve as the foundation for different areas of biomedical informatics, including clinical, neuro-, and genome informatics. The course is designed for students with significant computer experience and course work who plan to build databases and computational tools for use in biomedical research. Emphasis is on understanding the basic principles underlying informatics approaches to interoperation among biomedical databases and software tools, standardized biomedical vocabularies and ontologies, biomedical natural language processing, modeling of biological systems, high-performance computation in biomedicine, and other related topics.
TTh 10:30am-11:45am

CB&B 752b / CPSC 752b / MB&B 752b / MCDB 752b, Biomedical Data Science: Mining and Modeling. Mark Gerstein and Matthew Simon

Biomedical data science encompasses the analysis of gene sequences, macromolecular structures, and functional genomics data on a large scale. It represents a major practical application for modern techniques in data mining and simulation. Specific topics include sequence alignment, large-scale processing, next-generation sequencing data, comparative genomics, phylogenetics, biological database design, geometric analysis of protein structure, molecular-dynamics simulation, biological networks, normalization of microarray data, mining of functional genomics data sets, and machine-learning approaches to data integration. Prerequisites: biochemistry and calculus, or permission of the instructor.
MW 1pm-2:15pm