Statistics and Data Science (S&DS)

S&DS 1000a or b, Introductory Statistics  Ethan Meyers

An introduction to statistical reasoning. Topics include numerical and graphical summaries of data, data acquisition and experimental design, probability, hypothesis testing, confidence intervals, correlation, regression, multiple regression, and ANOVA. Application of statistical concepts to data; analysis of real-world problems. A basic introduction to the R programming language. May not be taken after S&DS 1080 or 1090.   QR
HTBA

S&DS 1230a or b / CPSC 1230a or b / PLSC 3508a or b / S&DS 5230a or b, YData: An Introduction to Data Science  Staff

Computational, programming, and statistical skills are no longer optional in our increasingly data-driven world; these skills are essential for opening doors to manifold research and career opportunities. This course aims to dramatically enhance knowledge and capabilities in fundamental ideas and skills in data science, especially computational and programming skills along with inferential thinking. YData is an introduction to Data Science that emphasizes the development of these skills while providing opportunities for hands-on experience and practice. YData is accessible to students with little or no background in computing, programming, or statistics, but is also engaging for more technically oriented students through extensive use of examples and hands-on data analysis. Python 3, a popular and widely used computing language, is the language used in this course. The computing materials will be hosted on a special-purpose web server.  QR
HTBA

* S&DS 1720a / EP&E 328a / EP&E 4328a / PLSC 2509a, YData: Data Science for Political Campaigns  Joshua Kalla

Political campaigns have become increasingly data driven. Data science is used to inform where campaigns compete, which messages they use, how they deliver them, and among which voters. In this course, we explore how data science is being used to design winning campaigns. Students gain an understanding of what data is available to campaigns, how campaigns use this data to identify supporters, and the use of experiments in campaigns. This course provides students with an introduction to political campaigns, an introduction to data science tools necessary for studying politics, and opportunities to practice the data science skills presented in S&DS 123, YData.  QR
W 4pm-5:55pm

* S&DS 1780a / SOCY 3962a, Sociogenomics  Ramina Sotoudeh

Since the first human genome was sequenced in 2003, social and behavioral data have become increasingly integrated with genetic data. This has proven important not only for medicine and public health but also for social science. In this course, we cover the foundations of sociogenomics research. We begin by surveying core concepts in the field, from heritability to gene-by-environment interactions, and learning the computational tools necessary for producing sociogenomics research. In later weeks, we read some of the latest applied work in the field and discuss the value and limitations of such research. The course culminates in a final project, in which students are tasked with using empirical data to answer a social genetics question of their own.  SO
HTBA

S&DS 2200b, Introductory Statistics, Intensive  Robert Wooster

Introduction to statistical reasoning for students with particular interest in data science and computing. Using the R language, topics include exploratory data analysis, probability, hypothesis testing, confidence intervals, regression, statistical modeling, and simulation. Computing taught and used extensively, as well as application of statistical concepts to analysis of real-world data science problems. MATH 115 is helpful but not required. While no particular prior experience in computing is required, strong motivation to practice and learn computing is desirable.  QR
TTh 9am-10:15am

* S&DS 2240b, Dice, Data, and Decisions - The Statistics of Board Game Strategy  Robert Wooster

This course provides a hands-on application of data analysis, simulation, and probability theory to the world of board games and traditional games of chance. Class lessons will be a combination of lecture, computing labs, and actually learning and playing games! Topics include analyzing board game strategy using probability theory, probabilistic modeling using simulation in R, and exploration and analysis of both simulated and real board game data. Prerequisites: one of S&DS 100, 123, 220, or 230, and experience in the R statistical programming language.  QR
TTh 4pm-5:15pm

S&DS 2300a or b, Data Exploration and Analysis  Jonathan Reuning-Scherer

Survey of statistical methods: plots, transformations, regression, analysis of variance, clustering, principal components, contingency tables, and time series analysis. The R computing language and Web data sources are used. Prerequisite: a 100-level Statistics course or equivalent, or permission of instructor.  QR
HTBA

S&DS 2380a, Probability and Bayesian Statistics  Robert Wooster

Fundamental principles and techniques of probabilistic thinking, statistical modeling, and data analysis. Essentials of probability, including conditional probability, random variables, distributions, law of large numbers, central limit theorem, and Markov chains. Statistical inference with emphasis on the Bayesian approach: parameter estimation, likelihood, prior and posterior distributions, Bayesian inference using Markov chain Monte Carlo. Introduction to regression and linear models. Computers are used for calculations, simulations, and analysis of data. After or concurrently with MATH 118 or 120.  QR
TTh 1:05pm-2:20pm

S&DS 2400b, An Introduction to Probability Theory  Ilias Zadik

Introduction to probability theory. Topics include probability spaces, random variables, expectations and probabilities, conditional probability, independence, discrete and continuous distributions, central limit theorem, Markov chains, and probabilistic modeling. This course counts towards the Data Science certificate but not the Statistics and Data Science major. Prerequisite: MATH 115.  QR
TTh 11:35am-12:50pm

S&DS 2410a / MATH 2410a, Probability Theory  Sinho Chewi

Introduction to probability theory. Topics include probability spaces, random variables, expectations and probabilities, conditional probability, independence, discrete and continuous distributions, central limit theorem, Markov chains, and probabilistic modeling. After or concurrently with MATH 120 or equivalent.  QR
MW 9am-10:15am

S&DS 2420b / MATH 2420b, Theory of Statistics  Zhou Fan

Study of the principles of statistical analysis. Topics include maximum likelihood, sampling distributions, estimation, confidence intervals, tests of significance, regression, analysis of variance, and the method of least squares. Some statistical computing. After S&DS 241 and concurrently with or after MATH 222 or 225, or equivalents.  QR
MW 2:35pm-3:50pm

S&DS 2650a or b, Introductory Machine Learning  Lu Lu

This course covers the key ideas and techniques in machine learning without the use of advanced mathematics. Basic methodology and relevant concepts are presented in lectures, including the intuition behind the methods. Assignments give students hands-on experience with the methods on different types of data. Topics include linear regression and classification, tree-based methods, clustering, topic models, word embeddings, recurrent neural networks, dictionary learning and deep learning. Examples come from a variety of sources including political speeches, archives of scientific articles, real estate listings, natural images, and several others. Programming is central to the course, and is based on the Python programming language. Prerequisites: Two of the following courses: S&DS 230, 238, 240, 241 and 242; previous programming experience (e.g., R, Matlab, Python, C++), Python preferred.  QR
HTBA

* S&DS 2800a / NSCI 280 / NSCI 2800a, Neural Data Analysis  Ethan Meyers

We discuss data analysis methods that are used in the neuroscience community. Methods include classical descriptive and inferential statistics, point process models, mutual information measures, machine learning (neural decoding) analyses, dimensionality reduction methods, and representational similarity analyses. Each week we read a research paper that uses one of these methods, and we replicate these analyses using the R or Python programming language. Emphasis is on analyzing neural spiking data, although we also discuss other imaging modalities such as electro/magneto-encephalography (EEG/MEG), two-photon imaging, and possibly functional magnetic resonance imaging (fMRI) data. Data we analyze includes smaller datasets, such as single neuron recordings from the songbird vocal motor system, as well as larger data sets, such as the Allen Brain Observatory’s simultaneous recordings from the mouse visual system. Prerequisite: S&DS 230. Background in neuroscience is recommended but not required (e.g., it would be useful to have taken a course at the level of NSCI 160).
TTh 2:35pm-3:50pm

S&DS 3120a, Linear Models  Robert Wooster

The geometry of least squares; distribution theory for normal errors; regression, analysis of variance, and designed experiments; numerical algorithms, with particular reference to the R statistical language. After S&DS 242 and MATH 222 or 225.  QR
MW 2:35pm-3:50pm

S&DS 3350b, Social Algorithms  Johan Ugander

Algorithms that learn from data play increasingly central roles within modern complex social systems. In this course, we examine the design and behavior of algorithms in such contexts, including search, content recommendation, social recommendation, feed ranking, content moderation, and more. The course has a split focus on the technical design of such algorithms, as well as the literature on theoretical and empirical evaluations, particularly in the presence of strategic behavior, network effects, and algorithmic confounding. Prerequisites: S&DS 123 or S&DS 1230 YData AND (S&DS 230/S&DS 530 or S&DS 2300/S&DS 5300 Data Exploration OR S&DS 361/S&DS 661 or S&DS 3610/S&DS 6610 Data Analysis), or equivalent.
MW 9am-10:15am

S&DS 3510b / EENG 434 / MATH 2510b, Stochastic Processes  Shuangping Li

Introduction to the study of random processes including linear prediction and Kalman filtering, Poisson counting processes and renewal processes, Markov chains, branching processes, birth-death processes, Markov random fields, martingales, and random walks. Applications chosen from communications, networking, image reconstruction, Bayesian statistics, finance, probabilistic analysis of algorithms, and genetics and evolution. Prerequisite: S&DS 241 or equivalent.  QR
MW 1:05pm-2:20pm

* S&DS 3520b / MB&B 3520b / MCDB 3520b, Biomedical Data Science, Mining and Modeling  Staff

Techniques in data mining and simulation applied to bioinformatics, the computational analysis of gene sequences, macromolecular structures, and functional genomics data on a large scale. Sequence alignment, comparative genomics and phylogenetics, biological databases, geometric analysis of protein structure, molecular-dynamics simulation, biological networks, microarray normalization, and machine-learning approaches to data integration. Prerequisites: MB&B 301 and MATH 115, or permission of instructor.  SC  0 Course cr
MW 1:05pm-2:20pm

S&DS 3540b, Bayesian Modeling and Inference  Xiang Zhou

This course offers a rigorous and comprehensive introduction to Bayesian modeling and inference, encompassing foundational theory, modern computational techniques, and advanced modeling frameworks. The objective of this course is to provide students with a deep and practical understanding of Bayesian modeling and inference. By the end of the course, students are able to explain the Bayesian paradigm, including the roles of prior distributions, likelihoods, and posterior distributions, and compare Bayesian and frequentist approaches. It is designed for advanced undergraduates, master’s students, and PhD students with a strong interest in statistical modeling, inference, and applications across diverse disciplines. Prerequisites: probability theory at the level of S&DS 2410/S&DS 5410, statistical inference at the level of S&DS 2420/S&DS 5420, linear algebra at the level of MATH 2250 or MATH 2260 or equivalent, and experience with computing and data analysis in common programming languages such as R and Python.
TTh 1:05pm-2:20pm

S&DS 3610b / AMTH 3610b, Data Analysis  Brian Macdonald

Selected topics in statistics explored through analysis of data sets using the R statistical computing language. Topics include linear and nonlinear models, maximum likelihood, resampling methods, curve estimation, model selection, classification, and clustering. Extensive use of the R programming language.  Experience with R programming (from e.g. S&DS 106, S&DS 220, S&DS 230, S&DS 242), probability and statistics (e.g. S&DS 106, S&DS 220, S&DS 238, S&DS 241, or concurrently with S&DS 242), linear algebra (e.g. MATH 222, MATH 225, MATH 118), and calculus is required. This course is a prerequisite for S&DS 425 and may not be taken after S&DS 425.  QR
TTh 2:35pm-3:50pm

S&DS 3630b, Multivariate Statistics  Jonathan Reuning-Scherer

Introduction to the applied analysis of multivariate data. Topics include principal components analysis, factor analysis, cluster analysis, discriminant analysis, multidimensional scaling, multivariate generalized linear models, and structural equations modeling. Non-parametric analogues of many of these topics are also discussed.  Many of these techniques are used in machine learning. Extensive computer work using R, SAS or SPSS (examples are provided in all programs).  Prerequisites: knowledge of basic inferential procedures and experience with linear models.  QR
TTh 1:05pm-2:20pm

S&DS 3640b / AMTH 3640b / ECE 4541b, Information Theory  Yihong Wu

Foundations of information theory in communications, statistical inference, statistical mechanics, probability, and algorithmic complexity. Quantities of information and their properties: entropy, conditional entropy, divergence, redundancy, mutual information, channel capacity. Basic theorems of data compression, data summarization, and channel coding. Applications in statistics and finance. After STAT 241.  QR
TTh 11:35am-12:50pm

S&DS 3650a or b, Intermediate Machine Learning  Staff

S&DS 365 is a second course in machine learning at the advanced undergraduate or beginning graduate level. The course assumes familiarity with the basic ideas and techniques in machine learning, for example as covered in S&DS 265. The course treats methods together with mathematical frameworks that provide intuition and justifications for how and when the methods work. Assignments give students hands-on experience with machine learning techniques, to build the skills needed to adapt approaches to new problems. Topics include nonparametric regression and classification, kernel methods, risk bounds, nonparametric Bayesian approaches, graphical models, attention and language models, generative models, sparsity and manifolds, and reinforcement learning. Programming is central to the course, and is based on the Python programming language and Jupyter notebooks. Prerequisites: a background in probability and statistics at the level of S&DS 242; familiarity with the core ideas from linear algebra, for example through MATH 222; and computational skills at the level of S&DS 265 or CPSC 200.  QR
HTBA

S&DS 4000a / MATH 3300a, Advanced Probability  Shuangping Li

Measure theoretic probability, conditioning, laws of large numbers, convergence in distribution, characteristic functions, central limit theorems, martingales. Some knowledge of real analysis assumed.  QR
TTh 2:35pm-3:50pm

S&DS 4100a, Statistical Inference  Theodor Misiakiewicz

A systematic development of the mathematical theory of statistical inference covering methods of estimation, hypothesis testing, and confidence intervals. An introduction to statistical decision theory. Prerequisite: background at the level of S&DS 241.
TTh 11:35am-12:50pm

* S&DS 4110b, Selected Topics in Statistical Decision Theory  Harrison Zhou

In this course we review some recent developments in statistical decision theory including nonparametric estimation, Bayesian nonparametrics, high-dimensional estimation, covariance matrix estimation and Gaussian graphical models, structured matrix estimation and network analysis, analysis of iterative algorithms and overparameterization, and neural nets. Prerequisite: S&DS 410.
M 9:25am-11:20am

* S&DS 4250a or b, Statistical Case Studies  Staff

Statistical analysis of a variety of statistical problems using real data. Emphasis on methods of choosing data, acquiring data, assessing data quality, and the issues posed by extremely large data sets. Extensive computations using R statistical software. Prerequisites: S&DS 361, and prior course work in probability, statistics, and data analysis (e.g. 363, 365, 220, 230, etc., equivalent courses, or equivalent research/internship experience).  Enrollment limited; requires permission of the instructor.   QR
HTBA

S&DS 4320b, Advanced Optimization Techniques  Anna Gilbert

This course covers fundamental theory and algorithms in optimization, emphasizing convex optimization. Topics covered include convex analysis; duality and KKT conditions; subgradient methods; interior point methods; semidefinite programming; distributed methods; stochastic gradient methods; robust optimization; and an introduction to nonconvex optimization. Applications accepted from statistics & data science, economics, engineering, and the sciences. Prerequisites: knowledge of linear algebra, such as MATH 222, 225; multivariate calculus, such as MATH 120; probability, such as S&DS 241/541; optimization, such as S&DS 431/631; and comfort with proof-based exposition and problem sets.
TTh 1:05pm-2:20pm

* S&DS 4800a or b, Individual Studies  Sekhar Tatikonda

Directed individual study for qualified students who wish to investigate an area of statistics not covered in regular courses. A student must be sponsored by a faculty member who sets the requirements and meets regularly with the student. Enrollment requires a written plan of study approved by the faculty adviser and the director of undergraduate studies.
HTBA

S&DS 4910a and S&DS 4920b, Senior Project  Brian Macdonald

Individual research that fulfills the senior requirement. Requires a faculty adviser and DUS permission. The student must submit a written report about the results of the project.
HTBA