Maintaining Quality in the Face of Rapid Program Expansion
Rebecca Nugent, Carnegie Mellon University
My, how times have changed. Many of today’s senior faculty and leading industry statisticians probably didn’t take nonintroductory statistics courses until graduate school. Now, incoming freshmen have not only placed out of introductory statistics and calculus, but they’ve already selected statistics as their major, applied for their first data analytics summer internship, and declared their undying allegiance to big data. What on Earth do we do with this new generation?
In the department of statistics at Carnegie Mellon University, we’ve been asking ourselves this very question. CMU has an undergraduate enrollment of nearly 6,000 students; recently, the department of statistics has been teaching about 1,000 undergraduates per semester from across the campus. On top of this high service demand, one pressing problem is the rapid expansion of the undergraduate statistics program itself. In the last five years, the number of statistics and economics-statistics majors has quadrupled to around 150 students. Given the department’s focus on groundbreaking research and strong commitment to high-quality, vertically integrated education, we are faced with redesigning our curriculum to satisfy demand while maintaining our high standard of statistical training for both industry and graduate school. Not an easy task.
Our program combines a solid theoretical background with thorough exposure to methodology, both traditional and modern. After introductory courses and mathematics prerequisites, majors take a yearlong sequence in probability and statistical inference. Doing well prepares them for the subsequent methodology sequence: a thorough treatment of linear regression followed by a semester of advanced data analysis methods, including bootstrap simulation, kernel smoothing, splines, generalized linear and additive models, causality, and Markov models. Enrollment in this sequence has increased by 400% over the past five years, and not solely because of our majors. As other programs such as mathematical sciences, business administration, and computer science embrace the importance of understanding modern statistical methods, more and more of their students are showing up in our classrooms.
Lower-level electives in statistical graphics and visualization, sampling and survey methods, experimental design, and statistical computing help students gain programming skills and “data sense” before the advanced methodology classes. Advanced electives are offered in stochastic processes and “special topics.” The special topics courses are offered every semester with rotating topics that include statistical methods in epidemiology, statistical learning, multivariate analysis, multilevel and hierarchical models, and data mining. Given our flexible—but stable—rotation design, most of our majors take at least two or three special topics courses. Weekly seminars are occasionally offered on topics such as statistics in sports or statistics and the law. Qualified students also are allowed to take first-year graduate courses, most commonly statistical inference or methodological courses like parallel computing or machine learning.
Carnegie Mellon University is fully committed to undergraduate research; the statistics program is no exception. In response to the overwhelming growth of our capstone methodology courses, we created an invitation-only research course that pairs small groups of students with faculty clients across campus (and occasionally industry) for a semester-long research project. Accepted students learn the basic principles of research, including literature reviews, methodology comparisons and critiques, and presentation of results. Each team presents its final project in an oral defense, a poster session, and a written report.
While enrollment tends to favor the more qualified seniors, we encourage promising junior statistics majors to apply, knowing that participation in this course will prepare them for a senior honors thesis the following year. Our thesis students meet as a group during the year to give updates on their research and receive feedback from faculty and peers. We have found that, rather than isolating the student and adviser, the group dynamic has increased the quality of the completed theses.
Research projects are also available to other interested students. The department has a long history of supporting undergraduate research, both internally and with outside grant support. Each semester, there are four or five students working on National Science Foundation (NSF)–sponsored research projects, typically as members of research groups that include faculty and graduate students. Our summer programs have included participation in Morehouse College’s Project IMHOTEP, a program designed to increase minority participation in statistics, which is also a feature of our current NSF Research Training Group (RTG) grant. This year’s summer group includes 13 undergraduates sponsored by both the RTG and our NSF Census Research Node grant. If students do not qualify for NSF funding (due to citizenship), they are offered course credit or department-funded stipends. Independent studies are also available. In all cases, students are expected to present their research to the department and at the annual Carnegie Mellon undergraduate research symposium. This year, we had around 50 students competing in the statistics research competition, a record number.
The department also has funds earmarked for undergraduate support in the form of two endowments, the DeGroot Memorial and Frederick Sorensen Memorial in Statistics funds. These contribute to awards and travel subsidies for undergraduates to attend and/or present their work at conferences.
One of our primary challenges has been the simultaneous preparation of students for both industry and graduate school. The general CMU population skews toward the quantitative analysis job market and, as such, needs to be well versed not only in methods but also in oral and written communication. These skills are, of course, part of a successful graduate school experience as well. Every upper-level course in our department incorporates writing and presentation skills. For example, in linear regression, students are required to analyze large data sets from real, interdisciplinary research problems and present their findings in scientific reports. These projects and reports build in complexity over the semester, culminating in something similar to a master’s-level qualifying exam experience. All students receive individual feedback on their reports (including grammar and spelling). Students are required to regularly defend their work to faculty and other students in poster sessions, class presentations, or one-on-one conversations. Students also are taught to critically analyze their work and suggest possible improvements without prompting.
Almost every assignment, exam, or project is grounded in a real, ongoing research problem. In upper-level courses, all data sets are large, messy, and complicated. It is not uncommon for sophomores to be working with thousands of observations with missing values. Students are presented with background material on a research problem and then asked to address specific scientific questions. They quickly become accustomed to the need to integrate the statistical analysis and the research problem at hand.
Given the large number of students in both the program and our classes, building a sense of community has never been more important. We pride ourselves on providing personal feedback and a welcoming, nurturing environment. However, as numbers grow, the feasibility of this approach fades. Students who struggle often feel isolated in large classes; students who chafe under the necessarily stricter large-class framework feel stifled. Oddly enough, we have found that fostering friendly competition has been enormously helpful in building a community of creative, motivated undergraduate statisticians. For example, students might be given a set of training data and a week to design an algorithm that optimizes a given error criterion. Students can work in teams or on their own; they then present their approach and test their algorithms live in class on a set of test data. The winners might receive extra credit, but they also earn bragging rights. We have held these types of competitions in several classes, and they have yet to disappoint.
One surprising result is that the winners are not always the top students in class. Some struggling students will completely devote themselves to the competition, learning additional advanced material on their own. The day of the competition is easily the most energetic day of the semester; the classroom is filled with cheers, groans, high-fives, and even dancing. Even in our statistical inference class, our version of “Theory Jeopardy” had more than 80 students in a large auditorium frantically trying to solve inference problems in teams while judges checked their work. When faced with the “Daily Doubles,” the cheers from the teams able to double their scores were deafening. Who knew statistics could be so cool?
We think the best way to manage our growing program is to build a community in which the statistics majors feel they are more than a name on a file. We advocate group collaboration and engagement in departmental activities. Our seniors lead by example, both in the classroom and in the larger CMU community. Anecdotally, we hear stories of upper-class statistics majors “adopting” newer majors in campus social organizations. The majors also have a student advisory committee to represent the interests of the department’s undergraduates; this committee sponsors both professional development activities (e.g., interview preparation seminars) and social events (e.g., a human histogram to celebrate World Statistics Day). This year’s CMU undergraduate statistics sweatshirt says “Statistics majors are always right … probably.” Sounds about right to us.