Doctoral Training in Statistics and Biostatistics: Where Are We Headed?
It has been 26 years since I began my academic career, joining the department of statistics at North Carolina State University as an assistant professor. But it doesn’t seem like it’s been that long (or that I could be this old). Maybe that’s because I’ve had such a good time.
What I have enjoyed most is working with students and thinking about how best to train them, particularly at the PhD level. Over this period, I have developed and taught many courses and been involved in curriculum revision and innovation. And I’ve seen our field evolve from one that, within academia, stressed the mathematical aspects of statistics to one that has become much more application- and computation-driven.
This evolution has inspired my department and others to review our curricula periodically and introduce new courses, exam structures, and so on. At the PhD level especially, revisions have been mostly incremental. With some exceptions, most departments maintain a PhD curriculum based on a set of “core” courses covering what is considered fundamental, foundational material; one or more qualifying exams evaluating mastery of the core and possibly other material; and elective courses covering “standard” areas such as multivariate analysis and “special topics” based on faculty research interests. Most have some sort of consulting requirement.
This means that students entering a PhD program with a bachelor’s degree take three years of coursework in many cases. A growing challenge has been figuring out how to sustain adequate coverage of core and traditional material while introducing students to new topics. Or, for that matter, figuring out if we should.
At no time has this challenge been greater. The genomics revolution and technologies for remote sensing, medical imaging, astronomical observation, and so forth are generating massive new data structures. These Big Data (big news these days, as past-president Bob Rodriguez discussed in his June 2012 column) pose enormous new analytic and computational problems, and, as Bob argues, statisticians must bring their unique understanding of uncertainty and the threats of bias, confounding, and false discovery to the table. The rapid pace of new breakthroughs raises the question of whether our curricula require more than incremental revision.
Our PhDs must graduate with the analytic and computational skills to confront this age of massive data and with foundational mastery of our discipline. They must also have the communication and leadership skills to work in an interdisciplinary setting. I know my department has struggled with this challenge. This led me to wonder how other departments are responding. I decided to find out.
In an admittedly non-rigorous survey, I wrote to 33 chairs/heads of statistics and biostatistics departments and asked them to share their experiences by answering the following questions:
- Has your department undertaken a formal effort to revise your PhD curriculum in light of these developments, and, if so, how?
- Has your department introduced new courses in direct response to Big Data challenges?
- How has your department approached the tension between exposing students to innovations and covering core and traditional material?
- Does your department offer courses or other training experiences in communication and leadership skills?
I received thoughtful responses from 55% (18) of them. I know how busy they are, so I am willing to assume the responses of those who were able to reply are representative of what is taking place. Here is a summary of the salient points.
Four departments reported no recent, systematic effort to modify their curricula or plans to do so. The rest indicated that they had either revised their curricula to varying degrees within the past five years or are in the midst of curriculum evaluation now, inspired by Big Data developments.
Several departments noted that they made a deliberate decision to hire faculty with expertise in genomics, computational biology, and high-dimensional data to develop new courses, direct research, and set the department’s future direction. Many reported moving to engage students in interdisciplinary collaboration in the first year by introducing a major data analysis project or requiring a revamped consulting course involving significant interactions with scientists, presentations, and written reports. Several have reduced the number of exams to discourage students from being distracted by exam preparation. A few departments have replaced traditional exams with exercises requiring students to synthesize a research area and write a journal-style article. Many reported introducing or modernizing and requiring at least one statistical computing course.
Almost all departments have introduced other new courses, many focusing on Big Data topics, with titles mentioning statistical and machine learning, data mining, computational and molecular biology, genomic science, high-dimensional data, and, in a few cases, Big Data explicitly. Several have developed advanced computing courses, including topics such as convex optimization and parallel computing, and require these or are considering making them required. Three mentioned a new course on causal inference. In a few cases, development of such courses was prospective and deliberate, while the vast majority of departments reported they were conceived by individual faculty as “special topics” courses. Three departments would like to offer such courses, but have limited faculty resources.
Views on balancing coverage of the core with that of new topics were diverse. The consensus in several departments is that coverage of core material such as probability, inference, linear models, measure theory, etc., is essential, and no plans for revision were reported. Others have taken steps to streamline. Several departments have merged two-course sequences on measure theory and advanced inference into one course each to make room for other courses, and some have pared the core down to a first year of probability, inference, and linear models, after which students pursue specialized “tracks.” Still others maintain a full core, but have de-emphasized some classical topics (decision theory was mentioned twice) to make room for modern ones.
A few chairs opined that the current model of several years of coursework may not be viable much longer and that we should consider the biology/computer science approach of less coursework and immersion in research much sooner. Others thought we cannot hope to expose students to everything and should focus on providing a traditional core foundation on which they can build after graduation.
Only a few departments reported having formal courses targeting communication and leadership skills, but most require students to interact with other scientists and give presentations through data analysis projects, lab rotations, and compulsory consulting courses. Two departments require courses on teaching skills, and others have courses on research skills, scientific communication, and statistical leadership.
I deliberately restricted this survey to PhD training to keep it focused (and keep this column within the length limit). Several chairs also noted that their departments had undertaken significant curriculum revisions at the master’s level.
The take-away message: There is a lot of thought and innovation taking place regarding the future of PhD curricula. I hope this informal compendium is helpful to departments as they move forward and that it inspires discussion not only in academia, but in industry and government, about the best training models for meeting the Big Data challenge.