## New Undergraduate Data Science Programs

The number of undergraduate statistics degrees has nearly doubled in the last four years, making it the fastest-growing STEM degree, and master’s degrees are also growing quickly. Further, the number of universities granting undergraduate statistics degrees has increased from 74 in 2003 to more than 110 last year.

In the April issue of *Amstat News*, we profiled five of the largest and fastest-growing undergraduate U.S. statistics programs. This month, we look at new undergraduate data science programs.

## Northern Kentucky University

**Bachelor of Science in Data Science**

**Number of students currently enrolled:** 23

**First students expected to graduate:** Spring 2017

**Partnering departments:** Computer Science, Mathematics and Statistics, and Business Informatics

Mark J. Lancaster is an assistant professor of statistics and data science at Northern Kentucky University. His research interests include the application of statistics to forensic science, computer vision, and pattern recognition. He is a member of the ASA and ACM.

James W. McGuffee is associate professor and chair of the department of computer science in the college of informatics at Northern Kentucky University. His research interests include networks and computing for the social good. He is a senior member of the ACM.

### How do you view the relationship between statistics and data science?

Statistics is an extremely valuable (and possibly undervalued) component to data science. Statisticians have been aware for quite some time that it is not necessary to have or use every scrap of data to make good decisions in the presence of randomness. Data science is an art of balancing the desire to use the maximum amount of information against the computational resources needed to work on that information, and statistical methodology can usually reduce both sides to manageable levels.

Statistics is an integral component of the data science program at Northern Kentucky University (NKU). We approach data science as a transdisciplinary challenge that requires cooperation among computer science, statistics, and business informatics. The challenge of Big Data goes beyond the scope of one academic area and requires the full engagement of several.

### What are the basic elements of your data science curriculum, and how was it developed?

Our expectation is that the graduates of the data science program at Northern Kentucky University will be able to understand the mathematical and statistical foundations of data science and understand the business context in which data science functions. We also expect students to be able to implement algorithms for data aggregation, cleaning, and analysis; select and apply appropriate data analysis techniques to a variety of tasks; and communicate data analysis findings with appropriate visualizations.

### What was your primary motivation(s) for developing an undergraduate data science program? What’s been the reaction from students so far?

The origin of NKU’s data science program was in a suggestion in 2011 from the NKU administration that the university consider developing a program in systems and information engineering, taking some ideas from the curriculum at the University of Virginia. The NKU College of Informatics leadership brought together a team of faculty from computer science, mathematics, statistics, business informatics, and physics. Inspired by a growing interest in Big Data computing, the group shifted the emphasis of the program and eventually its name to “Data Science,” sending it out for approval at the state level in October 2012. A key motivation was to take advantage of the undergraduate context, which favors broad integration with general education rather than narrow specialization, together with the informatics context, in which we pay special attention to the communication of information (How do you tell a story with a billion pieces of data?).

Response has been enthusiastic, both from students, with an enrollment of 23 by the end of the second year since launch, and from area employers, who are eager to hire students from this rare undergraduate program in data science.

From the faculty perspective, joining a program in its infancy at NKU has provided me with an incredible opportunity to create and develop new courses at the start of my academic career. Additionally, the ASA has published several articles indicating the need for statisticians to engage with data scientists, so I view this as an important service to the statistical profession. The reaction from students has been positive, with student enrollment into the degree program increasing every year.

### Describe the reception you received from the partnering departments, other departments, and those at the university who had to approve the program.

At NKU, the data science degree is creating something new and original. There have been a few times when someone has asked, “Well, how do we do that?” To everyone’s credit, we have always found a way to make it happen, and everyone has been supportive and creative in finding workable solutions.

### What advice do you have for students considering a data science degree versus a computer science degree, statistics degree, another degree, or some combination of the above (e.g., a double major of statistics and computer science)?

If a student is majoring in data science, we recommend courses in communication studies such as public speaking, small group communication, and/or cross-cultural communication. The reasoning is that most likely a data scientist will be working as part of a team on a project, and being comfortable in communicating statistical and computer science methodology will be invaluable in business and consulting settings.

As for students trying to decide whether to major in data science, every student should choose a degree that best fits his/her needs. For those students who welcome a challenge, are excited about the possibilities of Big Data, and want a truly novel approach to studying data science, NKU is the place to be. Properly addressing the challenges in Big Data takes knowledge and experience in more than one area, and that’s what an integrated, comprehensive undergraduate degree in data science can give you.

## University of California, Irvine

**Bachelor of Science in Data Science**

**Year in which first students expected to graduate:** 2019

**Number of students currently enrolled:** None (yet!). The major starts in fall 2015.

**Partnering departments:** Statistics (lead) and Computer Science

Stacey Hancock is an assistant teaching professor in the department of statistics at the University of California, Irvine. Her research is primarily in statistics education, with additional interests in time series analysis and environmental statistics.

Padhraic Smyth is a professor in the department of computer science with a joint appointment in the department of statistics at the University of California, Irvine. He is also director of the UCI Data Science Initiative. His research interests are primarily at the intersection of machine learning and statistics.

### How do you view the relationship between statistics and data science?

**Stacey:** One of my favorite definitions of a data scientist is a statistician who understands the principles of computing. As a statistician, my view of the relationship between statistics and data science is constantly changing, ranging from the pessimistic outlook that statisticians are being left behind in this new wave of data science to the optimistic approach of, aren’t we all data scientists? A quick Google search of “statistics” yields the following definition: “Statistics is the study of the collection, analysis, interpretation, presentation, and organization of data,” which sounds like data science! A Google search of “data science” does not produce such an easily found definition. That being said, I do feel statisticians need more training in data management and organization, data visualization, predictive modeling/machine learning, and communicating results, areas that are all emphasized in data science. In fact, the recently revised *ASA Curriculum Guidelines for Undergraduate Programs in Statistical Science* recommend these data science skills as an important part of undergraduate programs in statistics. Other data scientists would benefit from learning how to think like a statistician: When can you generalize your results to a larger population? What assumptions are we making when using statistical methods? How can we inherently understand and quantify variability and uncertainty? As the field of data science emerges and evolves, the best data scientists will be those of us with solid foundations in both statistical thinking and computational skills.

### What are the basic elements of your data science curriculum, and how was it developed?

**Stacey:** The idea for a major in data science came about very recently, ignited by the UCI Data Science Initiative. The Data Science Initiative aims to bring together researchers and students from all areas of campus involved in data science, broadly defined. An undergraduate major in data science, merging the core disciplines of computer science and statistics, was at the heart of the undergraduate plans for the Data Science Initiative. UCI is somewhat uniquely positioned to become a leader in the training and mentoring of future data scientists because the departments of computer science and statistics are both housed in the Donald Bren School of Information and Computer Sciences and already work closely in both research and training graduate students.

Our BS degree in data science curriculum includes core statistics and computer science courses, including one-year sequences in mathematical statistics and in statistical methods for data analysis, Bayesian statistics, statistical computing and exploratory data analysis, data management, machine learning and data mining, and information visualization. There are four newly developed courses unique to the major. Current researchers and practitioners of data science introduce students to topics at the forefront of the discipline in a first-year seminar in data science, both in terms of foundational methodologies (e.g., in statistics, machine learning, databases) and domain-specific applications (e.g., in climate science, astronomy, social science). We have developed a new course in statistical computing and exploratory data analysis that majors take in their second year. In their senior year, students take a two-quarter team-based project course, which provides a capstone experience to the major. The project course solidifies the connections between statistics and computer science, develops written and oral communication and presentation skills, and exposes students to the types of problems they will encounter as a practicing data scientist.

### What was your primary motivation(s) for developing an undergraduate data science program? What’s been the reaction from students so far?

**Padhraic:** We and many of our colleagues have seen a rapid increase in demand for “statisticians that know computing” and “computer scientists that know statistics” in industry, in academic research, in government, and so on. Our primary motivation in proposing the major is to help meet this need. Rather than “retraining” students (e.g., in a data science master’s program) after they get their primary degree, we believe there are significant benefits to training students to think as data scientists from the beginning of their undergraduate education. The goal is not to produce students who will compete directly with computer science majors or with statistics majors: We are confident data science majors will successfully occupy their own niche between the two existing disciplines and there will be significant continued demand for students with skills that lie at the intersection for many years to come.

We conducted some informal surveys in large undergraduate statistics classes and computer science classes while writing the proposal for our major and consistently found subsets of students who were enthusiastic about the idea (and who wanted to sign up straight away!). In addition, we have also recently been getting inquiries from incoming freshmen (and their parents!) about when the major will start—perhaps a reflection of the fact that the term “data science” has seen considerable exposure in the public media in the past year.

### Describe the reception you received from the partnering departments, other departments, and those at the university who had to approve the program.

**Padhraic:** The reception to date has been very positive. As mentioned earlier, we are very fortunate at UC Irvine to have the department of computer science and the department of statistics in the same school (and indeed in the same building). The recently founded UCI Data Science Initiative has also played an important role in getting the new major established. The initiative is cross campus in scope, but with “roots” in our departments, making it relatively easy for us to build bridges to other departments to get support and input for the major. Yet another helpful factor has been the fact that our school dean is a statistician (Hal Stern), whose own research as a Bayesian statistician relies heavily on computational methods. All of these factors have made planning and organization of the major much easier than it would have been if our departments were located in different schools across campus and if we did not have higher-level support from the dean’s office and Data Science Initiative. This has allowed us to present a unified voice to the rest of campus in terms of our vision for the program, and with the statistics and computer science departments leading the way, the other departments and schools on campus have been happy to support the new major.

### What advice do you have for students considering a data science degree versus a computer science degree, statistics degree, another degree, or some combination of the above (e.g., a double major of statistics and computer science)?

**Stacey and Padhraic:** If you love exploring data, manipulating data, visualizing data, and learning from data, then a data science degree is for you. Data science has emerged as a field at the intersection of statistics and computer science, and there is a high demand for individuals trained in this area—a demand that will only increase. Ultimately, we would like to be able to train students who are comfortable in doing something like the following: designing and implementing a data analysis project in which they have to write code to gather large amounts of data from the web, processing and storing this data in a database, visualizing and exploring the data, fitting a statistical model to the data in collaboration with a domain expert, and then communicating the results.

In terms of other degrees compared to a degree in data science, a statistics degree will likely have more mathematics, more theoretical training in statistical methods, and will better prepare you for graduate study in statistics. Compared to a degree in computer science, our data science degree will emphasize many of the “algorithmic components” of a traditional computer science degree (algorithms, data structures, programming, data management, software engineering), but will be combined with a large number of courses in statistics and machine learning (to a much more significant degree than would be in a traditional computer science degree).

A question we sometimes get is “how is the data science major different from doing a major in computer science and a minor in statistics” (or vice versa)? The answer is that the course requirements of a typical computer science (or statistics) major are demanding enough that students typically can only sample a relatively small number of courses in the “other” major. In contrast, the course load of the data science major is spread roughly 50-50 across courses from computing and statistics (for required courses), ensuring students are immersed in the cultures of both disciplines. One could also ask “how is the proposed major different from a degree in computational statistics?” One answer to this is “perhaps not that different”—but by engaging both statistics and computer science in the design and teaching of our data science degree program, we will be able to encourage students to both “think like a statistician” and “think like a computer scientist,” rather than thinking like one or the other—which we hope will give them a useful perspective when faced with challenging data analysis problems over the course of their careers.

## Winona State University

**Data Science**

**Year in which first students expected to graduate:** Fall 2015

**Number of students currently enrolled:** 10–15

**Partnering departments:** Mathematics and Statistics and Computer Science

Chris Malone is a professor of statistics and data science at Winona State University who has been pivotal in the development and improvement of curriculum at the undergraduate level for more than 15 years.

Brant Deppa is professor of statistics and data science at Winona State University. Deppa has spent more than 20 years developing dynamic and modern undergraduate curriculum for statistics. He serves as the chair for the department of mathematics and statistics.

Narayan Debnath is a professor of computer science at Winona State University. He has made substantial contributions to the computer science program for more than 20 years and is the chair of the department of computer science.

### How do you view the relationship between statistics and data science?

Data science is not statistics. Data science is the melding of computer science and statistics. There is a sense of friction between these two disciplines, which is understandable as ownership of this new discipline is at stake. This friction likely stems from academic departments frantically developing curriculum before another department does, or from one segment of a company gaining approval to hire a data scientist before another segment. The development of a successful data science program requires departments to reduce their differences so that an appropriate balance in curriculum can be achieved.

There are several institutions offering data science degree programs that are housed in computer science. The data science program at Winona State University is housed in the Department of Mathematics and Statistics, but includes a set of computer science courses as a major component of the curriculum. A data science major should be comprised of curriculum from computer science, statistics, and supporting courses from other departments as the practice of doing data science requires discipline-specific knowledge.

### What are the basic elements of your data science curriculum, and how was it developed?

The data science program at Winona State consists of three components: analytical knowledge, computational knowledge, and a discipline-specific area of application. Analytical knowledge consists mostly of content from data science (9 credits) and statistics (9 credits). Computational knowledge is comprised of 14 credits from computer science. The discipline-specific component requires 12 credits of upper-division coursework from any discipline with the exceptions of data science, statistics, and computer science.

The development of this program began with a poster at the United States Conference on Statistics 2011 and continued with posters and presentations at JSM 2012 and JSM 2013. The undergraduate data science programs at the College of Charleston (which developed the first undergraduate data science program in the U.S.) and the University of Warwick (UK) were influential in the development of the data science program at Winona State.

### What was your primary motivation(s) for developing an undergraduate data science program? What’s been the reaction from students so far?

Motivation for our program stemmed from the most recent program review in statistics. A substantial proportion of our statistics majors attend graduate programs upon completion of their degree. The program review highlighted that additional focus should be placed on students seeking employment. In the summer of 2012, several working professionals and hiring managers were interviewed by a core group of faculty at Winona State. This process revealed that the skill set being sought could not be attained by simply tweaking our existing undergraduate statistics program. Thus, the decision was made to create a separate data science program.

Students were excited about the launch of this new program. The launch of our program has received considerable attention from faculty and administrators across our campus and system-wide personnel. Employers in the region were invited to take part in the launch of this new program. The data science program at Winona State will significantly enhance employment opportunities for our students.

### Describe the reception you received from the partnering departments, other departments, and those at the university who had to approve the program.

Collaboration with the department of computer science was necessary for the development of our program. This collaboration extended beyond the determination of which computer science courses to incorporate and was vital to finding the correct balance between the three components of the program. The collaboration between the departments was sincere and not forced upon us by administrators.

There was little resistance to the approval of the data science program at Winona State University. The submitted curriculum package was thorough, and the motivation and need were clearly defined. One obstacle was that the curriculum package required information regarding the anticipated employment opportunities for graduates. This proved somewhat difficult because employment potential could not be directly connected to a data science degree: there is currently no CIP code for data science.

### What advice do you have for students considering a data science degree versus a computer science degree, statistics degree, another degree, or some combination of the above (e.g., a double major of statistics and computer science)?

The data science and statistics programs at Winona State University teach students how to extract information from data. These degree programs have differences in the types of data being considered, the procedures and methods in which information is extracted, and what inferences can be made from data.

We strongly encourage our students who plan to attend graduate school in statistics to double major in statistics and mathematics. A student who is seeking employment as a data analyst upon graduation should consider a degree in data science over statistics.

The data science program at Winona State has very strong connections with computer science. Thus, computer science majors are encouraged to consider a major or minor in data science.

## University of Nottingham

**BSc (Hons) Data Science**

**Year in which first students expected to graduate:** 2018

**Number of students currently enrolled:** 10–15 are expected to start

**Partnering departments:** School of Computer Science and School of Mathematical Sciences. Computer Science is the host school, although it is an equal partnership.

Uwe Aickelin is a professor and head of the school of computer science at the University of Nottingham. His research interests are data modeling and analysis in the health, security, and digital economy domains using artificial intelligence and machine learning methodologies.

Ian Dryden is head of the school of mathematical sciences at the University of Nottingham and is a professor of statistics. His research interests include shape analysis, object data analysis, medical image analysis, and high-dimensional data analysis.

David Hodge is the course director in mathematical sciences for data science. He is a lecturer in the probability and statistics group at the University of Nottingham and works in areas related to mathematics of operations research.

Christian Wagner is the computer science admission tutor. He is an associate professor and researches the capture, modeling, interpretation, and processing/aggregation of uncertain data, particularly in multidisciplinary contexts.

### How do you view the relationship between statistics and data science?

We see data science as the newborn sister in the family of mathematical and computational sciences, with statistics being a much older sibling. Clearly, there is much excitement about the new arrival, which has generated a lot of ideas and attention, but there is also plenty to learn from older members of the family.

### What are the basic elements of your data science curriculum, and how was it developed?

This course is one of the first undergraduate courses of its kind and covers an integrated portfolio of topics unique to Nottingham—including data capture, data mining, statistical analysis, machine learning, and large-scale cloud computing—while teaching an understanding of the human issues surrounding the analysis of personal data. It produces graduates with the core statistical and computer science knowledge and skills needed to present, analyze, and ultimately understand large data sets in an ethical manner.

The course content is split equally between mathematical sciences and computer science modules with an emphasis on statistical and computational data analysis methods, many the result of work in artificial intelligence. The curriculum in year 1 consists of computer science fundamentals, mathematics (linear algebra, calculus, analytical and computational foundations), programming and algorithms, databases and interfaces, probability, and statistics. Years 2 and 3 then offer advanced modules in statistics and computer science (e.g., further probability and inference, machine learning, and operations research).

### What was your primary motivation(s) for developing an undergraduate data science program? What’s been the reaction from students so far?

The schools of mathematical sciences and computer science have long offered joint honors courses, primarily in pure mathematics and theoretical computer science. Times have changed, though, and it seemed the right time to focus on data science, given the strong market demand for the skills of the graduates and the broad interdisciplinary expertise in our schools.

We have strong research connections between the two schools, particularly in the analysis of highly complex data in the Statistics and Probability Research Group in mathematical sciences, the Intelligent Modelling and Analysis Group in computer science, and the Horizon Digital Economy Research Institute. We have a strong role in the University of Nottingham’s Big Data Initiative, and we are jointly leading its new data-driven discovery research priority area, so our partnership is a natural one.

With these unique teaching and research strengths from the schools of computer science and mathematical sciences, students can expect to gain the knowledge and skills needed to present, analyze, and ultimately understand large and complex data sets. These are supported by a strong software development theme, providing the competency needed to understand and apply key techniques. The final-year project provides an opportunity to bring these complementary capabilities together to address real data analysis problems in a rich and supportive environment.

The course will start later this year, so it is a bit early for a student reaction. However, it is clear from our existing courses that students are voting with their feet and demand for data analysis options, projects, and dissertations is very high.

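### Describe the reception you received from the partnering departments, other departments, and those at the university who had to approve the program.

Everyone has been very positive, clearly recognizing how important it is to train the next generation of data scientists.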

### What advice do you have for students considering a data science degree versus a computer science degree, a statistics degree, another degree, or some combination of the above (e.g., a double major of statistics and computer science)?

Students are often conservative when choosing degree courses, with a natural tendency to choose a familiar subject that they took in high school. However, we would recommend students think further, considering which degrees will lead them to exciting employment opportunities. This is happening more and more, but given the immense demand in data science, it will be some time until the number of graduates from degrees like ours comes anywhere close to meeting the market’s requirements.

## Warwick Data Science Program

**BSc in Data Science**

**Year in which first students expected to graduate:** 2017

**Number of students currently enrolled:** 7*

**Partnering departments:** Statistics, Computer Science (with Statistics as lead department)

*\*The first cohort of seven students entered the program in September 2014. We expect this number to grow over subsequent intakes as awareness of the program increases. Meanwhile, we have a small but highly motivated inaugural cohort.*

David Firth is professor of statistics and director of the Warwick Data Science Institute. He worked previously at the universities of Oxford and Southampton, Imperial College London, and the University of Texas at Austin.

Anthony Lee is assistant professor of statistics and program director for data science in the statistics department. He has BSc and MSc degrees in computer science from the University of British Columbia and a DPhil in statistics from Oxford.

Graham Cormode is professor of computer science and program director for data science in the computer science department. Previously he was a researcher at Bell Labs and AT&T Labs.

### How do you view the relationship between statistics and data science?

The new undergraduate program in data science at Warwick complements two long-established programs organized by the department of statistics: MORSE (mathematics, operational research, statistics, and economics) and MathStat (mathematics and statistics). Together, those two existing programs graduate 160–180 students a year—a large group of “statistics majors,” almost half of whom come from outside the UK.

Our data science degree program builds on this existing strength to diversify the statistics undergraduate provision further in the specific direction of computer science. Like the more established statistics programs at Warwick, BSc Data Science is designed for students who are motivated to acquire deep, mathematically based skills that are directly relevant in the modern world. Statistical theory and methods are absolutely central in Warwick’s data science curriculum, as are the relevant parts of computer science and mathematics.

### What are the basic elements of your data science curriculum, and how was it developed?

The basic elements of Warwick’s three-year BSc in data science are the following:

- Rigorous foundations in relevant areas of mathematics such as analysis and linear algebra
- Foundational modules in computer science (programming, information structures, algorithms, etc.), in statistics (probability, statistical computing, statistical theory, linear models, stochastic processes, etc.), and in management science (linear programming, etc.)
- A wide range of options in the third year in computer science, statistics, and more broadly
- In the third year, a major data science project (25% of final-year credit) designed to give each student a personalized experience of data-analytic consulting and/or research

The program was developed during 2012–2013 by a working group of faculty members from the statistics and computer science departments. Essential components of the development process were to identify which existing modules in statistics and computer science would be most relevant to the new program and to plan additional modules needed to complete a well-rounded data science curriculum (such as advanced use of R for computational problems and the final-year project).

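### What was your primary motivation(s) for developing an undergraduate data science program? What’s been the reaction from students so far?

Our main motivations were the following: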

- Clear external signals seen over several years of the large and still-growing demand for graduates who are highly skilled in both statistics and computer science.
- Correspondingly clear demands from students in our existing MORSE and MathStat degree programs, for additional higher-level program modules in areas such as programming, databases, machine learning, etc. (Our students, too, had noticed that such computing skills when combined with their statistical knowledge would make them even more attractive to major employers!)
- To diversify our existing portfolio of undergraduate courses in an interesting new direction.

Current Warwick students gave us positive feedback on plans for this new program (some wished it had been available when they enrolled at the university). The attractiveness of a new degree program to potential applicants inevitably builds slowly over a period of years, though. Branding the program as data science, rather than as a more familiar combination of statistics and computer science, is a calculated risk, as the name has limited recognition among high-school students and businesses. Warwick faced this before in the 1970s when the MORSE degree was introduced and the rather radical idea of “BSc in MORSE” was unfamiliar to employers. MORSE grew steadily over time and now has 150–160 graduates each year, putting it among the largest statistics degree programs in the world. We hope data science will follow a similar trajectory.

### Describe the reception you received from the partnering departments, other departments, and those at the university who had to approve the program.

The program proposal was joint between statistics and computer science, and it was enthusiastically approved by the university. The initial suggestion came from statistics, and its merits were appreciated immediately by computer science colleagues; the two departments collaborated closely over 4–5 months to prepare the formal proposal for university-level approval. Although the program is formally led from the statistics department, it has a named program director in both departments and computer science leads the development and organization of the third-year projects that form such an important part of each data science student’s experience at Warwick.

### What advice do you have for students considering a data science degree versus a computer science degree, statistics degree, another degree, or some combination of the above (e.g., a double major of statistics and computer science)?

Warwick does not offer a straight statistics degree at undergraduate level, in part because the subject is not always well regarded by high-school students. In the UK, students apply directly for admission to a named degree program, rather than declaring a major later on. The main competitor programs to data science for prospective students are MathStat and MORSE, as mentioned above, along with computer science and discrete mathematics. (Discrete mathematics is a joint BSc program run by computer science and mathematics at Warwick.)

We aim to be clear, through our web pages and open-day presentations to prospective students, what the differences are between these various programs that have (necessarily) overlapping core material. Indeed, the first item on the data science FAQ page is “Why would I choose data science, rather than MathStat, MORSE, mathematics, or computer science?” Our answers to such questions typically aim to inform, rather than persuade. Above all, we want our students to follow a program they know is right for them!

Steve Pierson said: The August issue will feature three more programs:

- Miami University Analytics Co-Major
- Ohio State University Data Analytics Major
- University of Michigan Major in Data Science
