
Two More Schools Create Master’s, Doctoral Data Science/Analytics Programs

1 March 2020
The proliferation of master’s and doctoral programs in data science and analytics continues, seemingly due to the insatiable demand of employers for data scientists. Amstat News started reaching out several years ago to those in the statistical community who are involved in such programs to find out more. Given their interdisciplinary nature, we identified programs involving faculty with expertise in different disciplines—including statistics, given its foundational role in data science—to jointly reply to our questions.

WASHINGTON UNIVERSITY SCHOOL OF MEDICINE

Charles Gu is an associate professor in the division of biostatistics and an associate professor of genetics at Washington University. His research interests include high-dimensional data analysis, machine learning, and translational bioinformatics. He is the course master for a bioinformatics course.

 

Lei Liu is a professor in the division of biostatistics at Washington University. His biostatistical and data science interests include survival analysis, longitudinal data analysis, spline regression, personalized medicine, and machine learning. He is the course master for a biomedical data mining course.

 

Master of Science in Biostatistics and Data Science (MSBDS)
Year in which first students graduated: 2019
Number of students enrolled: 3
Partnering departments: Division of Biostatistics
The MSBDS program is housed in the School of Medicine and is a 42-credit, 18-month program with summer matriculation in July. The majority of students are full-time traditional, though it is also open to research staff on campus who can only enroll part time. Students choose between a 6-credit (final two semesters) internship or thesis. All students are eligible to apply for research assistantship positions after the first summer semester.

Describe the basic elements of your data science/analytics curriculum and how the curriculum was developed.
Our MSBDS program comprises three important data science training elements: a one-semester course on introduction to bioinformatics; two one-semester courses on biomedical informatics (fundamentals and methods); and a one-semester course on biomedical data mining.

We expect our matriculating cohorts will have a balanced undergraduate training in math, statistics, and computer programming. However, it’s our experience that, almost without exception, students have a rather heterogeneous mixture of these skills, especially in computing and statistics. To address this issue, students enroll in three summer courses following orientation: two courses on statistical computing (SAS and R) and one on biostatistics. This way, all will be at a relatively similar starting point when they start the fall semester.

The curriculum has evolved from our earlier Master of Science in Biostatistics (MSIBS) program established in 2011, which evolved from our original Genetic Epidemiology Masters of Science (GEMS) launched in 2002. Bioinformatics training was included from the beginning, created in response to the rising tide of “big data” of genomics. The new Master of Science in Biostatistics and Data Science (MSBDS) degree was built upon the other two programs (GEMS and MSIBS) by incorporating some of their foundational courses.

What was your primary motivation for developing a master’s data science/analytics program? What’s been the reaction from students so far?
We were motivated to respond to the demands of our students, their employers (many of whom are also our colleagues), and our faculty, all of whom increasingly encounter emerging big data needs. For example, students working as RAs often have to deal with some kind of genomics/bioinformatics data analysis, even though they have had limited training. By adding the two biomedical informatics courses and a biomedical data mining course to the already existing introduction to bioinformatics and fundamentals of genetic epidemiology courses, we believe students will be better rounded and able to tackle data science work related to biomedical research. In practice, master’s students in biostatistics may need less coursework in statistical methods but more coursework in data management.

How do you view the relationship between statistics and data science/analytics?
Statistics is to data science as a parent is to his/her child: You see a lot of resemblance of the parent in the child, but the child will live his/her own life. It takes another parent to bring the child to life and, in this case, it actually took two or more (i.e., computer science and a domain science). Just like several other scientific branches, traditional statistical methods have been developed to analyze data of a smaller scale to achieve a balance of accuracy and efficiency. However, with the emergence of bigger and more complicated data, new statistical and computational approaches are needed to answer the call. Big data and small data are both part of data science. Statistics should go hand in hand with informatics, computer science, and other disciplines to make a big world of data science.

What types of jobs are you preparing your graduates for?
We aim to prepare our students for a variety of biomedical data science–related jobs and anticipate many will continue to find jobs in academic research or continue with further education. However, we would like to see more graduates secure positions in industry. In addition to (bio)statisticians, students can also find jobs as data analysts, bioinformaticians, and business analysts. If the students’ research assistantship opportunities are any indication, the constant and growing demand for biomedical research assistance leads us to believe the future looks bright for our graduates.

What advice do you have for students considering a data science/analytics degree?
Start early and think over the nature and type of data science career you will both enjoy and be good at. The nature of your ideal DS job is related to the domain science component of your MSBDS degree. The earlier you think these over, the sooner you can decide on and take relevant classes at the undergraduate level. In any case, take some classes to hone your computing skills.

Regarding a degree in data science versus computer science, statistics, or a domain science, the primary determining factor should be your intellectual interest and capability. At the risk of being overly simplistic, the major difference between CS and DS is whether the model is assumed known, between statistics and DS is the scale of data, and between a domain science and DS is whether you create your own data or analyze others’.

Essentially, data science graduates should have more knowledge in informatics and data management compared to the traditional biostatistics master’s students and more coursework in statistical methodology than CS students.

Describe the employer demand for your graduates/students.
As previously stated, we foresee more students getting jobs in industry and also believe there will continue to be strong demand for graduates in academic research. The program will also prepare graduates for higher studies (PhD, MD, etc.).

Do you have any advice for institutions considering the establishment of such a degree?
It is the prime time to get into the field and start one’s own program in DS. As a new science, it has a tremendous number of theoretical and methodological problems that require the persistent work of many great minds. It is a good time for a top-tier school to make meaningful contributions to the development of this new field. To do so, the school must not simply see this (creating such a DS degree) as a new revenue source, but rather as an opportunity to advance the science. Therefore, the creation of a new DS program must be accompanied by enhanced/strengthened faculty activities devoted to DS research and teaching.

It is also a good time to create (or start to create) institution-wide databanks. DS is a discipline that relies and thrives on other people’s data.

Revamping a university-wide curriculum for a DS degree may seem like overkill. However, it may be necessary to organize a campus-wide curriculum taskforce to streamline and coordinate course offerings between different schools and departments to eliminate wasted effort and fully use the strengths of different programs.

BOWLING GREEN STATE UNIVERSITY

Robert Green is an associate professor of computer and data science. He served as a research assistant professor at the University of Toledo from 2012 to 2013, before joining the department of computer science. While his core research expertise is in computational intelligence and high-performance computing, his research record crosses disciplinary boundaries with publications in cloud computing, power system reliability, intrusion detection, and optimization.

 

PhD in Data Science, MS in Data Science
Year in which first students are expected to graduate: PhD in Spring 2022, MS in Fall 2021
Number of students enrolled: 11
Partnering departments: Computer Science, Mathematics and Statistics, Applied Statistics and Operations Research
Program format: In person
The MS is 30 credit hours with a required project. The PhD has 60- and 90-credit-hour paths, including qualifying and preliminary exams. A practicum of some type (industrial or research) is required. The 90-credit-hour path includes earning an MS degree.

Describe the basic elements of your data science/analytics curriculum and how the curriculum was developed.
The graduate college, along with the three partnering departments, jointly developed the curriculum as informed by faculty, research, and the respective advisory boards of each department. We consulted with data science practitioners in industry and with Burtch Works regarding their study of data science. Information was also gathered from the MS in analytics program at BGSU. The curriculum is fundamentally 33 percent applied statistics, 33 percent math/statistics, and 33 percent computer science.

Students entering the program are expected to have the following background:

  • Differential, integral, and multivariate calculus
  • Linear algebra
  • Senior-level introduction to probability
  • Senior-level statistics
  • Programming skills in high-level languages such as C, C++, Java, and Python
  • Data structures
  • Algorithms
  • Computer science knowledge

What was your primary motivation for developing a master’s and doctoral data science/analytics program? What’s been the reaction from students so far?
Fundamentally, there is a significant need for the program from both an industrial and an academic perspective. We consulted a variety of sources regarding the market need for this program. According to Fortune, Indeed.com’s chief economist, Tara Sinclair, said the number of job postings for data scientists grew 57 percent for the first quarter of 2015 compared to the year-ago quarter, and searches for data scientists grew 73.5 percent for the same period. A search for PhD data scientist positions on January 24, 2018, resulted in 4,646 positions. On the same day, a search for PhD statistics positions resulted in only 3,857 positions. (A search for PhD computer science positions resulted in 8,517 positions.)

A report by International Data Corporation in 2015 observed the following potential for big data analytics and the need for analytics professionals:

  • Shortage of skilled staff will persist. In the US alone, there will be 181,000 deep analytics roles in 2018 and five times that many positions requiring related skills in data management and interpretation.
  • Over the next five years, spending on cloud-based big data and analytics solutions will grow three times faster than spending for on-premise solutions.
  • Adoption of technology to continuously analyze streams of events will accelerate as it is applied to Internet of Things (IoT) analytics.

A report, The Burtch Works Study: Salaries of Data Scientists, was released by Burtch Works Executive Recruiting in April 2016. The report was based on a sample of 374 data scientists for the 12-month period ending in March 2016. The report has the following key findings:

  • Ninety-two percent of the data scientists in the sample have an advanced degree; 44 percent hold a master’s degree, and 48 percent hold a PhD.
  • The median salary of an entry-level job for a data scientist with a PhD is $100,000.
  • Demand for data scientists has been increasing as more organizations jump on board the data bandwagon, and while the supply has been improving, it still lags far behind.

Student reactions so far have been strongly positive. Our students have mixed backgrounds, with a majority of them coming from a strong statistics background. These students tend to struggle in their first year getting up to speed on computer science–related work. The opposite is true of those with strong computer science backgrounds. There are few candidates who have a well-balanced background covering all areas, though the department of computer science will begin offering an undergraduate specialization in computational data science in fall 2020 that will do just this.

How do you view the relationship between statistics and data science/analytics?
Statistics is essential and foundational in data science, but it is not the entire picture. The data scientist must have a mixed skill set in math, statistics, and computer science to succeed. A statistics PhD student with a few CS courses or some experience with R is not automatically a data scientist; one needs deeper knowledge and experience with computer science to develop one’s own algorithms, make them run fast, and understand the ecosystem of computer science to be able to get the code deployed and running without interruption.

What types of jobs are you preparing your graduates for?
While we have not had any graduates yet, we assume our PhD students will move on to either academia or industry and that our MS students will go to industry or into a PhD program. Master’s graduates who go into industry will likely take jobs that have titles such as data scientist and data engineer.

What advice do you have for students considering a data science/analytics degree?
For true data science, you need to be very strong in one of three areas (math/statistics, applied statistics, or computer science) and competent in the other two. You also need to be ready to collaborate. Every data scientist has their own expertise and is stronger in one of these three areas, so collaboration and complementarity are qualities we seek to maintain.

What many students find difficult about studying data science is that you need to embrace two very different disciplines with different mindsets: statistics, with its mathematical nature and rigorous thinking about sampling and testing, and computer science, with its fast pace, the need to be comfortable with everything happening on a computer, discrete puzzle-like tasks, and the need to collaborate with Git and other such tools.

Do you have any advice for institutions considering the establishment of such a degree?
Be kind and realize your discipline is not the “end all be all” of disciplines. We all complement each other and bring important skills to the table.

Begin the process with the creation of a college, school, or department to administratively house the programs.

Work hard to actively include people on the core team from multiple departments, and even from multiple colleges. The program will be stronger for it.
