Training Students to Extract Value from Big Data

1 December 2014
Michelle Schwalbe
    Workshop co-organizers Raghu Ramakrishnan of Microsoft and John Lafferty of the University of Chicago discuss what is meant by the analysis of Big Data.

    The workshop report Training Students to Extract Value from Big Data from the National Research Council’s Committee on Applied and Theoretical Statistics (CATS) is available for free download.

    Data sets in science and engineering, economics, health care, public policy, and business have been growing rapidly. The recent influential NRC report Frontiers in Massive Data Analysis, by CATS and the Board on Mathematical Sciences and Their Applications, documented the rise of Big Data as systems routinely return terabytes, petabytes, or more of information. The size and scale of data, already overwhelming in many settings, continue to increase. Data sets are also increasingly complex, which can compound problems such as missing information and other data-quality issues, data heterogeneity, and differing data formats.

    A key challenge is to train the experts needed to draw reliable inferences from large and complex data sets. The nation’s ability to make use of these data depends heavily on the availability of a properly trained workforce, so it is important to increase the pool of qualified scientists and engineers who can extract value from Big Data.

    The CATS workshop Training Students to Extract Value from Big Data took place on April 11–12 in Washington, DC. It explored the need for training in Big Data (through experiences and case studies); principles for working with Big Data; courses, curricula, and interdisciplinary programs; and shared resources.

    As workshop participants discussed, training students to exploit Big Data requires experience with statistical analysis, machine learning, and computational infrastructure that allows the real problems associated with massive data to be revealed and, ultimately, addressed. Repositories of both data and software, along with computational infrastructure, will be necessary to train the next generation of data scientists. Analysis of Big Data requires cross-disciplinary skills, including the ability to make modeling decisions that balance trade-offs between optimization and approximation while remaining attentive to useful metrics and system robustness. To develop these skills in students, it is important to identify whom to teach (the educational background, experience, and characteristics of a prospective data science student), what to teach (the technical and practical content the student should learn), and how to teach (the structure and organization of a data science program).

    One impetus for the workshop was the currently fragmented view of what is meant by analysis of Big Data, data analytics, or data science. New graduate programs are introduced regularly, each with its own notion of what those terms mean and, most important, of what students need to know to be proficient in data-intensive work. What are the core subjects in data science? Training in Big Data, data science, or data analytics clearly requires a multidisciplinary foundation that includes at least computer science, machine learning, statistics, and mathematics, and these disciplines should work together to develop curricula for such training programs.

    The topic of training students in Big Data is timely, as universities are already experimenting with courses and programs tailored to the needs of students who will work with Big Data. The workshop was designed to enable participants to learn and benefit from emerging insights while innovation in education is ongoing.

    For more information about this workshop and other CATS activities, contact Michelle Schwalbe at mschwalbe@nas.edu.

    Visit the website for videos and presentations from the workshop.