Q&A with UNL’s Director of Computational Sciences Initiative

1 March 2015

Jennifer Clarke

Jennifer Clarke is an associate professor of statistics and of food science and technology and director of the Computational Sciences Initiative at the University of Nebraska Lincoln. Her interests include statistical methodology for metagenomics and prediction, as well as the training of data scientists.

Please describe your position and responsibilities.

Formally, I am an associate professor in the department of statistics and the department of food science and technology at the University of Nebraska Lincoln (UNL). I have the usual responsibilities that associate professors have—maintaining a research program (mine focuses on statistical metagenomics and statistical prediction), teaching, and advising graduate students. However, what’s novel about my position is that I’m director of the Computational Sciences Initiative (CSI) at UNL.

The CSI—despite its unfortunate acronym—is a university-wide, faculty-driven program to enable and develop resources for Big Data and data science, with an emphasis on the life sciences. I started in this position in August 2013.

The CSI is supported by a chancellor’s program of excellence award that provides funding for faculty and postdoctoral researchers, staff support, hardware/software, and seed grants in research areas relevant to the data sciences, with the expectation that CSI will acquire other sources of funding as it develops.

The goal is to establish cross-campus linkages via data science that will conduct research, training, and consulting with both academic and industry partners. UNL has three campuses—City Campus (the main campus), East Campus (home of the Institute for Agriculture and Natural Resources, or IANR), and the Nebraska Innovation Campus (our newest campus focused on academic-industry partnerships).

The CSI directed the hire of two new faculty members in the IANR for the 2013–2014 academic year (one in comparative genomics and one in mathematical modeling) and anticipates an additional five new hires for the 2015–2016 academic year (in Bayesian spatial-temporal analysis, agricultural information systems, remote sensing, statistical prediction, and spatial economics).

We have a working group of approximately 40 faculty and postdoctoral researchers who meet monthly to discuss data challenges in disparate, interdisciplinary fields ranging from computational biology to social sciences to sustainable and reliable food systems.

What is your background, and what do you think most qualified you for this position?

I have two undergraduate degrees, one in mathematics and one in psychology, from Skidmore College. I have an MS in statistics from Carnegie Mellon University and a PhD in statistics from The Pennsylvania State University.

After I completed my doctorate in 2000, I was a postdoctoral researcher at the National Institute for Statistical Sciences (NISS) on a project with GlaxoSmithKline and a visiting assistant professor in the department of statistical sciences at Duke University. In 2004, I became a research assistant professor in the department of biostatistics and bioinformatics at Duke and received a National Institutes of Health K25 training award from the National Cancer Institute focused on statistical methodology for high-dimensional genomic data. In 2007, I moved to the University of Miami, where I was an assistant and then associate professor in the division of biostatistics. I was recruited in 2013 to UNL. This was a circuitous path, but it gave me the right sort of experience for what I’m doing now. I would summarize this as:

  • Excellent graduate and post-graduate training in statistics, both Bayesian and frequentist, with a strong emphasis on computation
  • Postdoctoral training on multidisciplinary research projects and a training grant that involved statistics, computation, cell biology, oncology, and genetics
  • Faculty positions at schools of medicine and colleges of arts and sciences (which have very different intellectual cultures)
  • International scholarly experiences, both inside and outside the western world
  • A love of learning (a never-ending endeavor) and of statistics, both as an intellectual field and as a powerful, enabling field for students and researchers from other disciplines

No, you don’t need to follow my winding professional road to be qualified for a position such as mine, so take a deep breath. What you do need is excellent training, both in statistics and in a collaborative field, and a willingness to learn and listen. Oh yes, and a determination to succeed!

Please describe a few of your current projects and what they entail.

This is the fun part. Here are three of my current projects:

  1. We just started a PhD program in complex biosystems, which is recruiting students for the fall of 2015. It is directed at students interested in quantitative statistical and computational approaches to data acquisition and analysis in multiple areas of biology.

    The program is interdisciplinary, so graduate students are recruited through a common portal, and the program affords a full year of research rotations (three total) on diverse topics. In the first year of study, the students consider “big questions” in multiple areas of life sciences and learn current technical and analytical approaches to answer them—as well as open challenges. The goal is that students acquire a foundation in population, cellular, and molecular life sciences; statistics; bioinformatics; and computational analysis. The advantages to the student are the wide choices available for research projects and an interdisciplinary educational approach that allows students to see cutting-edge methodologies at the beginning of their studies.

  2. I am developing novel statistical approaches to metagenomic data, with a focus on bacterial identification and community description. For those of you not familiar with metagenomics, it is an approach to analyzing complex microbial communities based on genomic data collected directly from mixed microbial DNA. We have developed two approaches to detect the presence or absence of microbes in a sample, one Bayesian and one frequentist. Currently, we are developing novel approaches to clustering such data (both genomic and abundance) based on ensembling.
  3. The CSI is involved in the construction and development of an advanced plant phenotyping facility at UNL. This is a big deal because, as a land grant institution in an agricultural state, we want an environment conducive to cross-disciplinary research among plant breeding, genetics, metabolic engineering, physiology, stress biology, and statistical and computational modeling, along with optical and hyperspectral imagery capture and analyses. From a statistical standpoint, one focus will be on the analysis of and accurate prediction from multitype data: The phenotyping platforms will not only collect imaging data but also allow data capture on carbon/water flux, photosynthetic capacity, leaf area, and soil measurements. The challenge is to bridge the genotype-to-phenotype gap at the plant level. To a statistician, this translates into the development of novel techniques to predict plant phenotype from both genomic and environmental data that may be collected over time. The CSI is involved in this project on two fronts: one is the development of statistical methodology and the other is the development of a data management and access pipeline for plant phenotyping.

How unique is your position in academia, and what motivated UNL to create it?

This position is unique in some ways, but common in others. There are many faculty members in academia whose research is focused on applied statistics, or statistical computation, and who work in collaborative environments with nonstatisticians.

These people make essential contributions, but rarely have a significant voice outside of their immediate group.

My position is different in that I receive university support to advance statistics and the quantitative sciences, in both education and research, with the expectation that I will bring needed expertise and develop needed resources at UNL. This position was developed because the faculty, particularly those in the life sciences, recognized that statistics, bioinformatics, and computation were critical to the future of their research and to the training of their students.

Several years ago, a laboratory would have simply hired a postdoctoral researcher with a quantitative background to enable research and help in training students. Now, however, with the explosion in the amount of available data and the required analytical skills, this is no longer enough. The faculty and administration at UNL decided to develop a central hub of faculty, postdoctoral researchers, and technicians, who could either provide methodological and analytical expertise directly or link faculty and students to appropriate resources. This is the CSI.

Any advice for other universities considering such a position? Any advice for statisticians considering a position like yours (or trying to facilitate more interdisciplinary partnerships in their university)?

My advice for other universities would be to do the same. Many fields are in the process of becoming more quantitative, and this trend will continue. The most successful universities will be the ones that provide institutional support to quantitative fields, tailored to their own needs and strengths. Statistics as a field is growing and will continue to grow with our ability to collect data and our desire to make evidence-based decisions, and, along with this growth, come other needed skills from fields such as computer science, electrical engineering, bioinformatics and computational biology, and mathematics. It is often daunting for faculty from traditionally nonquantitative fields to find the quantitative resources they need, let alone collaborate with quantitative faculty! The same is true for potential industry partners who are looking for a quantitative ‘point person’ on campus.

For statisticians considering a position like mine, I hesitate to give advice, as the world for statisticians is changing rapidly. However, here are a few highlights:

  • Be a statistician, first and foremost. This can be a hard thing to do, particularly in interdisciplinary settings, where either (1) proper statistical methods may not be discussed or of concern to your colleagues or (2) a certain level of knowledge of another field is assumed (and you don’t have it). So speak up when you have a concern, suggestion, or question. Collaborations worth pursuing will allow this, and if they don’t, maybe they are not worth pursuing.
  • Be willing to learn the ‘languages’ of other fields. In practice, effective communication with a nonstatistician requires learning by both parties—you learn to express yourself in a way accessible to other scholars and professionals, and they learn how to express themselves more precisely and quantitatively. This is difficult, but it can be extremely rewarding when it works.
  • Develop your computational skills. Often, simply accessing data and preparing it for data analysis requires considerable computational acumen—let alone doing the desired analyses. This may require courses in statistical computation and packages, as well as courses in computational languages that are not statistical.

How do you think the general area of ‘data sciences’ will affect the field of statistics over the next five years? What branches of the sciences are playing a growing role in interdisciplinary statistics?

The general area of data sciences will have a huge positive impact on the field of statistics. Why do I say this? I define data science as a three-legged stool, the legs being computer science, statistics, and a subject matter field (a popular one is business). No two of these fields can be successful in data science without the third. I see an increasing number of students who describe their interest as data science and understand this to include a solid background in statistics. In the 2000s, I often had to convince students interested in what we now refer to as data science that statistics was important. This has reversed: Now I often have to find resources to meet the demand for statistics.

In terms of branches of the sciences that are driving growth in interdisciplinary statistics, I will mention a few, with my apologies for those I omit. At the top of my list are (1) biological sciences (with genomic data as well as imaging and phenotypic data), (2) agricultural sciences (with plant breeding and remote sensing), (3) business (with decision analytics, Internet surveys, and Big Data applications), (4) earth and atmospheric sciences (with astrophysics, global imaging, and climate change), and (5) ecology (with metagenomics, spatial-temporal contexts, and environmental sciences). All these areas have challenging data for statisticians and people with a willingness to work collaboratively.

What would you like to accomplish in the next several years to consider the creation of this position a success?

I would like to develop CSI into a cross-campus hub for data sciences. We are supporting two bioinformaticians and are hiring an assistant professor of practice and a postdoctoral researcher. I would like to see this group have an impact on the university in terms of educating students (workshops, short courses, and lectures on topics in data science) and furthering its research mission. I would like the CSI and its website to become a ‘go-to’ place for students and faculty who are interested in the data sciences.

We have hosted several events on campus focused on Big Data or the data sciences, with both external and internal speakers, and I would like to support future events. The creation of this position will be a success if, in the next few years, (1) faculty researchers and graduate students will have found major direct benefits from quantitative support; (2) students will have found the CSI to be a frequently accessed resource for education and guidance in the data sciences; (3) the administration considers the CSI to be a critical resource for research, education, and industry relations; and (4) I am satisfied we have the resources necessary to support statistics, computation, and the data sciences at UNL so research projects that were essentially infeasible when I started here become routine.

If you were not a statistician, you would be …?

Earlier in my life, I wanted to become a professional ski racer, but I simply didn’t have enough athletic talent. In my wildest dreams, I am a Formula 1 race car driver (ask my friends about my love of cars). But being more realistic, I likely would have become a struggling artist or run my own auto shop. I can’t tell you what these have in common, except my enthusiasm for them.
