
Four Students Talk Fellowships

1 November 2013

This past summer, 36 undergraduate and graduate students spent three months at The University of Chicago as the first fellows of the Eric and Wendy Schmidt Data Science for Social Good program, which allows aspiring data scientists to work on data mining, machine learning, Big Data, and data science projects with social impact. All the fellows used their coding and analytics skills to take on real-world problems in education, health, energy, and transportation for government and nonprofit sponsors. Four of the fellows offer advice and respond to questions about their experiences, views on data science, and future plans.

Varoon Bashyakarla earned his bachelor’s degree from Yale, where he studied statistics and economics, spearheaded cancer fundraising and awareness efforts, and played on the men’s club tennis team. He hopes to pursue a career as both a scholar and activist.

 

How are you spending your summer as a DSSG fellow?

I’m working on a project to optimize some of the Chicago Police Department’s predictive algorithms. My teammates and I are mining vast troves of data to detect salient features that may possess crime-predictive capacity, and we are constructing spatiotemporal point process tests to investigate the strength of the associations we find.

What inspired you to apply?

I applied for this fellowship because I am—and always have been—a people person; because the social problems and inequity I’ve witnessed appeal to me on an emotional, human level; and because I believe very much in the truth of mathematically grounded inquiry. In fact, I began studying statistics because I wanted to understand the dynamics of social problems more rigorously. Statistics provided me with the ability to test the validity of my own preconceived notions about the way our world operates and equipped me with the tools to evaluate, for instance, whether assumptions imposed on models used in international development research were realistic and reasonable. Hence, DSSG enabled me to merge my two interests—one in addressing social problems and the other in data-driven analysis and decisionmaking.

On the first day of the fellowship, we were asked to submit short statements about our goals for the summer. I wrote, “I hope to learn firsthand whether or not studying social problems from a data science lens is one I enjoy, value, and can justify pursuing in the future given other modes of approaching these challenges and my own strengths, weaknesses, talents, and interests.” I realize that’s a tall task for 12 weeks, but my fellowship experience thus far has certainly been informative to this end.

Do you recommend fellow statisticians participate in this program in the future? If so, why and what advice do you have for them?

Yes, I wholeheartedly recommend fellow statisticians participate in this fellowship and, more generally, would love to see more statisticians applying their skills to social problems. In both cases, doing so affords the opportunity to help people and organizations striving to do good, honest work in a manner that feels more direct than much traditional, academic research does. The problems we study and solutions we construct have clear, human motivations and implications. One crucial step in this process is investing the time and energy to formulate an interesting, feasible question, and the importance of this step can’t be overestimated.

The DSSG fellows come from diverse fields. How do you view the relationship of statistics to data science?

This question has no clear answer. My own personal take is that statistics is the cornerstone of data science, but that data science is a much larger field. Data science clearly includes parts of applied statistics, computer science, data mining, machine learning, and other, more practical considerations such as data storage and retrieval. Dealing with the intractability of analyzing 1 TB of data, for example, demands database knowledge from computer science, data analysis tools from statistics, and disciplinary expertise about the phenomenon under study, too. In this vein, I think data science encompasses an idea of storytelling through data—what’s the problem, why do we care, how can data inform this question, and what do the data say? Statistics is clearly central to all of these components.

What advice do you have for young statisticians wanting to work in data science?

Ask a question you’re interested in, find (or collect) data to answer it, and get your hands dirty! There’s no better way to learn data science than to do it yourself. There’s data about everything under the sun nowadays, so you can ask virtually any question you wish. That said, don’t limit yourself to trying to answer your questions through number crunching alone, particularly if you’re exploring social problems. Read books and watch movies related to whatever it is you’re studying. Writers and filmmakers also pose questions and explore them; allow their ideas and stories to inform your own process.

What do you plan to do after you earn your degree?

I’ll be working at Dropbox in San Francisco immediately after the fellowship concludes. My future plans are not at all set in stone, but at the moment, I’m juggling a few possibilities in my head: studying machine learning in graduate school, pursuing statistical evolutionary biology and advocating on behalf of conservation efforts, or exploring my interest in the humanities and studying philosophy. I’m also very interested in health, nutrition, and fitness, and I think the data science revolution may offer some exciting insights in these fields.

 

Jonathan Auerbach is a statistician/policy wonk hybrid who is finishing his master’s in statistics at Columbia University. During the day, he is an analyst at the finance division of New York City’s legislative body, the New York City Council.

 

How are you spending your summer as a DSSG fellow?

My group is working with the City of Chicago’s Department of Streets and Sanitation to improve their garbage and recycling collection program. We are using the department’s records to evaluate the mayor’s recent centralization of management and identify key metrics to assess productivity.

Most collection in Chicago takes place in alleys, but for a variety of reasons, important variables such as population density or volume of trash are unobservable at this level. A significant portion of our analysis has been developing statistical models to impute these variables.

Incorporating the department’s institutional knowledge into our analysis has been an ongoing challenge. It involves navigating Chicago’s bureaucracy and understanding various political interests to produce results the department can use.

What inspired you to apply?

I began working in public policy three years ago because I was fascinated by how local governments use data and statistics to make decisions. I applied to the DSSG program to augment my data analysis skills and work with others passionate about helping governments make better data-driven decisions.

In both respects, DSSG has wildly exceeded my expectations. I have adopted new methods for manipulating data, version control, visualization, and modeling. The perspectives of the other fellows and mentors who come from a wide range of backgrounds also have been invaluable. These experiences have given me an understanding of the power and nuances of data science.

Do you recommend fellow statisticians participate in this program in the future? If so, why and what advice do you have for them?

Now is a great time to be a policy-data enthusiast, because local governments and nonprofits are just beginning to embrace the wide applications of data analysis. For statisticians who share this passion, DSSG can be an invaluable way to gain direction and meet industry leaders.

Prospective fellows need to understand that data science is still very much the Wild West. Government and nonprofit institutions are not organized in ways that facilitate analysis. They consist of large bureaucracies, where transactional data and records are strewn across various departments in a variety of formats.

Helping governments and nonprofits understand their data requires a lot of patience and imagination. Your consolation, however, is that you will leave the world in a far better place than where you found it.

The DSSG fellows come from diverse fields. How do you view the relationship of statistics to data science?

Statistics is a beautiful discipline studied as much for its aesthetics as its application. In contrast, data science is results driven, and statisticians are desired mostly for their practical skills. Sometimes, this can lead to a problematic relationship between statisticians and other data scientists.

Imagine that the desired result of data science (an evaluation, an imputation, a forecast, etc.) was like remodeling a building. Our clients generally have a pre-existing structure they want to improve upon, and they know exactly what additional features they would find most useful.

The role of the statistician is to determine—given the existing structure (institutional knowledge) and the building materials available (data)—which alterations the building can safely hold. If the work is not done responsibly, the entire building could be condemned. That is, others might view the entire result as invalid, or our clients may be worse off than had they relied solely on their own analysis.

What advice do you have for young statisticians wanting to work in data science?

The best way to become a data scientist is to jump in head first. Almost every major city has an active data science community. Find a topic you are passionate about, go online, and I guarantee you will find a whole community of data scientists working on that topic.

If you are not passionate about a topic, attend an open government meet-up. You may not be passionate about open government now, but just wait until you start paying taxes.

The best part of data science is that no one is an expert in every area, and data science communities are extremely open to teaching and helping you learn new skills. Do not be afraid to ask!

What do you plan to do after you earn your degree?

I plan to pursue a PhD in statistics. In addition, I hope to continue working on policy issues that help local governments use data more effectively—through both my own research and collaborations with data scientists throughout the country.

 

Breanna Miller graduated from the University of Michigan in 2013 with a master’s degree in statistics. While there, she assisted the university research community by providing statistical advice at the Center for Statistical Consultation and Research.

 

How are you spending your summer as a DSSG fellow?

I am involved in two projects. The first is working with nonprofits that serve students to give them a better idea of which kids they are reaching and the impact they are having on those students’ academic outcomes. The second aims to help bikeshare systems in cities around the United States better balance the number of bikes at stations by predicting when stations are likely to be empty or full.

What inspired you to apply?

The description of the program aligned with my personal and professional goals. I want to use my quantitative skills to solve complex, meaningful problems. The focus on creating solutions that our project partners can implement in the real world was also especially appealing, since the fellowship’s potential for concrete impact gives meaning to the work we do this summer.

Do you recommend fellow statisticians participate in this program in the future? If so, why and what advice do you have for them?

Absolutely! You should participate in this program if you are passionate about using data to make a difference. Future participants should learn as much as possible from the mentors and other fellows. We have diverse backgrounds, and I continue to be amazed by the breadth of knowledge of the collective group.

The DSSG fellows come from diverse fields. How do you view the relationship of statistics to data science?

The massive amount of data that data scientists work with presents a whole new set of challenges that may require different tools than traditional statistics. I think this shift in toolkit, rather than a difference in goals, is what differentiates data science from statistics. Since we are working with more data than any of my course work has required, I am learning to use some of these tools this summer.

What advice do you have for young statisticians wanting to work in data science?

Get comfortable using the command line. In a data science context, you will likely need to work with more data than you can store locally, and this familiarity will serve you well. Also, learn to search for help. If your code isn’t cooperating, there is a good chance someone else has had a similar problem, and online tutorials and forums can help you get things working.

What do you plan to do after you earn your degree?

At the end of the summer, I plan to move to Washington, DC. I would like to continue using my skills to have a positive social impact, hopefully as an analyst in health or education policy.

 

Walter Dempsey is a PhD student at The University of Chicago, where he graduated with a BS with honors in mathematics, BS in economics, and BA in statistics in 2009. His main methodological research interests are in longitudinal and survival data analysis.

 

How are you spending your summer as a DSSG fellow?

My primary focus is on a project simulating bus services for the CTA. We use GPS and passenger count data to model future demand at every stop in Chicago. These models become a transit planning tool that will allow the CTA to predict how well transit service is likely to perform under a particular schedule change before deployment of a single bus. I’m also working on providing statistical support for Divvy Bikes, a bike-sharing company in the city interested in analyzing weather and bikeshare station trends to predict how many bikes are likely to be available at each Divvy station in the future.

What inspired you to apply?

I found the application through The University of Chicago Statistics Department’s list host. Much of my work in graduate school is focused on theoretical statistics and statistical methodology, and I was looking for an opportunity to apply my working knowledge to real-world data sets while becoming more familiar with the computational aspects of statistics. I also was drawn to using quantitative processes for the betterment of humanity.

Do you recommend fellow statisticians participate in this program in the future? If so, why and what advice do you have for them?

I am happy with the program and would recommend the fellowship to any aspiring statistician. By working in teams with fellows of diverse backgrounds, one will gain understanding and appreciation for all components of data science. While statistics provides ample tools for the study and analysis of data, it is difficult for many statisticians to apply their methods in practice to large data sets, as they may lack the necessary computational background. Though not all projects are centered on Big Data, the fellowship provides a setting in which one can master the tools to apply statistical knowledge effectively.

The DSSG fellows come from diverse fields. How do you view the relationship of statistics to data science?

Some in the data science community see statistics (and machine learning) as a set of computational tools for solving a range of optimization problems. In many cases, it is, because data scientists learn statistics through specific applications and do not interact with the statistics community directly. My hope is for a more open dialogue between the different areas within data science. It is difficult, as I think statistics is more conservative in nature than its counterparts within data science. In the end, I hope to convince these people to start viewing statistics from a more methodological standpoint.

What advice do you have for young statisticians wanting to work in data science?

As we push toward an era of Big Data, it will be requisite for young statisticians to become more capable programmers and more knowledgeable about necessary tools to deal with the computational issues that arise in statistical procedures. This push extends beyond industry and is transforming academic departments (consider the Computational and Applied Mathematics Initiative at The University of Chicago). Statisticians who wish to pursue careers in data science must start to branch out and become more familiar with computer science: everything from data management and distributed computing to coding design and web app development. Statistics is one component in the pipeline from the data to an end product, and it’s important to at least understand the various pieces.

What do you plan to do after you earn your degree?

While I have enjoyed my time at the fellowship, my current plan is to remain in the academic world and pursue a position as an assistant professor or find a postdoctoral position. I have found it is possible to contribute meaningfully to projects like those found at DSSG from an academic position; my hope is that the connection between academic departments and the data science community strengthens in the coming years so that it will be possible to work within both communities. Both groups could benefit substantially from a continuing and open dialogue, and my hope is to be part of that exchange for years to come.
