Q&A with Statisticians and Data Scientists Working to Improve Our Communities

1 September 2022
These statisticians and data scientists have been working on the front lines to improve our communities, so we asked them to talk about their current projects; who inspired them; and how to get started supporting where we each live, work, learn, and play.

Leonor Sierra has more than 10 years of experience in science communication and policy, with a focus on helping scientists get involved in and affect public debates about science. After earning her PhD in physics at the University of Cambridge, she worked for the UK nonprofit Sense About Science, then as a press officer at the University of Rochester. She is currently a freelancer based in Athens, Georgia. She also collaborates as a Sense About Science associate, most recently on a project about how risk know-how helps communities navigate risk information and assess benefits and trade-offs within their own context.

Who or what inspired you to study statistics/data science?

I haven’t studied statistics or data science. My background is in physics, but it’s been my work in science communication and risk with different communities that has made me realize just how much of our understanding relies on being able to question statistics.

Describe a current project and tell us what inspires you in your current work?

I have been working on risk know-how and am inspired daily by people who take on responsibility for helping their communities understand and engage with different issues.

What is one job skill you learned recently, and why do you find it important?

Recently, I had to take a medical leave. I don’t find it easy to take a step back, but sometimes it’s important to know when to slow down and recharge.

What are three pieces of advice you would offer your younger self?

  • Leaving academia does not mean leaving science.
  • You don’t have to have decided what you will do in the future; just keep learning.
  • Listen and read even more before forming an opinion.

What do you love most about your job?

Talking to people from all over the world, from leading experts to people doing amazing work in their local communities. I learn so much from them and am inspired to do more.

In a President’s Corner, Kathy offered this definition of community analytics: bringing the best of statistical science—in collaboration with municipal governments, universities, local businesses, NGOs, and community organizations—to improve lives through a better understanding of our communities and how we live, work, learn, and play. Looking to the future, what do you see as emerging areas and opportunities?

I think there is a huge opportunity to include community practitioners’ experience and expertise, especially around risk communication—not just as receivers of information but as experts in their community.

What are the most important skills for students to be developing right now in school to be ready for the future in data science?

I think communication is such an important skill for any scientist. I would encourage them to make the most of any opportunity for workshops, trainings, etc. Also, thinking more broadly about public engagement: how to involve anyone who would be end users of the data or part of the group about which data is being collected and how to do so meaningfully and not as an afterthought.

Name one or two favorite blogs or books you have read and would recommend to others.

The Tiger That Isn’t by Michael Blastland was probably the first popular statistics book I read many years ago, and so many of his examples and potential pitfalls have stayed with me.

Superior: The Return of Race Science by Angela Saini is not a book about stats or data science, but I believe it has some important cautionary tales about the data we collect and use—not always judiciously.

Susan Paddock is chief statistician and executive vice president at NORC at the University of Chicago. NORC is an independent research organization that delivers insights and analysis in five principal areas: economics; education; global development; health; and public affairs. NORC statisticians and data scientists work cross-functionally across those areas. Paddock is generally responsible for the methods of design and analysis used in NORC projects and the NORC corporate research and development enterprise. She earned her PhD in statistics from Duke University and her BA in mathematics and biostatistics from the University of Minnesota.

Who or what inspired you to study statistics/data science?

I have always been interested in what happens in our society. In my teens, my interests included politics, human rights, and women’s issues. I brought these interests to college and explored majors I thought might lead me to work on those topics. Part of my exploration included taking courses in health policy and women’s health, which led me toward public policy.

The backdrop to this was the intense health care reform debate of the early 1990s. One class assignment focused on highly experimental and expensive cancer treatments for patients without good treatment choices. There was information to consider about clinical trials and evidence, health economics, and ethics. The importance of using data to understand such a complex scenario became clear to me. An epidemiologist taught the women’s health course. It was my first exposure to identifying gender and racial bias in health research and to scientific practices such as literature reviews and epidemiological study design.

All of this inspired me to seek out professors working in public health, including then-director of undergraduate studies in biostatistics at the University of Minnesota Anne Goldman, who convinced me that majoring in biostatistics and math would prepare me well to bring data to bear on improving health and well-being in our communities.

Describe a current project and tell us what inspires you in your current work?

I am inspired by the challenge of collecting actionable, timely, and high-quality data to guide decision-making and better understand our communities.

My NORC colleagues and I have been developing and applying statistical methods and data science techniques to surveys, administrative records, bibliographic data, environmental sensor data, social media data, etc. Each of these data types has its own strengths and limitations. Recognizing that, we develop approaches to integrate data from multiple sources. This is useful when one data set alone cannot provide valid or precise estimates—such as for obtaining small-area estimates for communities or subpopulations of interest.

At NORC, we conduct surveys that provide data to the public to examine many topics of interest to communities, including COVID-19 vaccination, early child care and education, career trajectories, and voter preferences and attitudes.

One factor that makes all survey data collections challenging is nonresponse. Data science has enabled implementation of approaches to reduce the risk of nonresponse bias through adaptive and responsive sampling designs. This involves using information collected prior to or during data collection to change the design as needed to improve the representativeness of the final sample. This approach can be embedded into a survey data science workflow that, in addition, allows one to ingest, process, analyze, summarize, document, and finalize data sets in a way that is efficient and reproducible.
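As a rough illustration of the responsive-design idea described above, the sketch below compares each group's share of respondents so far with its share of the target population, then directs follow-up attempts toward the under-represented groups. The function name `reallocate_followup`, the group labels, and the proportional allocation rule are all assumptions made for this example; they are not NORC's method, and a real design would also weigh cost, response propensity, and variance.

```python
def reallocate_followup(responded, population_share, followup_budget):
    """Split a follow-up budget across groups in proportion to their shortfall.

    responded:        respondents collected so far, per group (hypothetical data)
    population_share: each group's share of the target population (sums to 1)
    followup_budget:  number of extra contact attempts available
    """
    total_responded = sum(responded.values())
    shortfall = {}
    for group, share in population_share.items():
        current_share = responded[group] / total_responded if total_responded else 0.0
        # A positive gap means the group is under-represented among respondents.
        shortfall[group] = max(share - current_share, 0.0)

    total_gap = sum(shortfall.values())
    if total_gap == 0:
        # The sample already mirrors the population: spread effort by population share.
        return {g: round(followup_budget * s) for g, s in population_share.items()}
    return {g: round(followup_budget * gap / total_gap) for g, gap in shortfall.items()}


# Toy interim snapshot: rural residents are half the population
# but only 20 percent of respondents so far.
plan = reallocate_followup(
    responded={"urban": 80, "rural": 20},
    population_share={"urban": 0.5, "rural": 0.5},
    followup_budget=100,
)
print(plan)  # {'urban': 0, 'rural': 100}
```

In this toy case, all remaining follow-up effort goes to the rural group because it is the only one falling short of its population share.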

What is one job skill you learned recently, and why do you find it important?

One of the broad job skills I always aim to improve is communication. In a one-on-one meeting, I work to develop an understanding of my colleague’s perspective and how we might communicate best with one another.

There are people who do not like small talk, and then there are others for whom conversations about hobbies, weather, and weekend activities make them feel more connected to the organization. Some people are systematic and careful in their communication approach, while others make exciting and sweeping pitches.

One of the things I do as a leader is to figure out what it will take to create an environment in which people can succeed. Meeting others where they are when we communicate is part of that.

What are three pieces of advice you would offer your younger self?

First, assume less and ask more questions. If you’re new to the field, your colleagues are hoping and expecting you’ll ask some questions, because we’ve all been ‘new.’

Second, people who are starting out in their careers should pay attention to their full working environment. Not only is it important to expand your skills as a statistician or data scientist, but also appreciating the substantive application area and understanding the mission of your organization will make your career go more smoothly.

Third, get involved in professional activities for career growth, personal development, and networking.

What do you love most about your job?

I love putting data and research findings into the world that people can trust and use to make sound decisions and assessments. Conquering a new statistical problem can be thrilling; there’s nothing like putting together the pieces of a puzzle. As a leader, it is rewarding to create an environment that is dynamic and full of opportunities for others to do impactful and fulfilling work.

In a President’s Corner, Kathy offered this definition of community analytics: bringing the best of statistical science—in collaboration with municipal governments, universities, local businesses, NGOs, and community organizations—to improve lives through a better understanding of our communities and how we live, work, learn, and play. Looking to the future, what do you see as emerging areas and opportunities?

There are many opportunities! I’ll mention just three. First, statisticians and data scientists can empower communities and organizations to make full use of their own data. At NORC, a team developed the Data File Orientation Toolkit, which is an open source toolkit for researchers and analysts—particularly those at state and local agencies—to assess the quality of administrative data files for conducting policy analysis.

Second, we have opportunities to make data easier to use by the public. For example, anyone with access to a web browser can use the General Social Survey Data Explorer to investigate the concerns, experiences, attitudes, and practices of US residents throughout the last 50 years. COVID-19 dashboards empowered many people and communities to monitor data in real time and make personal and policy decisions based on such data.

Third, there is increasing awareness of the importance of stakeholder and community engagement in study design, data collection, and measurement to achieve research and analysis outcomes that are meaningful to communities and promote justice, equity, diversity, and inclusion.

What are the most important skills for students to be developing right now in school to be ready for the future in data science?

I never regretted taking a couple of computer programming courses before I went to graduate school, not only for the obvious benefit of being able to write fast code but also to learn about programming in general. It also helps for students in the job market to have some level of proficiency with one of the major statistical packages and familiarity with others.

Rigorous statistical training remains the compass for statisticians and data scientists to navigate the myriad messy data scenarios out there and—even better—design studies to reduce the messiness as much as possible. Given that messiness, I’d recommend at least some foundational courses related to study design—such as survey sampling, experimental design, or randomized trials—because knowing ‘good’ study design will help one better cope with problematic scenarios in the future.

Machine learning can be applied to numerous problems, so it is useful to know about it. Knowing how to work with data is important, especially being able to identify anomalies and ask the right questions to understand the quality and meaning of the data. This, of course, means students should have opportunities to work with data sets that can be used to answer questions of scientific interest and, ideally, work as part of a team to gain collaboration experience.

Name one or two favorite blogs or books you have read and would recommend to others.

Twitter (in particular, #statstwitter, #econtwitter, or #rstats) is the fastest way for me to keep up with new developments in the field. I find so many interesting reports, papers, blogs, and ‘tweetorials’ that way.

On the leadership side, I was excited that one book on my to-read list last year became part of the book club for the Committee on Women in Statistics: Dare to Lead by Brené Brown. I really enjoyed hearing what other statisticians and data scientists across many career stages thought of the key messages of the book.

There is a lot in our statistical training that sets us up well for leadership; we just have to know where to look to find those lessons. One of Brené Brown’s key messages is that vulnerability is courageous. When we start a new project in statistics or data science, we often start with zero knowledge about the substantive issues. We need to be vulnerable and first admit that before we can make progress. Then, we ask good questions and listen carefully to learn what we must to advance the project.

Juan M. Lavista Ferres is the vice president, chief data scientist, and lab director of the Microsoft AI for Good Lab, where he works with a team of data scientists and researchers in AI, machine learning, and statistical modeling. He joined Microsoft in 2009 to work for the Microsoft Experimentation Platform, where he designed and ran randomized controlled experiments across Microsoft groups. He also worked as part of the Bing Data Mining team and led a group applying data mining, machine learning, statistical modeling, and online experimentation on a large scale.

Lavista Ferres started the Microsoft efforts related to sudden infant death syndrome, and his work has been published in top academic journals, including Pediatrics. Additionally, his work has been covered by more than 100 news outlets around the world.

Who or what inspired you to study statistics/data science?

Two things were key. First, databases. I started to work with data when I was incredibly young and, when I learned SQL, I realized the power of answering questions with data. The second was in my algorithmic class, when I discovered the ID3 algorithm (a decision tree machine learning algorithm invented by Ross Quinlan). I became fascinated by the possibilities and power of data and machine learning.
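For readers who have not met ID3, the heart of the algorithm is choosing, at each node of the tree, the feature whose split yields the largest information gain (the reduction in label entropy). The sketch below shows only that selection step on a made-up toy data set; the function names and the data are illustrative assumptions, and a full ID3 implementation would apply this step recursively to build the tree.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting the rows on one categorical feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_split(rows, labels, features):
    """Return the feature ID3 would split on first: the one with the highest gain."""
    return max(features, key=lambda f: information_gain(rows, labels, f))


# Toy data (hypothetical): the label depends only on "outlook".
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]

print(best_split(rows, labels, ["outlook", "windy"]))  # outlook
```

Here splitting on "outlook" separates the two classes perfectly (a gain of 1 bit), while splitting on "windy" tells us nothing (a gain of 0), so "outlook" is chosen.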

Describe a current project and tell us what inspires you in your current work?

Working on projects that have an impact and affect the lives of others inspires me every day. We are currently working with the UN to understand the destruction of buildings in Ukraine, which are protected under the Geneva Convention. This is a collaboration with Planet Labs, and we use deep learning on high-resolution satellite imagery.

What is one job skill you learned recently, and why do you find it important?

I have been learning and using Captum. Captum is a model interpretability and understanding library for PyTorch. Especially in deep learning, understanding the model learnings—particularly in areas like medical imaging—is a must-have. Captum is a great library that provides state-of-the-art algorithms to address this issue.

What are three pieces of advice you would offer your younger self?

Learning statistics is more important than you think. Statistics is one of the foundations of my job. As part of my computer science undergraduate degree, I took courses in statistics, but I did not pay enough attention to them because I thought they were not needed. This was a big mistake, so I had to relearn a lot of statistics years later.

Learn Python. All programming languages are good enough to work with data and develop software. From a pure programming language perspective, although Python is not the fastest or most efficient, its uniqueness is its vast community of software developers and data scientists who contribute open-source tools that provide tremendous power to data scientists.

Invest in simplicity. As humans, we need to recognize that we are addicted to complexity. We like complex projects and complicated things. This is the wrong addiction. If you want to impress people, your solutions can be complicated, but if you want to have an impact, your solutions need to be simple. Building simple solutions is hard but worth it.

What do you love most about your job?

There are especially important problems out there that can and should be solved with technology and data. I consider myself lucky because I have the opportunity to work on some of them.

In a President’s Corner, Kathy offered this definition of community analytics: bringing the best of statistical science—in collaboration with municipal governments, universities, local businesses, NGOs, and community organizations—to improve lives through a better understanding of our communities and how we live, work, learn, and play. Looking to the future, what do you see as emerging areas and opportunities?

Half the world lacks access to essential health services, and there are not enough doctors to be able to provide services. Approximately 80 percent of the world’s population has access to smartphones, and we expect this number to continue to increase. We predict a significant increase in possibilities to democratize health services by running telehealth, health apps, and algorithms on these devices.

What are the most important skills for students to be developing right now to be ready for the future of data science?

The fundamental skills to learn are coding and statistics. Students as young as those in middle or high school can be proficient in both. As soon as you can learn how to write and read, you can learn to code. You don’t need and should not wait until you’re an undergraduate to learn these skills.

Name one or two favorite blogs or books you have read and would recommend to others.

If you have to read two books, I recommend Lectures on Probability Theory and Mathematical Statistics by Marco Taboga and Deep Learning with Python by Francois Chollet.

Tanya Moore is the founder of Intersecting Lines, LLC, a mission-driven company that uses analytical and community-centered approaches to empower leaders and organizations in the use of data science, statistics, and evaluation methods to support equity-focused initiatives in health, education, and workforce development. She is one of the co-founders of the Infinite Possibilities Conference, a national conference designed to support, promote, and encourage BIPOC women in mathematics and statistics. Moore has been featured in Essence Magazine, Black Enterprise, and O, The Oprah Magazine and was recognized as a “STEM Woman of the Year” by California State Assembly member Nancy Skinner.

Who or what inspired you to study statistics/data science?

What matters most to me is my family, my community, and aligning my actions with the belief that everyone deserves an opportunity to live out their dreams and share their gifts. I never thought mathematics and statistics would be the vehicle for me to create a life that allows me to actualize what I most care about, but it has.

My teachers have served as a huge source of inspiration to me. Spelman College created an environment in which to learn mathematics while in community. At Spelman, my professors believed in me, encouraged me, and challenged me. I was guided to not just think about satisfying requirements for graduation, but to build a life in mathematics that extended past college.

My high-school teacher Mr. Richard Navies had a huge influence on my understanding of the connection between history and our present-day societal challenges. He encouraged all of us to consider a life of service, using whatever skills and talents we developed to make our communities stronger and healthier.

My decision to study statistics started with my interest in analysis and probability and my desire to do work that could improve health. Biostatistics became a way to tie together the things I cared deeply about.

Describe a current project and tell us what inspires you in your current work?

The projects I’m currently working on are energizing and feel personally and professionally meaningful to me. One project that I’m supporting is focused on creating equitable health outcomes in health care systems. Multiple pilots are being launched around the country in different health care settings. What is exciting to me is that the role of evaluation is focused on what can be learned about what supports or hinders health equity, so the type of data and method of collection and analysis will be varied, contextual, and organized around answering learning questions.

Another project I’m involved with is using AI to read and evaluate lots of policy documents. I’ve been working with a team to train the AI model in order to minimize racial and ethnic bias in the analysis.

The organizations I work with are bringing a lot of intentionality and consciousness to their work, and I’m humbled by their willingness to dive in and wrestle with some of the most challenging and complicated social issues of our time, such as racism, poverty, or generational trauma. They inspire me with their courage, brilliance, and sense of hope for the future.

I feel grateful that my training in biostatistics allows me to work across different sectors with inspiring leaders who are working toward creating a more inclusive and equitable society.

What is one job skill you learned recently, and why do you find it important?

Recently, I’ve been learning from one of my collaborators how to effectively use Airtable. It allows organizations to take a first step in creating a data system to organize all the data they care about before investing in high-priced software.

Many organizations that provide services or have programs in the community have data they collect—like intake forms, assessments, pre-post surveys, photos, video recordings, or written narratives—and it’s typically in different file formats or housed in different places.

It’s so powerful to see organizations get excited about their data coming together in one database that allows for different views and to have a system that provides greater ease in accessing actionable insights.

Most organizations these days collect a lot of data, but it often goes underutilized because it’s not accessible in a way that helps teams and leaders do the sense-making and use what they learn for decision-making, program improvement, or telling their story of impact.

What are three pieces of advice you would offer your younger self?

  • Stay curious. Explore and learn all that interests and excites you, even if it feels random or disconnected from your primary focus for the moment.
  • Invest time in developing meaningful and authentic relationships; the world is smaller than you think.
  • Be kind to yourself and forgiving of your mistakes. Learning who you are and about life happens inside and outside the classroom.

What do you love most about your job?

What I love is supporting nonprofit and foundation leaders in codifying their vision for positive social change and developing a plan of action that leads to measurable results. Through supporting their research and evaluation goals, I get a front-row seat to transformative change happening all across the country. It’s brought me so much joy and a sense of hope for the future to be part of so many incredible efforts happening around the country to improve our education and health systems or that aim to strengthen and heal communities that have been underserved or marginalized.

It’s also been an incredible experience to create my own company, Intersecting Lines. I get to provide services that integrate and leverage all my professional and educational skills and experiences. The entrepreneurial journey, while at times scary, provides a level of freedom, creativity, and flexibility in how and when I work.

In a President’s Corner, Kathy offered this definition of community analytics: bringing the best of statistical science—in collaboration with municipal governments, universities, local businesses, NGOs, and community organizations—to improve lives through a better understanding of our communities and how we live, work, learn, and play. Looking to the future, what do you see as emerging areas and opportunities?

Improving lives through a better understanding of our communities should include community perspectives and expertise. Too often, when research or evaluation has been conducted on communities—and even when in service to communities—it has had a way of extracting the data desired while leaving community members outside the rest of the process, excluding them from the design, analysis, meaning-making, and dissemination. I think the more we can approach community analytics with a lens of partnership and inclusivity, the more meaningful and useful the data will be.

Policy and systems change to create more equitable institutions is complex work that can take years and many partners working together to realize. I also see organizations wrestling with how to demonstrate change that is not so easily quantifiable. Addressing challenges that have their root causes anchored in racism, poverty, or trauma can feel intractable.

I think an emerging area of community analytics will be how to better integrate quantitative and qualitative data in demonstrating change over time. Just as societal issues are multifaceted, community analytics should be, too.

What are the most important skills for students to be developing right now to be ready for the future in data science?

Certainly, skills in statistics, mathematics, and computer programming are foundational to data science. Not only do they provide the tools to do the work, but studying these subjects helps to build one’s muscle for problem solving and strategic thinking. Beyond that, I would encourage students to explore courses and topics outside their discipline. The more you can develop a process for learning about other topics and viewing yourself as a translator of sorts, the more effective you’ll be as a data scientist.

Name one or two favorite blogs or books you have read and would recommend to others.

One exciting book I’ve recently discovered is W.E.B. Du Bois’s Data Portraits: Visualizing Black America, edited by Whitney Battle-Baptiste and Britt Rusert. The book showcases 60 data visualizations created by W.E.B. Du Bois that were displayed at the 1900 Paris Exposition. The charts and graphs were presentations of publicly available data and collectively communicated a story of African American advances in society post-slavery. The images are a powerful example of how data can be used to not only present facts, but to shine a light on those issues in society that need our attention, awareness, and action.

I also love Stephanie Evergreen’s blog. She is a data viz entrepreneur whose work I’ve followed for a few years now. I admire her expertise; sense of humor; and commitment to making data practical, accessible, useful, and meaningful.
