Member Spotlight: Bo Zhang
Bo Zhang is a senior data scientist at IBM. His mission is to design and implement statistical models and machine learning algorithms on Big Data, write code to deploy these models to create real-time actionable insights, and show business users where and how these insights can help them drive and grow their businesses.
What or who inspired you to be a statistician/data scientist?
When I entered high school, I tried out for my school soccer team but did not earn a spot for the first season. Instead, the coach asked me to collect data during the games (e.g., goals, saves, misses, and assists) and calculate simple summary statistics. That was when I realized I was passionate about data and statistics. That passion continued to motivate me throughout my bachelor's and master's studies in mathematics. Not content with merely accumulating theoretical knowledge, I actively pursued practical opportunities (e.g., the Mathematical Contest in Modeling) to solve real-world problems using data and mathematics. But that still wasn't enough. I wanted to gain more advanced knowledge and learn analytical methods for making sense of real-world data every day. This led me to spend 3.5 more years pursuing master's and PhD degrees in statistics at North Carolina State University.
Do you prefer statistician, data scientist, or either?
I prefer to call myself a data scientist today, because my work involves not only designing and implementing statistical models (e.g., regression, classification, clustering, association, recommender systems) as statisticians do, but also writing code to deploy these models to create real-time actionable insights and presenting my findings on where and how these insights can help business users drive and grow their businesses. From concept to realization, data scientist is the role where you can go from A to Z, or at least D to S (from Data to Science).
You married a statistician. How did you meet?
It was the fall of 2009, and I was entering my first year as a PhD student in statistics at North Carolina State University and working as a teaching assistant for Introduction to Probability and Distribution Theory. I was one of the two TAs for this course. The other was my PhD classmate and now my wife, Liwei Wang. She was beautiful and smart. Because we were assisting with the same course, we quickly became good friends and worked through statistics problems together. Our discussions soon ranged far beyond statistics. The rest is history. Now she is a statistician in the pharmaceutical industry, leading statistics and programming teams on clinical trials. I'm so glad I was assigned to that particular course.
Name a few specific skills you need to do your job.
In addition to solid quantitative skills in mathematics and statistics, knowledge of SQL and Hadoop to access data, and basic programming ability in R or Python, data scientists also need to understand the business and the requirements of business users, and to communicate quantified insights to cross-functional teams in development, design, marketing, and sales so they can deliver actionable recommendations. Beyond these technical hard skills, soft skills such as curiosity, creativity, grit, and humility help data scientists overcome barriers and dig the golden nuggets out of mountains of Big Data.
What is the most exciting part of your job?
Mathematics and statistics are meaningful and fun only when they are used to solve real-world problems. The opportunity to leverage them, along with Big Data, to help drive and grow businesses by creating actionable insights has always been the most exciting part of my job.
Have you ever had a mentor? If so, what role did mentoring play in your career?
Yes. When I was a PhD student at NCSU, my doctoral dissertation co-advisers, Hua Zhou (now at UCLA) and Lexin Li (now at UC Berkeley), helped me build my self-confidence and create plans that moved me forward. They provided me with opportunities such as the SAMSI graduate fellowship and travel funding for industry conferences. They held me to high standards of excellence and gave direct, constructive, and honest feedback.
I was also fortunate to meet Sujit Ghosh (then one of the directors of the graduate program in statistics at NCSU), whom I consider my career mentor. He teaches by sharing his experiences, challenges, and successes. When I had opportunities to intern with local companies during my PhD studies, he helped me decide among graduate industrial traineeship offers and explore my future career options. I still meet with him regularly to discuss how to bridge the gap between statistics in academia and data science in industry.
Have you ever mentored anyone? If so, what did you learn?
As a lead data scientist, I proactively coached junior data scientists and developers interested in data science, helping them develop their data preparation and statistical model design skills. The mentoring benefited both sides. While helping them achieve their goals, I enhanced my leadership skills and gained new perspectives (e.g., those of developers).
What advice would you give to young statisticians just beginning their careers?
Get your hands on real-world data (e.g., Kaggle projects). Invest more time in programming (e.g., R, Python). Practice presenting your statistical findings to nonstatistical audiences. Keep up to date with how statistics is applied in your domain. And, of course, take advantage of mentoring resources as much as possible.
Why did you join the ASA, and why have you stayed a member?
When I first joined the ASA as a PhD student in statistics at North Carolina State University, I took advantage of conferences like JSM to present my research and learn from others. Even though I now work in industry, I stay connected to a network of statistics professors, statisticians, and data scientists through the ASA. In fact, I'm scheduled to present an accepted paper at the DSAA (Data Science and Advanced Analytics) conference, which the ASA cosponsors with IEEE and ACM.
What do you enjoy doing in your spare time?
In my spare time, I enjoy playing soccer and writing code to analyze data (not just soccer data) for my own amusement.
Name one or two favorite blogs or books you have read and would recommend to others.
One of my favorite books is The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman. It helps statisticians not only understand machine learning algorithms from the computer scientist's perspective, but also learn the mathematical rigor behind them. I consider this book the bible for data scientists.