Practical Significance Take Two—Talking Shop: Introducing the ASA Caucus of Industry Representatives

Monthly Membership Magazine of the American Statistical Association

2 October 2023

This interview with Ginger Holt, senior staff data scientist at Databricks, and Amarjot Kaur, executive director at Merck, was conducted by Practical Significance co-hosts Donna LaLonde and Ron Wasserstein during a recent podcast. If you missed the show, this is your opportunity to learn about the recently formed ASA Caucus of Industry Representatives, which will help promote statistics and data science in the private and public sectors and provide resources to successfully advocate for the discipline.

Ron Wasserstein: Tell us about the goals of the ASA Caucus of Industry Representatives.

Ginger Holt: ASA believes members in academia and government are well supported by the organization—there are a lot of structures and venues in place for these, but we wanted something more specific for industry statisticians and data scientists. So, we created the ASA Caucus of Industry Representatives to provide a platform to address the unique challenges for industry statisticians and data scientists.

Some examples include promoting the profession of statistics and data science in the private and public sectors and then assisting companies employing those members in industry. And we want to provide a venue for discussion of unique issues for industrial data scientists and statisticians to facilitate interaction between both the private sector and the public sector. Different verticals within industry as a whole … we’re data scientists, so we like to identify and collect data that are helpful to our employers and to industry in general.

We have a yearly meeting and workshops focusing on specific challenges such as large language models that may require a lot of input from this group in the future.

Ron Wasserstein: Amarjot, you’ve had many leadership roles within the ASA. What excited you about being part of the executive committee for the Caucus of Industry Representatives?

Amarjot Kaur: Being an industry statistician, I can identify with all the goals of this caucus. Also, having worked on various ASA committees and sections and in other leadership roles, I see this as an opportunity for the ASA to further strengthen two-way communication with representatives from a broad group of industries—to hear about issues and barriers for which the ASA might be able to help, provide resources, and further this initiative. So, it’s a win-win situation!

ASA has always strived to help all sections of the membership, whether in industry, academia, or government, and it excites me that there is now a structured approach to advancing the interests of industry statisticians.

Donna LaLonde: One of the things I’ve enjoyed about the meetings with the executive committee and the town hall meeting you hosted was hearing more about what folks view as the challenges facing industry statisticians and data scientists. What do you think are the most significant challenges?

Amarjot Kaur: We are in a time of data revolution. There is so much data, and with that comes both opportunities and challenges. One thing for us to be prepared for is how best to synthesize this large amount of information in a meaningful way, without getting lost in it. You can always do a lot of things with large data, but the question is, “Are we answering the right questions? Do we understand what we are doing?” It’s important to incorporate all relevant information in an analysis but figure out first how best to do it in a meaningful way. We still need to pay attention to detail and be aware of the fundamentals of the analysis when working with a large volume of information.

Another challenge is automation. Automation is great for standard analyses, but with complicated problems, I worry that too much automation leaves us interpreting results without completely understanding the intermediate steps. This is something to be aware of, and we need to make sure that we don’t get completely carried away with automation.

Ginger Holt: I definitely agree with Amarjot. I’ve been working in forecasting basically my whole career in several different domains. In forecasting, the fundamental challenges are really the same everywhere; forecasting is pretty ubiquitous. For context, I was in academia for a few years, but otherwise, I’ve been mostly at very large companies like BP, HP, Walmart, Facebook, and now Databricks. In all these places, we needed unified forecasts across the company. That means standardized data sets, standardized methodologies, and one source of truth for all decision-makers, whether in finance, sales, or marketing, with forecasts that are consistent across granularities.

When you compare forecasts at various levels, make sure the way they add up is consistent. We want as much accuracy as possible. We want ‘explainability’ for the answers we provide, so we can make actionable decisions.
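As an illustration of what forecasts that "add up" consistently can mean, here is a minimal Python sketch. It is not from the interview: the region names, the numbers, and the helper functions are all hypothetical, and bottom-up and proportional top-down adjustment are only two of several ways practitioners reconcile forecasts across levels.

```python
# Hypothetical region-level forecasts and an independently produced company total.
region_forecasts = {"north": 120.0, "south": 80.0, "west": 50.0}
total_forecast = 265.0  # made-up figure that does not match the sum of the regions


def is_coherent(total, parts, tol=1e-9):
    """Return True when the top-level forecast equals the sum of its parts."""
    return abs(total - sum(parts.values())) <= tol


def bottom_up_total(parts):
    """Bottom-up reconciliation: redefine the total as the sum of the parts."""
    return sum(parts.values())


def proportional_top_down(total, parts):
    """Top-down alternative: rescale the parts so they add up to the given total."""
    scale = total / sum(parts.values())
    return {name: value * scale for name, value in parts.items()}


if __name__ == "__main__":
    print("coherent before reconciliation:", is_coherent(total_forecast, region_forecasts))
    print("bottom-up total:", bottom_up_total(region_forecasts))
    print("top-down regions:", proportional_top_down(total_forecast, region_forecasts))
```

Which direction to reconcile in depends on which level is trusted more: bottom-up keeps the detailed forecasts and changes the total, while top-down keeps the total and adjusts the details.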

And then scaling. Amarjot mentioned tooling. We want to scale the forecasting process or any process by building generalizable tooling there.

And Donna mentioned the town hall we had a few weeks ago now. We did collect some data from the participants, and we found we’re targeting professionals as well as students for this caucus. And so, from the data collected during that meeting, we found that professionals really are interested in increasing collaborations across industries. So, I believe having these discussions and sharing knowledge across biomedical, financial, and manufacturing—all these different sectors—would be useful.

Knowledge sharing among the community of industry data scientists and statisticians is a major component. So is continuing education in technical skills: just in the past six months, we have seen how much LLM capabilities have grown, and there is a real need for people to learn more about what has been happening. And then students, as well. Students are really interested in learning more about the career paths and journeys of statisticians and data scientists in different industries, and about crossover, such as moving from a nontechnical role in one field into more of a data science role. So, I believe people are very interested in that. The caucus is going to be focusing on these issues and looking for ways to provide additional support to our community.

Ron Wasserstein: You can’t get around discussions of ChatGPT and LLMs. They’re everywhere. What are your thoughts about these models, and what changes do you see on the horizon as a result?

Ginger Holt: Fortunately, at Databricks, we have a lot of customers using our platform. We just published a report on the status of data in 2023, and we can safely say that we are in the golden age of data and AI. So, we believe that AI is really going to usher in the next generation of products and software innovation. We’re already seeing this play out in the market.

This report was based on data from 9,000 of our customers, and there are three main takeaways. The first is that companies are adopting machine learning and LLMs at a rapid pace. Natural language processing is really dominating those use cases, with an accelerated focus on LLMs specifically.

The second thing is that open source is winning in today’s data and AI markets. So, eight out of 10 of our most widely adopted data and AI products are based on open source. That’s good to know. I see a lot of leaderboards out there ranking different LLM products, and they rank them on accuracy and categorize them based on whether they are open source and whether they’re allowed for commercial use. And so, if you look at those leaderboards, the biggest takeaway is that open source is catching up quickly. Companies are seeing the benefit of keeping their data private, not giving their data to OpenAI or other LLM providers, and training their own models based on their specific use cases, their own data, and their domain.

And the biggest challenge here is the ‘hallucination’ issue. AI is sociopathic, right? It can lie to your face without feeling bad. We need to figure out how to prevent this from happening. People have been actively researching this: improving the diversity of training data, eliminating inherent biases that may be present, developing better regularization techniques, and employing adversarial training and reinforcement learning, things like that. So, that’s a big problem to solve before we have more reliability, but it’s definitely going to change the way we work and add efficiency to what we do.

Amarjot Kaur: So, there [is value] in using these AI tools, and I can see that in my workplace. For example, the drug discovery phase requires an extensive search for identifying promising molecules to take forward, where AI and machine learning–type tools could be useful in selecting drug candidates in a more precise and efficient way. There are many other applications where one can see the value of AI and machine learning.

On the other hand, AI can have a hallucination effect, as Ginger mentioned, and it could give totally wrong answers. So, you can’t just rely on it completely, without human oversight, even though the answer is based on a lot of information.

One parallel I think of [with] AI is that of a navigation system in our cars. Navigation is an extremely useful tool, and we all have so much dependence on it but, at the same time, we also have to look at the signs on the roads. There can be new signs or roads not recognized by the navigation system and, if you only pay attention to the navigation system, you may get lost.

There are a lot of unknowns [about] AI and, if you hear the news or read about it, some very smart people are really worried about where it is going and what’s going to be its impact in the long run. But again, opportunities are there, as well. I believe that’s how things move forward—there’s always a hesitation in the beginning. Hopefully, there will be boundaries and regulations within which people can make the best out of it.

Ron Wasserstein: What would you say is your best career advice?

Amarjot Kaur: Stay curious! Stay curious and be adaptable to change because life is changing, and it never moves in a straight line. As I said earlier, before jumping into finding solutions, first try to understand the question. What is it that we are trying to answer, and how can we best answer it? The best answer doesn’t always have to be the complicated one. Curiosity and adaptability are important.

It’s also important to take ownership of what you do. Whatever task is assigned, no matter how big or small, we should think beyond what has been asked, think independently, and try to see what more we can learn from that data.

My final words of advice to everyone are to volunteer and stay professionally active within ASA or any other organization of your interest. I have found volunteer activities personally rewarding and a fulfilling part of my career. Volunteerism helps us learn new things and develop and diversify our worldview through broader professional exposure.

Ginger Holt: I echo many of the things that Amarjot said about being adaptable to change. Applying that to your career takes a model predictive control–type approach: plan your career for the long term, then let three to six months go by and do another pass. Make changes based on the new information you have, new interests, new developments in the field, and new knowledge that you may have. So, be intentional about setting a deadline around your career planning.

Have a policy of having research heroes that you follow on Google Scholar and keep up to date with advancements. Maintain your interest in these people. Keep your standards high. Imagine those people looking over your shoulder as you’re doing your work and delivering analysis. What would they think of your work?

Stay active in conferences. Keep learning! That would be my advice.


Amstat News
American Statistical Association
732 North Washington Street
Alexandria, VA 22314-1904
(703) 684-1221
www.amstat.org
