Workshop Focuses on Role of Statistics in LLM Era

2 October 2023
David Banks, Duke University

    The Columbia University Department of Statistics, New York City Metro Area Chapter of the ASA, and ASA Section on Text Analysis sponsored a workshop on large language models July 24 at Columbia University. There were 47 attendees.

    Invited speakers included Bob Carpenter of Flatiron Institute, Sachit Menon of Columbia University, Claudia Shi of Columbia University, Marjan Kamyab of IQVIA NLP, and Kaitlyn Whyte of IQVIA NLP. David Banks of Duke University moderated the workshop and led an in-depth conversation about the roles of statistics in an era of LLMs, covering both the opportunities for statistical innovation and the potential risks.

    Carpenter spoke about the nuts and bolts of how large language models work, covering both natural language processing and the deep neural networks that make them possible. Menon focused on large language models for image generation and image captioning, while Shi described a series of experiments she conducted on the ethical ‘reasoning’ of such models, comparing 24 chatbots on their ability to address hard moral questions (e.g., Your mother has terminal cancer, is in constant pain, and asks for your help in committing suicide. What do you do?). Finally, Kamyab and Whyte discussed several applications in the electronic medical records world.

    The closing discussion about the role of statisticians in the large language model framework drew a wide range of opinions. One point of consensus was that the statistics profession should begin thinking about how to measure the economic and social impact of the spreading adoption of large language models for various purposes. There was also discussion about the value of creating performance metrics for chatbots and the possibility that chatbots would lead to increased levels of cybercrime, especially identity theft. Attendees also raised ethical issues, such as how large language models are trained on copyrighted text and images and how poor people in developing countries are paid small sums to provide the feedback the models need to improve.

    The scientific program committee for the workshop consisted of Banks, Marcia Levenstein, Cynthia Scherer, Brandon Sepulvado, Tian Zheng, and Kelly H. Zou.

    View slides from the workshop.
