Section on Statistical Computing Hosts Second Annual Mini-Symposium

1 April 2024

The second annual mini-symposium, “Statistical Computing in Action,” sponsored by the American Statistical Association Section on Statistical Computing, was held online on November 4, 2023. One hundred fifteen people registered for the conference, and three ‘watch parties’ were held at the University of California at Los Angeles, the University of Connecticut, and Brigham Young University.

ASA Executive Director Ron Wasserstein delivers the opening remarks at the “Statistical Computing in Action” online mini-symposium.

The nearly five-hour program kicked off with opening remarks from ASA Executive Director Ron Wasserstein and 2023 section chair Mine Çetinkaya-Rundel and included a keynote speech, a data jamboree, lightning talks, and a panel discussion. All segments of the symposium are available on the section’s YouTube channel.

The symposium began with a keynote speech by Simon Urbanek, who offered insights into the potential of R in areas often overlooked by the statistical community. His talk focused on the adaptability of R for large-scale analytics and its role as a versatile analytics service. He also examined key properties of R and its application to large data sets, addressing fundamental considerations such as in-memory versus streaming approaches and parallelization.

Notably, Urbanek presented evidence that R is remarkably efficient, even for moderately sized vectors and matrices, and emphasized the importance of data formats and of tools for loading and saving data. He also underscored R’s proficiency in bulk processing of data from diverse sources and its compatibility with Hadoop clusters for handling very large data. Moreover, he described R’s potential as a service through HTTP REST APIs and web servers, highlighting how easily existing models and graphics can be exposed. Finally, he noted that these frameworks scale well, depending on the specific use case, and celebrated R’s active development.

The data jamboree showcased the analytical strengths of three leading open-source programming languages: Julia, Python, and R. HaiYing Wang of the University of Connecticut led the Julia part, Shanon Tass of Brigham Young University led the Python part, and Lucy D’Agostino McGowan led the R part. Moderated by Sam Tyner of DLA Piper (who led the R part last year), the presenters walked through the cleaning, manipulation, and analysis of a subset of the New York City 311 service requests data.

Each expert’s demonstration provided a lens through which multifaceted data could be understood, highlighting the strengths and collaborative potential of the languages in addressing real-world data science challenges. The well-attended Q&A reflected the community’s interest in the comparative effectiveness of these tools in streamlining data workflows.

Carol Willing, a Python and Jupyter core developer and former VP of engineering at Notable, participates in a panel session about open-source software.

The lightning session illustrated the depth and breadth of statistical computing with nine seven-minute talks. Bowei Xi of Purdue University exposed vulnerabilities in deep learning and shared robust solutions, David Corliss of Peace-Work addressed AI bias mitigation, and Kris Sankaran of the University of Wisconsin-Madison discussed generative models and their interface with statistics. Together, these three talks highlighted critical advances in statistical computing and machine learning.

Software development was another focus, with Howard Baek of the Fred Hutch Data Science Lab introducing Loqui for streamlined video generation, Jonathan Sidi of Sage Therapeutics explaining the functionality of the mmrm package for fitting mixed models, and Arinjita Bhattacharyya of Merck presenting AACTREVEAL for efficient clinical data aggregation. Emily Robinson and Zoe Rehnberg of Cal Poly reimagined pedagogical strategies with their innovative ‘game plan’ approach to teaching statistical computing, which drew applause.

The session also spotlighted practical analytic advances, with Shane Sacco of the University of Connecticut offering insights for enhancing prediction pipelines and Yulia Marchenko of StataCorp making the case for the importance of software reproducibility.

Collectively, these talks showed how the landscape of statistical computing is evolving and the impact it has across many domains.

The mini-symposium concluded with a panel discussion about open-source software, open data, and open computing. Panelists included Tracy Teal, the open source program director at Posit (formerly RStudio); Carol Willing, a Python and Jupyter core developer and former VP of engineering at Notable; and Achim Zeileis, the editor-in-chief of the Journal of Statistical Software and professor at Universität Innsbruck. Topics discussed included the frontiers of statistical computing, the role of open-source software in the era of massive models, managing a successful open-source project, and how our profession and academia can recognize contributions to statistical software.

The next online mini-symposium will be held in the latter part of 2024.