## Calculus and Statistics

*Daniel Kaplan, DeWitt Wallace Professor at Macalester College*

Statistics can be introduced best when students have a solid grounding in calculus. This statement would have been mainstream 50 years ago. Nowadays it is controversial and provocative, even goading. I write it, and believe it, even though there is considerable experience to the contrary.

For at least 20 years, there has been a lively reform movement in statistics education, an improvement in pedagogy based on understanding how students perceive statistical concepts. One reform strategy is to strip away mathematical formalism that’s not strictly needed, including integration and differentiation, the hallmarks of calculus. This has made statistical thinking more accessible.

Danny Kaplan is DeWitt Wallace professor at Macalester College, where he directs the applied math and statistics major. He’s the author of *Statistical Modeling: A Fresh Approach* and *Start R in Calculus*.

A visible sign of success is the rapid growth of Advanced Placement statistics. There are important and legitimate criticisms of the AP curriculum and how it connects with more advanced statistics, but the success of the AP program is inspiring and a model to be emulated. AP statistics bootstrapped itself into high schools by providing training opportunities for high-school teachers who often had little or no statistics education themselves. Many students find the AP statistics course an attractive alternative to calculus because they see statistics as useful.

A decade ago, the Mathematical Association of America Committee on the Undergraduate Program in Mathematics worked with many partner disciplines to see how the mathematics curriculum can better serve them. The findings, published in the CRAFTY reports, include a recommendation that students broadly be taught statistics without a calculus prerequisite.

The no-calculus form of statistics is also a pragmatic choice; that’s where the students are. Nationally, the most heavily enrolled mathematics course at the college level is “college algebra,” a pre-calculus course designed in almost all cases to lead to calculus, but with a success percentage in the single digits.

Even among those reaching the calculus level, attrition is high. The half-life of a student in the university-level mathematics curriculum is one course. Calculus is a filter that has become a choke-point as the economy becomes more and more technical.

This might make sense if the material learned in the traditional calculus sequence were more directly connected to success in technical careers. But, for many students, the calculus path leads to a destination of uncertain value. A student who spends a year learning techniques for symbolic differentiation and integration of functions of a single variable, along with definitions of limit and techniques for the analysis of sequences and series, by and large learns techniques that will rarely, if ever, be used by instructors in the partner disciplines and even less in their eventual careers. Statistics is often much more relevant to a student’s ongoing and future work.

In thinking about the relationship between calculus and statistics, many people think first about integration and differentiation, cumulatives and densities, areas and slopes. One quickly realizes that calculus doesn’t provide much insight. “Area” and “slope” are intuitive, elementary concepts. Indeed, much calculus pedagogy relies on areas and slopes to motivate derivatives and integrals. Beyond that, the algebraic techniques of calculus courses (e.g., *x*² → 2*x*) don’t get traction against the common distributions of statistics (e.g., the normal and *t* distributions).

Put aside for a moment the methods for differentiation and integration and think about the tools and language one needs to describe relationships among variables. Calculus and statistics both center on models of relationships: constructing them, analyzing them, evaluating them. In calculus, the choice to add a term to a model reflects some knowledge or hypothesis about mechanism. In statistics, choices are based on evidence provided by data. These are complementary perspectives with a shared foundation in mathematical modeling.

Traditionally, calculus instruction has emphasized functions of one variable, *y* = *f*(*x*). In algebra and pre-calculus, students take on linear forms (*mx* + *b*), then quadratics (*ax*² + *bx* + *c*) and factoring. In calculus, they learn that *mx* + *b* can be used as a local approximation to many forms of functions, while *ax*² + *bx* + *c* gives a better approximation.
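The claim about local approximation can be checked numerically. The following is a minimal sketch, not part of the original essay: it assumes the exponential function as the example and the point *x* = 0 as the place to approximate, and the helper names `linear_approx` and `quadratic_approx` are made up for illustration.

```python
import math

def linear_approx(f, df, x0):
    """Local approximation of the form m*x + b, built from f and its derivative at x0."""
    return lambda x: f(x0) + df(x0) * (x - x0)

def quadratic_approx(f, df, d2f, x0):
    """Local approximation of the form a*x^2 + b*x + c, adding the second-derivative term."""
    return lambda x: f(x0) + df(x0) * (x - x0) + 0.5 * d2f(x0) * (x - x0) ** 2

# Assumed example: f(x) = exp(x), whose first and second derivatives are also exp(x).
f = math.exp
x0 = 0.0
lin = linear_approx(f, math.exp, x0)
quad = quadratic_approx(f, math.exp, math.exp, x0)

# Compare the two approximations at points near x0.
for x in (0.1, 0.5):
    err_lin = abs(f(x) - lin(x))
    err_quad = abs(f(x) - quad(x))
    print(f"x={x}: linear error={err_lin:.5f}, quadratic error={err_quad:.5f}")
```

Near the chosen point both forms track the function, and the quadratic form stays closer as *x* moves away from it.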

For statistical thinking, what’s needed instead of quadratics and factoring is the incorporation of covariates. This can be as simple as the linear function with two inputs, *z* = *a* + *bx* + *cy*. This general-purpose form, often extended to more than two variables but remaining linear, is the workhorse of statistical modeling. It’s a first representation of what might be called complexity: the idea that more than one variable can play a role.

A powerful way of thinking about functions like *z* = *a* + *bx* + *cy* is to ask how the output changes when either of the inputs, *x* or *y*, is changed. An important strategy from calculus is the partial derivative: examining the change in outcome as one input is changed while others are held constant. This aligns with experimental method in science; examining partial change and developing a formal language for describing it helps students understand that there are different ways for change to happen. In my view, understanding what’s a partial change and what’s not is fundamental to thinking about covariates and causation and therefore to the most compelling issues for applying statistics.
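The difference between a partial change and a change in which both inputs move can be shown with a few lines of arithmetic. This is a rough sketch under assumed values: the coefficients *a* = 1, *b* = 2, *c* = −3 and the helper `partial_change` are invented for illustration and do not come from the essay.

```python
def z(x, y, a=1.0, b=2.0, c=-3.0):
    """The two-input linear model z = a + b*x + c*y with assumed coefficients."""
    return a + b * x + c * y

def partial_change(f, x, y, wrt, h=1e-6):
    """Central finite difference in one input while the other is held constant."""
    if wrt == "x":
        return (f(x + h, y) - f(x - h, y)) / (2 * h)
    if wrt == "y":
        return (f(x, y + h) - f(x, y - h)) / (2 * h)
    raise ValueError("wrt must be 'x' or 'y'")

# Partial changes: vary one input, hold the other fixed.
dz_dx = partial_change(z, 4.0, 5.0, "x")   # approximately b
dz_dy = partial_change(z, 4.0, 5.0, "y")   # approximately c

# A change that is NOT partial: x and y both increase by one unit together.
total_change = z(5.0, 6.0) - z(4.0, 5.0)   # equals b + c for this model

print(dz_dx, dz_dy, total_change)
```

With these assumed coefficients, the change in *z* per unit of *x* is positive when *y* is held constant, yet *z* falls when *x* and *y* rise together. Which of the two numbers answers a question depends on whether the other input was actually held fixed.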

Ideally, statistical notions of fitting functions to data are taught hand in hand with the introduction of functions and their parameters in calculus. With this, and with the idea of partial change, students are better able to make mathematical sense of statistical ideas such as adjustment and how the relationship between two quantities, *z* and *x*, is informed by the participation of additional quantities.
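Adjustment can be illustrated by fitting the same response twice, once without and once with a covariate. The sketch below is an illustration only: the data are synthetic and invented for this example, generated exactly from *z* = 1 + 2*x* − 3*y*, and the helpers `solve` and `least_squares` are hypothetical, written in plain Python rather than taken from any statistics package.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def least_squares(columns, target):
    """Fit target = intercept + sum(coef * column) via the normal equations."""
    rows = [[1.0] + list(vals) for vals in zip(*columns)]
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xtz = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    return solve(XtX, Xtz)

# Synthetic data (made up for illustration): y tends to rise with x,
# and z is generated exactly from z = 1 + 2x - 3y, with no noise.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 1.0, 3.0, 2.0, 5.0, 4.0]
z = [1.0 + 2.0 * xi - 3.0 * yi for xi, yi in zip(x, y)]

unadjusted = least_squares([x], z)      # model z = a + b*x
adjusted = least_squares([x, y], z)     # model z = a + b*x + c*y

print("slope on x, ignoring y:     ", unadjusted[1])
print("slope on x, adjusting for y:", adjusted[1])
```

In this made-up data the coefficient on *x* is negative when *y* is left out and returns to the generating value of 2 once *y* is included. The second fit estimates the partial change in *z* with respect to *x*, while the first mixes in the change that comes from *y* moving along with *x*.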

At Macalester College, drawing on resources from NSF-sponsored Project MOSAIC, we redesigned our introduction to calculus to incorporate the CRAFTY recommendations: modeling, multiple variables, using only essential algebra, and using data to inform models. Although it serves the needs of all the disciplines, the primary orientation is toward statistics (even using R software for teaching calculus). In one semester of calculus, students gain experience building and interpreting models in multiple variables. They understand why it’s important to consider relationships among multiple variables and learn the language to express such relationships. Then, when they move on to statistics, they can connect their models to data and examine and evaluate the extent to which the data provide evidence for their models.

The success in making elementary statistics accessible without calculus is remarkable. But it’s not clear how students can move forward along this path to the sorts of statistics needed in contemporary work involving complex, real-world systems. The intellectual skills students need to advance in statistics can be supported by engaging calculus, remodeling the calculus curriculum as needed to support the description, analysis, and judgment needed in statistical work.

## For More Information

Blair, R., E.E. Kirkman, and J.W. Maxwell. 2013. Statistical abstract of undergraduate programs in the mathematical sciences in the United States: Fall 2010 survey. American Mathematical Society.

Garfield, J., and D. Ben-Zvi. 2008. Developing students’ statistical reasoning: Connecting research and teaching practice. Springer.

AP Statistics discussed in Amstat News, Sept./Oct./Nov. 2009.

Project MOSAIC, NSF DUE-0920350.

Paul Schneeman said: Thanks for a timely (for me) essay. I have been cleaning out years of notebooks going back to Berkeley in the ’60s, where my physics background provided me with lots of exposure to what looked like mathematics, but was really physical argumentation. It was my encounter with data analysis needs through the draft, nutritional studies (alternate service) and the “school of hard knocks” that led me to linear algebra and GLMs. I look forward to hearing of success in keeping “the Calculus” alive, but I suspect that a name change is needed. Marketing a subject as important for its own sake doesn’t work any more (if it ever did). Trigonometry in high school becomes transparent through CVs, but that’s not how it’s taught.

AnnMaria said: Having worked as a statistician for over 20 years, I have to agree that a basic knowledge of Calculus is necessary – emphasis on the basic. I had Calculus in high school, where I paid about as much attention as the average high school student, I suppose, then had two semesters my freshman year of college. I had my first statistics class as a senior in college, and then a few years later went on to specialize in applied statistics in my doctoral program – taking the time in between to work as an engineer. Yes, having some knowledge of derivatives and integrals was helpful. Matrix algebra was far more helpful for the type of work that I usually do. While having a basic understanding gave me a great advantage over my classmates who hadn’t gone beyond college algebra, I have to admit that I wasn’t the best undergraduate – double majored in frat parties and made it to class two days out of three (please don’t tell my children). So … I think if you can get your students as far as mastering the concept of partial derivatives and a smattering of matrix algebra, they’ll be fine.

Michael said: Found this because of a recent presentation by Kaplan mentioning the article.

For ~20 years I have been trying to get folks to look seriously at “B-Calc” or Survey Calculus as an option for a much wider range of students. Theoretical calc really is an entry to the Mathematics major while survey calc focuses on the bits of calc that are useful to other majors.

When I look at statistics, those studying outside the mathematics department would be well served by Survey Calc.