
Why Statistics? Why BLOG?

1 September 2010
Short for weblog, a blog is described by the Merriam-Webster Dictionary as a website that contains an online personal journal with reflections, comments, and hyperlinks provided by the writer. Recently, a growing number of blogs about statistics have appeared, so Amstat News asked a few statistical bloggers how they got started in both statistics and blogging.


Name: Nathan Yau

Current affiliation: University of California, Los Angeles, Department of Statistics

Degree(s), and from where: Currently a PhD candidate at UCLA; BS in EECS from UC Berkeley

Website: http://flowingdata.com

When was FlowingData started? June 25, 2007

Why did you start FlowingData?
I moved to Buffalo, New York, with my wife two years into my PhD. FlowingData was a way for me to kind of keep in touch with people who were also interested in statistics and visualization, because I’ve had to do the bulk of my studies remotely. Then, about a week into it, I found it was a great way for me to document my experiences as an intern and at workshops.

Has creating FlowingData advanced your career? If so, in what way(s)?
Definitely. Because my focus is in visualization, it’s been a great way to get my work out there. If the work is good, it tends to spread, and, from there, people will at least know about what you’re capable of. At this point, I’m still focused on finishing my PhD, but I’ve been able to freelance on the side, which is a good break from dissertation work every now and then. It’s also been a great way to make contacts, simply because people follow what I do.

What skills did you have to learn to produce FlowingData?
I had already been building websites before FlowingData, so I didn't have to learn much. I use WordPress, an open-source blogging platform; it's practically a one-click setup. Most of my setup time was probably spent designing the site with themes. That's just my preference, though. There are a lot of free themes you can download; I just like to customize when I can.

What is most interesting or challenging about maintaining FlowingData?
I can get so involved with maintaining the site that it distracts me from dissertation work, so finding that balance is tough sometimes. Especially in the beginning, I was always checking views and subscriber counts. However, now that the readership has grown to about 37,000, it’s always fun to read comments and interact. I’ve found that a lot of the time, you can learn quite a bit from your readers, so it’s not just you spouting off on your soapbox.

Where do you find the material you cover?
I subscribe to a lot of sites and blogs. I also use Twitter. In the beginning, I used to find everything myself through these sources, but readers also send in a lot of suggestions nowadays.

What, specifically, do you wish you had known when you started FlowingData?
Don’t be afraid to put your work out there. If people like it, then great. If they don’t, then you’ll learn something. Either way, you benefit.

Whom do you rely on for technical support?
I do everything for the most part. If something is wrong with the server, I’ll contact my web host and they’ll usually take care of any technical issues right away.

What do you recommend students and new statisticians focus on outside of their coursework to advance in their careers?
Branch out into other fields. All of my best work experiences have come out of collaborations with people not in statistics. In all likelihood, you’re going to be working with nonstatisticians in the end, and being able to communicate your results, analyses, and ideas to others is extremely important. Plus, when you work with people outside your comfort zone, you end up learning way more than if you were to stay inside your circle. From what I’ve seen, employers really appreciate that well-roundedness.

What is the one website you can’t go a day without visiting?
Information Aesthetics. It’s the first visualization-related blog I ever read.

Name: Andrew Gelman

Website: www.stat.columbia.edu/~gelman/blog

I studied math and physics at MIT. To be more precise, I started in math by default. Ever since I was two years old, I’ve thought of myself as a mathematician, and I always did well in math class, so it seemed like a natural fit.

But, I was concerned. In high school, I’d been in the U.S. Mathematical Olympiad training program and met kids who were clearly much better at math than I was. In retrospect, I don’t think I was as bad as I’d thought at the time. There were 24 kids in the program, and I was probably around #20, if that, but I think many of the other kids had more practice working on Mathematical Olympiad–type problems. Maybe I was really something like the 10th best in the group.

Tenth best or 20th best, I reached a crisis of confidence around my sophomore or junior year in college. At MIT, I started right off taking advanced math classes, and, somewhere along the way, realized I wasn't seeing the big picture. I was able to do the homework problems and do fine on the exams, but something was missing. Ultimately, I decided the problem was that, in the world of theoretical math, there were the Cauchys, the Riemanns, etc., and then there was everybody else. I didn't want to be one of the "everybody else." Unfortunately, I didn't know about applied math at the time. At MIT, as elsewhere I imagine, the best math students did the theory track.

I was also majoring in physics, which struck me as much more important than math, but of which I felt I had even less of an understanding. I did well in my classes and reached the stage of applying to physics graduate schools. (It was MIT. I didn’t have many friends and I didn’t go on dates, so that gave me lots of time to do my problem sets each week.) In fact, it was only at the very last second in April of my senior year that I decided to go for a PhD in statistics, rather than physics.

I had some good experiences in physics, most notably taking the famous Introduction to Design course at MIT. Actually, that was a required course in the mechanical engineering department, but many physics students took it, too. I also worked for two summers doing solid-state physics research at Bell Labs. We were working on zone-melt recrystallization of silicon and, as a byproduct of our research, discovered a new result (or at least it was new to us): solid silicon could superheat to something like 20 degrees above its melting point before actually melting. This wouldn't normally happen, but in our setup the silicon wafer was heated so that the center got hotter than the edges, and at the center there were no defects in the crystal structure where melting could easily start. So, it had to get really hot before it began to melt.

Figuring this out wasn't so easy; it's not like we had a thermometer inside our wafer. (If we had, the crystalline structure wouldn't have been pure and there wouldn't have been any superheating.) We knew the positions and energies of our heat sources, and we had radiation thermometers to measure the exterior temperature from various positions. We knew the geometry of the silicon wafer (which was encased in silicon dioxide), and we could observe the width of the molten zone.

So what did we do? What did I do, actually? I set up a finite-element model on the computer and played around with its parameters until I matched the observations, then looked inside to see what our model said the temperature was at the hottest part of the wafer. Statistical inference, really, although I didn’t know it at the time.

When I came to Bell Labs for my second summer, I told my boss I’d decided to go to graduate school in statistics. He was disappointed and said that was beneath me, that statistics was a step down from physics. I think he was right (about statistics being simpler than physics), but I really wasn’t a natural physicist, and I think statistics was the right field for me.

Why did I study statistics? I’ve been trained not to answer why questions, but rather to focus on potential interventions. The intervention that happened to me was that I took a data analysis course from Don Rubin when I was a senior in college. MIT had few statistics classes. I’d taken one of them and liked it, and when I went to a math professor to ask what to take next, he suggested I go over to Harvard and see what they had to offer.

I sat in on two classes. One was deadly dull and the other was Rubin's, which was exciting from Day 1. The course just sparkled with open problems, and the quality of the 10 or so students in the class was amazing. I remember spending many hours laboriously working out every homework problem using the Neyman-Pearson theory we'd been taught in my theoretical statistics course. It was only by really taking this stuff seriously that I realized how hopeless it all is. When, two years later, I took a class on Bayesian statistics from John Carlin, I was certainly ready to move to a model-based philosophy.

Anyway, to answer why I studied statistics, I’ll say because Rubin’s course was great. I was worried that statistics was just too easy to be interesting, but Rubin assured me that, no, the field has many open problems and I’d be free to work on them. Indeed, I have.

Statistical Modeling, Causal Inference, and Social Science

Why did you start the blog Statistical Modeling, Causal Inference, and Social Science?
Why did I start a blog, considering that I started my PhD studies in 1986 and didn’t start blogging until nearly two decades later? I started my casual Internet reading with Slate and Salon and, at some point, followed some links and read some blogs. In late 2004, my students, postdocs, and I decided to set up a blog and wiki to improve communication in our group and to reach out to others. The idea was that we would pass documents around on the wiki and post our thoughts about each other’s ideas on the blog.

Where do you find the material you cover?
I figured we’d never run out of material because, if we ever needed to, I could always post links and abstracts of my old papers. (I expect I’m far from unique among researchers in having a fondness for many of my long-forgotten publications.)

What happened?
For one thing, after a couple of months, the blog and wiki were hacked (apparently by some foreign student with no connection to statistics who had some time on his hands). Our system manager told us the wiki wasn't safe, so we abandoned it and switched account names for the blog. Meanwhile, I'd been doing most of the blog posting. For a while, I'd assign my students and postdocs to post while I was on vacation, but then I heard they were spending hours and hours on each entry, so I decided to make it optional, which means that most of my co-bloggers rarely post on the blog. Which is too bad, but I guess understandable.

What is most interesting or challenging about maintaining this blog?
Probably the #1 thing I get from posting on the blog is an opportunity to set down my ideas in a semi-permanent form. Ideas in my head aren’t as good as the same ideas on paper (or on the screen). To put it another way, the process of writing forces me to make hard choices and clarify my thoughts. The weakness of my blogging is that it’s all in words, not in symbols, so, quite possibly, the time I spend blogging distracts me from thinking more deeply about mathematical and computational issues.

Has creating this blog advanced your career? If so, in what way(s)?
At times, blogging has motivated me to do data analyses that, in turn, have led me to new statistical research.

There’s a lot more I could tell you about my blogging experiences, but, really, it all fits in a continuum with the writing of books and articles, meetings with colleagues, and all stages of teaching (from preparation of materials to meetings with students).

One thing that blogging has in common with book and article writing is that I don’t really know who my audience is. I can tell you, though, that the different blogs have much different sets of readers. My main blog has an excellent group of commenters, who often point out things of which I’d been unaware. At the other blogs where I post, the commenters often don’t understand where I’m coming from and all I can really do is get my ideas out there and let people use them how they may. In that way, it’s similar to the frustrating experience of writing for journals and realizing that sometimes I just can’t get my message across. On my own blog, I can go back and continue modifying my ideas in the light of audience feedback.

My model is George Orwell, who wrote about the same (but not identical) topics over and over, trying to get things just right. (I know that citing Orwell is a notorious sign of grandiosity in an author, but, in my defense, all I’m saying is that Orwell is my model, not that I have a hope of reaching that level.)

Name: Kaiser Fung

Current affiliation: Sirius XM Radio and New York University

Degree(s), and from where: MPhil (Cambridge), MBA (Harvard), BSE (Princeton)

Website: http://junkcharts.typepad.com

When was Junk Charts started? July 2005

Why did you start Junk Charts?
Initially, to create a reason to write regularly. I was pleasantly surprised to find a community devoted to good graphical presentation; the readers are what keep me going. It’s a lot of work, though.

Has creating Junk Charts advanced your career? If so, in what way(s)?
It certainly helped convince McGraw-Hill to pick up Numbers Rule Your World, and I’m sure readers recognize my way of thinking, so if they are looking for people with my skill set, they can approach me, although I have not pushed the website in this way.

What skills did you have to learn to produce Junk Charts?
How to build and nurture a community of readers.

What is most interesting or challenging about maintaining Junk Charts?
The biggest challenge is maintaining a regular flow of posts. Because most of my posts are substantive analyses, they take hours, if not days, to write. My blog is competing with hundreds of other blogs for readers' attention, and many bloggers put up several posts per day.

Where do you find the material you cover?
With my topic, I can pick up material from my everyday reading. I must thank my readers profusely for sending me materials; their submissions really make my blogging life a lot easier!

What, specifically, do you wish you had known when you started Junk Charts?
It has all been a positive experience for me.

Whom do you rely on for technical support?
I troubleshoot myself, and, for knotty questions, I go to tech-savvy friends. The blogging software is set up for anyone to be able to start a blog.

What do you recommend students and new statisticians focus on outside of their coursework to advance in their careers?
Communication skills, how to make an argument to a nontechnical audience, and how to use graphics effectively to bring home your message.

What is the one website you can’t go a day without visiting?
There are many, but Andrew Gelman’s blog is a must-read for statisticians.


Comments

  • morris olitsky said:

    Dr. Gelman,
    What’s most interesting to me is your description about your background and how you got into the field of statistics; it mirrors mine, except that I dropped out of graduate school before getting a Ph.D.
    I am honored to have a common experience with someone as distinguished as you.