So, How Was Your Day?

1 September 2014

“You’re a statistician. What exactly is it that you DO all day?”

Many statisticians are asked that question by those outside the profession. To answer, Amstat News asked three ASA members to give us a glimpse into their work day.

Kaiser Fung

Kaiser Fung provides training and advisory services in business analytics and data visualization. He is senior data advisor at Vimeo. Previously, he held leadership roles in building and managing data teams for businesses in the entertainment, digital advertising, and financial services industries. He is the creator of the popular Junk Charts blog, which pioneered the genre of critically examining graphics in the media, and author of Numbers Rule Your World and Numbersense, both published by McGraw-Hill. He has an MBA from Harvard Business School and holds engineering and statistics degrees from Princeton and Cambridge. He is also an adjunct professor at New York University.

8:45 a.m. Jolted awake this morning less by the coffee I get at Chelsea Market than the new price of $2.50 a cup. “I thought you already knew,” the server said apologetically. What I know is that I was paying $1.75 only a few months ago. That’s a jolt of more than 40%. Incidentally, the coffee stand is situated next to an elevator labeled YOUTUBE, the 1,000-pound gorilla in the online video market that Vimeo, my employer, competes in.

9:05 a.m. Arrive at my office in the glorious IAC Building, surely one of the landmark façades of modern New York City. Chances are you have seen pictures of it.

9:10 a.m. The first order of business is to review the roster of meetings I’m supposed to attend today. For the next hour, I’ll be reading and replying to emails. My inbox teems with automated notifications about our overnight data-processing programs. Our team struggles to keep up with these emails; there is never enough time to review them. In the end, most data issues are caught by encountering bad output. And yet the solution to every misstep is to set up another notification!

10:01 a.m. As I scramble to get through all the emails before the majority of the staff arrive, the lights go out and the business of movies begins. Yes, the engineering section of the floor goes dark at a set time each morning, thanks to some high-tech wizardry. I’m told this raises productivity, though I have yet to test the theory with an experiment. I reply to an email from our summer intern, who asks about preparatory reading. As a major task of the internship will be statistical testing, I give my usual recommendations: the front chapters of Box, Hunter, and Hunter and the chapter on randomization in Ian Ayres’s Super Crunchers. Most importantly, though, I advise him to enjoy the time off. I hope he isn’t confused by the mixed message!

10:15 a.m. The manager of analytics informs the team that overnight data processing has failed due to late-arriving data from Google’s display ad reporting system. Everyone reacts calmly, as this happens frequently. We wait and will restart the process when the data show up. I make a mental note: If the delay eats up our buffer, I will proactively notify the executive team that the daily report on key metrics will be tardy.

11:00 a.m. The director of testing convenes a meeting to preview the plan for the next few months. In a couple of weeks, she’ll start maternity leave. I will temporarily manage the testing initiatives with the help of the summer intern.

11:15 a.m. The analytics manager confirms that the daily process has succeeded and key metrics will be available on time.

12:45 p.m. Lunch with a co-worker. I mention I am writing an article about recommendation engines. (The article has since been published.)

1:30 p.m. For the next two hours, I am able to concentrate on an analytics project, evaluating a new source of data. I am fiddling with some sample files from a third-party vendor. These files have the tall format of the modern-day convention. There are three (nested) columns of “keys” and a column of “values.” This format is simple to ingest, but tough to digest, because individuals generate a varying number of rows of data. I have been working on transposing the data to a wide format so there is one row per individual. I finish that task and move to producing summaries of the data.
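
For readers who have not met the tall format, the transpose described above can be sketched roughly as follows. This is only an illustration under assumptions: it treats the first key column as the individual's identifier and the other two as a category and an attribute, and the sample rows, the column naming, and the function name `tall_to_wide` are all made up for the example. The vendor's actual layout is not described here.

```python
from collections import defaultdict


def tall_to_wide(rows):
    """Pivot (individual, category, attribute, value) rows into one record per individual.

    Assumption: the first key identifies the individual; the remaining two
    keys are joined into a single wide-format column name.
    """
    wide = defaultdict(dict)
    for individual, category, attribute, value in rows:
        column = f"{category}.{attribute}"
        wide[individual][column] = value
    return dict(wide)


# Hypothetical sample: individuals contribute a varying number of rows.
tall_rows = [
    ("u1", "film", "genre", "documentary"),
    ("u1", "music", "genre", "jazz"),
    ("u2", "film", "genre", "comedy"),
]

wide = tall_to_wide(tall_rows)

# Collect every column seen so each individual gets the same set of fields.
columns = sorted({c for record in wide.values() for c in record})
for individual, record in sorted(wide.items()):
    # Key combinations an individual never generated show up as None.
    print(individual, [record.get(c) for c in columns])
```

The wide layout makes per-individual summaries straightforward, at the cost of empty cells wherever an individual has no row for a given key combination.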

2:15 p.m. I hit a snag. All the data should have been text, but a level of 118376 rears up. I size the extent of the issue and then verify that these errant values come directly from the raw files, not from an accident of my processing.

2:35 p.m. I email the account manager at the vendor, describing the problem. At this time, I have a hunch that 118376 maps to one of the missing levels, its numerical equivalent. Even though this vendor responds quickly, it will be a few hours before the problem is resolved. I am at an impasse; after the sample files are corrected, I will have to execute the transpose and other steps anew.

2:40 p.m. I turn to another branch of this project. The new data give us an aggregated view of the psychographics of our customers, but it is impossible to judge the data’s value in a vacuum without a reference point. A few days ago, I asked the vendor to provide comparable data for the general Internet population.

3:30 p.m. The account manager acknowledges the 118376 issue and confirms that their engineers are correcting the data.

3:40 p.m. An hour later, I am seeing some rays of light. Yes, what sounds like a simple case of merging two data sets has vaporized 60 minutes of my day. The Internet data are found in an Excel spreadsheet. In the initial merge, I discover that labeling is inconsistent across the two data sources: A few labels contain extra words, which I remove, and an extra whitespace at the end of another label trips up the computer, which I fix. Those prove to be minor obstacles after I realize the taxonomy of the categories does not match perfectly: The Internet data reveal one extra, as well as one fewer, level than the categories extant in the sample files. I fire off another email to the vendor and, at this point, must shelve the project until I get a response.
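
The label cleanup and the taxonomy check can be sketched in a few lines. Everything here is hypothetical: the labels, the extra word being dropped, and the helper names `normalize` and `compare_levels` are invented for illustration, not taken from the vendor's files.

```python
def normalize(label, extra_words=()):
    """Strip stray whitespace and drop known extra words from a label."""
    words = [w for w in label.split() if w.lower() not in extra_words]
    return " ".join(words)


def compare_levels(sample_labels, reference_labels, extra_words=()):
    """Report which category levels match and which appear on one side only."""
    sample = {normalize(label, extra_words) for label in sample_labels}
    reference = {normalize(label, extra_words) for label in reference_labels}
    return {
        "matched": sample & reference,
        "only_in_sample": sample - reference,
        "only_in_reference": reference - sample,
    }


# Hypothetical labels: one has a trailing space, one has an extra word,
# and each source has one level the other lacks.
sample_labels = ["Outdoors", "Film Buffs", "Gamers"]
reference_labels = ["Outdoors ", "Film Buffs Segment", "Foodies"]

report = compare_levels(sample_labels, reference_labels, extra_words={"segment"})
for name, levels in report.items():
    print(name, sorted(levels))
```

Listing the unmatched levels on each side before merging is what separates a fixable labeling problem from a genuine taxonomy mismatch that only the vendor can explain.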

3:50 p.m. My lunch buddy pings me on Instant Messenger promising something good. I make my way to her desk. Regarding our earlier conversation, she wants to share that Facebook has been sending her suggestions for people to friend, but the first names on the list are her ex-boyfriends! Oops. I guess Facebook has found a good algorithm to predict one’s exes.

4:00 p.m. I meet with one of our analytics managers. We have been investigating a long-term trend in the traffic data. Given an observed outcome, we are looking for plausible causes. It’s the idiomatic fishing expedition, but a common business problem!

5:05 p.m. The analyst and I present a progress report to the management team on the traffic investigation. We describe which hypotheses have been examined and strike off ones that fail to convince. New ideas emerge and we set up another meeting.

5:46 p.m. The vendor has a diagnosis. The errant rows of data are generated when their computer gets confused by unexpected commas inside the text field. New sample files will be available the next morning.
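
One way this kind of failure can arise is sketched below, assuming a comma-delimited file. The vendor's actual pipeline is not described, so the sample data and column names are invented. Splitting each line on commas breaks a quoted text field apart, while Python's standard `csv` module respects the quotes.

```python
import csv
import io

# Hypothetical raw file: the second row has a comma inside a quoted text field.
raw = 'id,category,value\n1,"Film, TV and Video",high\n2,Music,low\n'

# Naive approach: split every line on commas, ignoring the quotes.
naive = [line.split(",") for line in raw.strip().split("\n")]

# csv.reader honors the quotes and keeps the text field in one piece.
parsed = list(csv.reader(io.StringIO(raw)))

for row in naive:
    print("naive ", len(row), row)
for row in parsed:
    print("parsed", len(row), row)
```

In the naive version the affected row gains an extra column, so every later value shifts one position to the right. That is one plausible way for a stray value to end up in a column where it does not belong.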

6:55 p.m. I am home. I make some pasta with a Bolognese ragout that was slow-cooked last week.

8:00 p.m. The second part of my day begins. I am teaching a course in business analytics at New York University the next day. I finalize the slides and upload them to the course website. The class will begin with teams of students presenting results of a cluster analysis from the prior week. I will then lecture on using decision trees for prediction.

9:30 p.m. There’s still time to make a blog post. Blogging is the daily grind—it’s a delicate plant that requires watering daily, and if you forget for a few days, you may have squeezed a few drops of life out of it. I open the draft of a piece about Josh Katz’s marvelous maps of dialects across the United States. When I click on the button to release the completed blog post, my work day finally comes to an end. I make my evening coffee and relax.

Lee Richardson

Lee Richardson recently graduated from the University of Washington with a BS in mathematics and statistics. He has been working for the Institute for Health Metrics and Evaluation on the Global Burden of Disease study, in which the goal is to synthesize the world’s health data into comparable estimates. He will join the department of statistics at Carnegie Mellon as a PhD student this fall.

7:55 a.m. Alarm goes off, leading to my first decision of the day. Snooze button?

8:25 a.m. Three snoozes later and I’m finally up and moving. Thankfully, my job isn’t too rigid about when people arrive.

8:30 a.m. My general breakfast strategy is to throw a bunch of presumably healthy ingredients into a blender. Today: One egg, honey yogurt, strawberries, and a banana.

8:45 a.m. Shower, put on some clothes, and head to the bus stop. Luckily, my bus stop is a block away so my commute isn’t very stressful.

9:15 a.m. First stop is our kitchen to grab water and coffee. Here’s a look at our enormous coffee machine.

9:20 a.m. Coffee and water acquired, it’s time to head to my desk to see what’s on tap for the day. I keep a handwritten to-do list next to my desk at all times and never move it. This helps me both organize my tasks and keep my work duties separate from the rest of my life. Following is a list of my major responsibilities at work. This contains useful details as to what someone with a bachelor’s degree in statistics can expect from a first-year job:

1. Managing our research databases
2. Writing custom programs to help researchers model their diseases
3. Extracting, transforming, and uploading new sources of data into our research databases
4. Responding to ad hoc tasks such as pulling specific data and creating tables and figures

After checking Grantland for any new Zach Lowe articles, my first task is to re-run our hemoglobin custom models. We’re using the Hardy-Weinberg equation to estimate years lived with disability (YLDs) due to hemoglobin deficiency. Thankfully, I’ve done this before and I’m pretty confident my script will work again. Sometimes it’s difficult to get onto our computing cluster because lots of other people are using it, but it’s early enough to get my jobs through without any worries.

10:30 a.m. Hemoglobin has run without any issues! The results look reasonable, so I email the modeler to further vet the results.

Onto the next task (there’s no shortage). Scanning through my email and to-do list leads me to several changes I need to make to our epidemiology database. I’m going to get into “database mode” and try to crank them all out before lunch.

11:00 a.m. More coffee.

12:00 p.m. Database mode was a success. I’m off to lunch with my friends Bryan, Chris, and Logan. We usually go to Whole Foods (along with everyone who works at Amazon) due to the vast array of options. I predictably order a burrito bowl, pay, and head back to our kitchen.

12:15 p.m. The World Cup has just started, so a ton of people are watching the opening games. The lunchroom is no exception.

12:50 p.m. Back to my ball and chain (computer) for the second half of the workday. So far, I haven’t done anything too draining, so I have a good amount of mental energy left in the tank to try something a bit more challenging. I’m going to try to crank out a couple new functions that have been lingering on my to-do list. The two tasks are:

1. Write a function that compares GBD 2010 YLD results with our upcoming GBD 2013 estimates
2. Write a function that produces a table/scatterplot of GBD 2013 cause of death results for any cause specified

I’m cautiously optimistic that I can finish both by the end of the day. I usually listen to a meditation playlist on Spotify when I need to focus on completing tasks such as these.

2:00 p.m. Brief interlude to talk to the people who sit near me about the misleading web histories that come from working at the IHME. Maggie (shown at left) is working on literature extraction for STDs, so you can only imagine what her Internet history looks like.

2:15 p.m. I have a working version of the deaths function now and am going to test it on different causes to make sure it’s working as advertised. Once it’s fully tested, I’m going to document how to use it and pass it off to the requester. On to the Epi results comparison.

3:00 p.m. Institutional knowledge is critical to being successful in the workplace. Having spent a lot of time in our infrastructure this past year, it’s much easier to hunt things down. However, since GBD 2010 was released before I arrived, I’m finding it difficult to link the results for corresponding causes in 2010 and 2013.

4:00 p.m. Debating whether I should go get a snack or wait until I get home to have dinner. Pretty classic dilemma I have faced around this time all year.

5:00 p.m. Ended up making some progress, but didn’t fully figure it out. I wonder if I would have had I bought a bag of almonds an hour ago? Oh well, I’ll probably think about how to finish it off tonight and crank it out first thing in the morning.

5:25 p.m. Heading home. This is probably long enough, so I will end the timeline of my day here. Hope everyone reading this has a better idea of what a first-year statistics job looks like!

Ann Oberg

Ann Oberg is a professor of biostatistics at the Mayo Clinic. She earned a BS in mathematics with a minor in statistics from the University of Nebraska-Kearney, an MS in biometry from the University of Nebraska-Lincoln, and a PhD in statistics from North Carolina State University. She has been at Mayo, where she collaborates in both cancer and vaccine research, since 1999. Her expertise is in study design, normalization, and analysis of data from high-dimensional platforms. She and her husband live in Minnesota with their three children.

6:00 a.m. My alarm goes off … hit snooze once … up at 6:09 a.m. Spring has finally arrived in Minnesota! It’s a beautiful morning; the rooster is crowing and the meadowlarks and mourning doves are singing. We’ve even had a bobwhite around the place lately! I pour a cup of coffee (auto-brew is awesome) and hop in the shower.

6:30 a.m. Nels, my husband, is up and sharing my coffee. ☺ I go downstairs to make sure EmmaAnn, our 7th-grade daughter, is up. She is—this is her favorite day of the week because she has jazz before school. (Though Nels says it’s her favorite just because the teacher brings them donuts.)

6:47 a.m. We sit down for strawberries, blueberries, and yogurt with a few Cheerios on top for breakfast.

6:54 a.m. Finish getting ready to leave and quickly wake our two boys, Sven and Sorren, so I can see them before I leave. Nels is a stay-at-home dad. He and Sorren will take our first-grader, Sven, to school. All three wave out the window at us as we leave.

7:03 a.m. EmmaAnn and I leave for school … we are running a bit late. We chat about her track season being over and her excitement for the 7th/8th-grade dance the next evening.

7:11 a.m. Drop EmmaAnn at school and head to work. I do some memorization while in the car. I am a statistician at the Mayo Clinic in Rochester, Minnesota. One thing I like about working here is that we can live in the country, yet my commute is generally only 30 minutes door-to-door (unless I’m also dropping kids at school), and the rare ‘traffic jam’ slows me down by ~5 minutes.

7:34 a.m. Park in the parking ramp. Delete junk mail while walking to the office. I take the stairs to my office on the 8th floor. I haven’t been good about exercising regularly ever since our 3rd child was born, but I still faithfully do stairs, a habit I started in graduate school.

7:40 a.m. In my office … get a few quick emails taken care of before heading to my first meeting.

At the computer: Rick. Around the table, from left: Hannah, Sapna, Bob, Inna, Caroline, Jon, Diane, Greg, Iana, Shane

7:52 a.m. Head across the street for my 8–10 a.m. meeting with the Vaccine Research Group (VRG). The agenda is pretty full today and includes a discussion of data from an outside vendor (the controls don’t seem to have worked, so we decide to ask them to repeat the assays), a review of paper writing progress and priorities (all first authors report status and anticipated submission dates on their respective papers), and R37 grant progress report/plan discussion (the five-year report is due July 1 … we are still tweaking the aims and have to cut from 14 pages of text down to 8 before it goes in). We run over to 10:20 (unusual), and even so, there are several other agenda items we don’t make it to.

10:26 a.m. Back in office. Answer a few emails. Gather materials for a couple of meetings this afternoon. Set a date to talk next week with Vera, a colleague with expertise in trial design, to discuss adaptive clinical trial designs and whether any of the principles would be applicable to the biomarker screening setting. (That’s one of the things I love about working at Mayo. We have one of the largest statistical groups in the nation and people are very collaborative and willing to discuss ideas.) Send agenda for my 1 p.m. conference call.

10:57 a.m. Bill, an MS statistician I work with, arrives for our 11 a.m. conference call. We sign on to talk with Mike, a collaborator in Arizona who works at TGen and is affiliated with Mayo on our Arizona campus, about a grant for one of the National Cancer Institute’s Provocative Question RFAs. We need to finalize details for a couple of the experiments he is proposing. We discuss the hypotheses, sample size/power, and whether existing tissue microarrays (TMAs) will have the proper patient population for his study.

11:59 a.m. Bill and I discuss action items after the call. He will do power calculations and first draft of stat methods, then send to me. He’ll also pull the patient summaries for the TMAs. (Our statistical group works in teams here at Mayo. The pancreas group is large enough that we have portions of a PhD statistician [me], two MS statisticians [of whom Bill is one], and a statistical programmer analyst—typically a BS-level statistician.)

12:08 p.m. Oof! Two voice mails and a text from Nels that I hadn’t seen between meetings. Surprise! He’s in town with our 5-year-old son. Can I have lunch with them at Thursdays on First? (They block off a couple of blocks on First Street every Thursday during the summer for street vendors, farmer’s market, and music. It’s the first one of the year! Yay!) I don’t have much ‘work time’ today to get things done, but can’t turn them down. Good thing I have a lot of meeting-free time tomorrow!

12:58 p.m. Back in the office and get set up for the next call. It’s with Brett from the University of Tulsa, with whom we collaborate on one of the VRG grants. Diane, master’s statistician for the VRG, arrives for the call. We discuss the status of some gene modules Brett has been developing for our mRNA-Seq data, a draft of a manuscript we are working on, and ways to get him involved in some of the other VRG projects.

1:56 p.m. Sign off the conference call and head across campus for the biomarker conference call. Even though it’s nice outside, I walk through the subway since it’s faster and I’m running a bit late. Someone is playing the piano that sits in the Gonda building atrium. Patients and community members frequently come to play for people walking by, and it always makes me smile.

From left: Ann, Gloria, Bill, Rob

2:07 p.m. Arrive for the pancreas cancer biomarker collaboration conference call. They’ve already started the call. (I frequently have back-to-back meetings and seem to be chronically late!) In addition to the collaborators, we have a PhD epidemiologist (Gloria, who leads our pancreas group), an MD (Rob), and two statisticians (Bill and me) on the call. The discussion is focused on how to design the next set of studies for some promising cancer markers. This includes re-discussion of the target population, types of controls in light of that, balancing sample size, and assay cost. The collaborators are new to pancreatic research, and the insight of Rob, who treats this patient group, is invaluable in thinking about patient characteristics.

3:01 p.m. The conference call ends. Gloria, Bill, and I stay for another 25 minutes or so and finalize sample selection details.

3:27 p.m. Since I have a 4 p.m. meeting on this end of campus, I find a place to sit and read a portion of a manuscript, rather than going back to my office.

4:00 p.m. I arrive for the last meeting of the day (finally on time) with the PI of the Mayo Clinic Ovarian SPORE grant and the core directors. We will be resubmitting our renewal application this fall. We discuss the timelines and potential obstacles, make sure core leaders are hearing from project leaders, etc. Fortunately, this is a great group of people to work with and the investigators are very collaborative (I find this is in general true of people at Mayo). Communication is already going and no major obstacles are foreseen.

4:47 p.m. We finish a bit ahead of schedule, so I head back to my office. I walk outside this time!

4:56 p.m. Back in the office. Go through emails from the day.

5:20 p.m. Head for home. More memorization work in the car.

5:51 p.m. Arrive home. The family is already eating because Sven, our first grader, has soccer practice at 6:30. So I hurry to change and get to the table. Cheeseburgers, salad, and grilled asparagus from the neighbor’s garden—yum!

6:10 p.m. Sven is scrambling to find his missing soccer socks … can’t find them so he’ll wear big sis’s socks (good thing they aren’t pink). I stay home for the first time all week (end of school always seems so busy with extra evening activities) and clean up after supper. Then EmmaAnn, Sorren, and I get the flowers I bought a week ago potted for the deck. And we finish just before the rain! It’s a fun evening with them.

8:13 p.m. Nels and Sven return from soccer about the time it starts raining. The rain is light, so we go look at the garden. Much of it has sprouted, but only ~1/3 of the sweet corn is up … looks like we’ll be re-planting this weekend! There is a beautiful sunset in progress, so we walk up by the corncrib for a better view.

8:55 p.m. Nels lets me do a bit more outside while he puts the kids to bed. We try to get them down by 8:30, but it’s hard to come in that early this time of year since it stays light so late. Good thing there is only one more day of school left this week! Nels and I chat; it’s always good to regroup after a busy day. The soccer socks were discovered at bedtime … in Sorren’s drawer! (One con of having a 5- and an 8-year-old put away laundry!)

9:13 p.m. Nels searches for cabins for our summer vacation. I have some manuscripts I should read for work, but I pick up a book that’s been captivating my attention instead.

10:11 p.m. I finish the book. I’m liberated! ☺ Kennel the dog. Get ready for bed. Crash around 10:30.
