STATtr@k

Discussing Data Science

1 August 2015
Carl Letamendi

There is no doubt in my mind that the title “data scientist” carries a sense of prestige and utmost responsibility. When one thinks of data science in general, the fields of computer science and statistics come to mind. In my present role, I feel as though I have a unique experience: I meet the expectations of my position, yet I have a foot in both fields and belong to neither. Here’s a short story of how I got to where I am, and my opinion about where we stand in the ‘data science tug-of-war’ between computer science and statistical science.

Carl Letamendi (Dr. L) is the data scientist at an NYC-based primary education charter school management organization. He earned his PhD in conflict analysis/social science and holds an MBA in finance.

My undergraduate degree is in business, and my master’s degree is in finance, which has a lot to do with statistical methods, forecasting, and the like. While I was working on my master’s, I knew I didn’t want to be in banking; it simply didn’t interest me. As a graduate student during the recession, however, I did become interested in financial crises and the social consequences, including structural violence, that resulted from them. I immediately went on to earn my PhD, which was, ironically, in a branch of the social sciences called “conflict analysis”: a perfect combination to satisfy my research agenda.

As a PhD student, I had a strong preference for quantitative methods research, and I developed a research interest in quantifying aggregate social behaviors (via indexing) to predict social realities. I created a theory I call “The Cycle of Aggregate Sentiment.”

I was lucky enough to secure two short fellowships, at the National Institutes of Health’s National Institute on Drug Abuse (NIH/NIDA) and at the U.S. Department of Agriculture’s Office of Civil Rights, Diversity, and Inclusion (USDA/APHIS/OCRDI), which gave me the opportunity to apply my analytical abilities to challenging projects in public health. However, after completing my fellowships and having my PhD conferred upon me in 2014, I, like most 20-somethings with a PhD, went through what overly educated millennials often go through: I had an extremely difficult time finding a job! After more than 300 job applications and dozens of interviews, I felt as though I was being discriminated against for having a terminal degree, for actually being qualified for the position, and for being too young.

One of my former classmates told me her employer needed someone who was quant-savvy and understood finance … in other words, me! After connecting with the CEO of the organization, interviewing, and performing a few SPSS work samples using raw data that was sent to me, I was hired! My wife and I stuffed our belongings into a U-Haul and moved from Florida to the NYC area.

My initial role in the organization was that of an “analyst” (financial and data), but we soon realized I did more than just “analyze”: the data culture was nonexistent, and I had no formal training. Eventually, my superiors decided “data scientist” suited me best, since I am the person in the organization everyone turns to for data, and I bring together the three main skills needed for data science: computer programming, content knowledge, and statistical/quantitative abilities.

I essentially take raw data and students’ assessment scores, decide how these data could be used and what kinds of correlations and assertions I can draw, and develop creative ways to solve any issues the data show me. I do not have a formal “research agenda,” nor is there a set of specific reports the organization expects. It’s basically my job to figure that out and promote a data culture across all elementary and middle schools in our network. It’s been an interesting and lonely road. To date, I believe I am the only data scientist in primary education!

As an outsider with a coveted title, I have noticed that “data science” seems to be the rope in a tug-of-war between the fields of statistics and computer science. In my opinion, it is somewhat unfair, but most positions advertised under my job title require strong computer programming skills (SQL, Python, Hadoop, R, C++, etc.). In fact, because of the title alone, I am flooded with LinkedIn messages from data science/IT recruiters!

I have basic knowledge of Python and R, but even if I were 100% proficient, I wouldn’t use them at work. I use a lot of stats, SPSS, and Excel, but very little programming. However, many data scientist positions I have seen require more programming and coding and less statistics. Personally, I don’t think a computer science graduate can do what a statistics graduate can do, and vice versa. I believe data science requires two fields coming together, just as epidemiology brings together pathologists and statisticians, for instance.

I think data science is a buzzword being used in the tech world, among companies that have what qualifies as Big Data (Facebook, LinkedIn, Twitter, etc.). However, I think they are looking for candidates among themselves, and not among statisticians. It is almost as though our modern statisticians are expected to know how to code. I don’t know if this is something that we, as “quants,” will have to accept and adapt to, or if the field will push coding back to computer science folks and leave the statistics to statisticians. I guess this is one of those instances when it is appropriate to say only time will tell.

I often read the ASA’s magazines and Significance, and one thing I notice is that those who give advice to current statistics students sometimes say they would double major in statistics and computer science if they could do it all again. But see, I don’t think that’s necessary. I simply think that we, in the field of statistics and in the social sciences, should learn programming to the level of proficiency needed to meet the demand. There are free courses out there that can teach the basics. I like to think of data science, computer science, and statistics as adjacent cogs in a wheel with a similar objective, but they cannot easily replace each other!

If statistics and computer science are engaged in a game of tug-of-war and data science is the rope, I think we should actually use the rope to bind us so we can collaborate by contributing our unique areas of expertise. My hope is that employers will realize a need for both statisticians and computer scientists and advertise positions for both, so as to maximize their analytical potential as an organization.


5 Comments »

  • Anagha Kumar said:

    Does the author know what PhD programs in Statistics and Biostatistics are like? The top programs in Statistics and Biostatistics produce outstanding coders. Dissertations in these programs involve extremely computationally intensive methods that have to be coded from scratch by said students. Journals such as The Journal of Statistical Software, Computational Statistics, Computational Statistics and Data Analysis, Statistics and Computing, Journal of Statistical Computation and Simulation etc. might help him understand how much emphasis is placed on coding and how proficient the graduates of these programs are. Contrary to popular belief, we do not spend our time performing t-tests in Excel.

  • Arturo Rosas said:

    The author’s article is interesting. A data scientist needs more than statistics and computing. Based on Dr. Rachel Schutt’s profile of a data scientist, a data scientist needs skills in the following domains:
    – Computer Science
    – Math
    – Statistics
    – Machine learning
    – Domain expertise
    – Communication and presentation skills
    – Data visualization
    Dr. Schutt is the author of the book Doing Data Science.

  • Anagha Kumar said:

    Well that’s just it – let’s go through the above list.
    Communication and presentation skills: Cannot be taught. Some people are good communicators, others are not. Assuming someone who is skilled at math/stat is not a good communicator is absurd.

    Data Visualization is a PART of what we do as PhD level statisticians. Data visualization is not a field unto itself. Honestly!

    Domain expertise is gained since statisticians work in different subject areas. I am a biostatistician. I would be so bold as to say I have some domain expertise.

    Machine Learning is a sub-field of statistics. I would argue it is one of the hottest sub-fields right now.

    Computing, Math, and naturally Statistics are heavily emphasized in PhD programs in statistics. In fact, the vast majority of PhD students in stats PhD programs study Math/CS in undergrad. Also, statistics is an applied math field!

    I simply cannot understand how the INVENTORS of data analysis (statisticians) are not given a place at the table and are told that we lack – well what exactly? What do statisticians lack that is mentioned in the above list?

  • Vincent Granville said:

    Business acumen. The ability to measure the yield that you provide above base line, and convince decision makers about your added value, assuming you bring any, after factoring out the cost of hiring you.

  • Anagha Kumar said:

    In response to Vincent’s comment, again, you cannot assume someone who has an advanced degree in statistics lacks business acumen. People choose to get degrees in various fields as a means to an end. I’m not convinced that statisticians lack the ability to measure the yield that they provide above base line, and convince decision makers about their added value, assuming they bring any, after factoring out the cost of hiring them. Some people have business acumen, others do not. Nobody can generalize and say that people getting degrees in a particular field lack x or y or z as a whole – there is tremendous individual variation.