
Analytics Evolution

1 September 2014

Philip R. Scinto is a senior technical fellow for the Lubrizol Corporation, where he has been employed since 1989. He is also a recently named Fellow of the American Statistical Association. Scinto holds a BS from Cornell University and an MS in statistics from Carnegie Mellon University. He is known for applying innovative statistical solutions in industry, practical applied research in supersaturated designs and statistical engineering, and leadership of the ASA Conference on Statistical Practice.

Analytics is the process of generating data-driven knowledge, insights, and improved decisions through the integrated application of technologies such as computers, operations research, and statistics. The field, and the need for experts in it, is growing because data are growing in both breadth and depth and because organizations increasingly tackle complex problems to create a competitive advantage. The power of analytics lies in the idea that the integrated combination of these technologies is more than the sum of the individual technologies.

During my almost 30 years as an applied statistician (25 of those years with Lubrizol), I have realized that an “analytics mindset” plays an important role in my effectiveness in the physical and chemical sciences industry. Yes, I have the skills and experience expected of a statistician in my industry; however, being a statistical consultant involves more than designing experiments, developing regression models, estimating repeatability and reproducibility, performing analysis of variance, or creating control charts. As applied statisticians, we should strive to improve the capability and cost-effectiveness of how our organizations generate fundamental knowledge and make decisions. OK, easier said than done.

What does this mean? Basically, it means providing long-term, robust, smart tools for decisionmaking based on logic, data, and subject-area knowledge. It also means that, to have a lasting impact, these tools must be based on cause-and-effect variables, rather than merely associated variables, wherever possible.

I call myself a statistician because I have a degree in statistics and biometry from Cornell and a master’s degree in statistics from Carnegie Mellon, and because I have researched and used statistical tools throughout my career. Am I also a data scientist? Do I practice analytics? Online forums are currently full of arguments over semantics, but I would say essentially yes. However, it has taken many years, realizations, and experiences to get to this point. My evolution into analytics was motivated by my need and desire to generate improved results for Lubrizol. In addition to building my statistical toolkit and my appreciation of data and algorithms, I needed to deepen my understanding of Lubrizol’s technology, business, and issues; strengthen my communication and collaboration skills; and learn to understand and use the experience and expert opinion of people in areas such as chemistry, chemical engineering, mechanical engineering, and sales and marketing.

Preparing for a Career in Analytics in Industry

So, how does one prepare for a career as an applied industry statistician in analytics? I believe the first thing that needs to be done is a self-assessment. Can you answer yes to most of the following statements?

  • I enjoy working with complex data to enhance understanding of a scientific or business problem
  • I enjoy working on problems with no known solutions
  • I would rather adequately solve many problems in a short period than spend a lifetime creating the perfect solution for one problem
  • I would rather be good at many things than great at one thing
  • I have an interest in science and/or engineering
  • I want to know how things work
  • I care about the practical problems of other people
  • I value the experience and expertise of others outside my field
  • I want to create something useful for customers and consumers
  • I need to make an impact

If most of these statements resonate with you, then a career as an applied industry statistician using analytics is probably a good fit. However, you may change over time, and what you enjoy at 20 may not be what you enjoy at 40. Repeat this self-assessment periodically as you gain experience in school, work, and life.

Now that you know what you want to do, you need to develop and update the tools, skills, and experiences that will make you effective. Note that gaining the necessary experience is much more difficult and takes far longer than learning about data analysis, data manipulation, and information systems.

Tools and Skills
The technical skills I think are necessary for an effective career in analytics include expertise in the following areas:

  • Methods for effectively analyzing wide data, such as Bayesian model averaging, random forests, and ensemble approaches
  • Construction of efficient experimental designs in high dimensions (e.g., supersaturated designs)
  • Operations research areas such as simulation, optimization, and process monitoring
  • Computer science to enable data manipulation, application and web tool development, and the use of modern computer architectures (e.g., multi-core systems)
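As one concrete instance of the second item, a supersaturated design has more factors than runs minus one, so not every column can be orthogonal to every other. The sketch below follows Lin's (1993) half-fraction construction: build a 12-run Plackett-Burman design, keep the six runs where a chosen "branching" column is +1, and drop that column, giving 6 runs for 10 factors. It then scores the design with the common E(s²) criterion (the average squared inner product between pairs of columns; smaller is better). This is an illustration under those assumptions, not necessarily how the author builds designs; the function names are hypothetical.

```python
import itertools

import numpy as np


def plackett_burman_12():
    """12-run, 11-factor Plackett-Burman design via the Paley
    (quadratic-residue) construction for the prime 11."""
    p = 11
    residues = {(k * k) % p for k in range(1, p)}  # {1, 3, 4, 5, 9}
    gen = np.array([1 if (k == 0 or k in residues) else -1 for k in range(p)])
    rows = [np.roll(gen, shift) for shift in range(p)]  # 11 cyclic shifts
    rows.append(-np.ones(p, dtype=int))                 # final run: all factors low
    return np.array(rows)


def lin_half_fraction(design, branch_col=0):
    """Keep the runs where the branching column is +1, then drop that column."""
    keep = design[:, branch_col] == 1
    return np.delete(design[keep], branch_col, axis=1)


def e_s2(design):
    """E(s^2): mean squared inner product over all pairs of columns."""
    k = design.shape[1]
    s = [float(design[:, i] @ design[:, j])
         for i, j in itertools.combinations(range(k), 2)]
    return float(np.mean(np.square(s)))


pb12 = plackett_burman_12()
ssd = lin_half_fraction(pb12)
print(ssd.shape)   # runs x factors
print(e_s2(ssd))
```

Every column of the resulting design is still balanced (three high and three low settings), and the price of screening ten factors in six runs is a small, uniform amount of non-orthogonality between columns rather than complete confounding of some pairs.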

Concepts
Key concepts include the following:

  • Causation versus correlation and the concept of “proxy” versus “lurking” variables
  • Common versus special cause variability
  • Autocorrelation
  • Over-fitting
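The distinction between common- and special-cause variability is what a control chart operationalizes. The sketch below is a minimal individuals (XmR) chart, assuming hypothetical readings: sigma is estimated from the average moving range divided by the constant d2 = 1.128 (for ranges of two consecutive points), and only the simplest rule, a point beyond the 3-sigma limits, is applied. Note that it assumes independent observations; autocorrelated data will tend to produce false alarms with these limits.

```python
import numpy as np

D2_FOR_N2 = 1.128  # bias-correction constant d2 for moving ranges of size 2


def xmr_limits(x):
    """Individuals chart: lower limit, center line, upper limit, with sigma
    estimated from the average moving range rather than the overall SD."""
    x = np.asarray(x, dtype=float)
    center = x.mean()
    sigma = np.abs(np.diff(x)).mean() / D2_FOR_N2
    return center - 3 * sigma, center, center + 3 * sigma


def special_cause_points(x):
    """Indices of points outside the limits: candidates for special causes.
    Everything inside the limits is treated as common-cause variation."""
    lcl, _, ucl = xmr_limits(x)
    x = np.asarray(x, dtype=float)
    return [int(i) for i in np.flatnonzero((x < lcl) | (x > ucl))]


# Hypothetical readings with one unusual value at index 6.
readings = [10.1, 9.8, 10.0, 10.2, 9.9, 10.1, 13.5, 10.0, 9.9, 10.1]
print(xmr_limits(readings))
print(special_cause_points(readings))
```

Reacting to the flagged point (looking for what changed) is appropriate; adjusting the process in response to the unflagged wiggles is tampering with common-cause variation.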

Experiences
As mentioned earlier, the experience piece is the most difficult to obtain. The best way to gain experience is by practicing (working on as many projects as possible), making mistakes, and learning from those mistakes. Some of what I have learned along the way includes the following:

Understand “proxy” versus “lurking” variables. When I graduated from school, I thought I was particularly good at modeling; my biggest strength was that I leaned heavily on subject-matter knowledge, which made my models particularly robust. Over the years, I have learned to make my models far more effective by looking for the “lurking” variable. Yes, the scientist could confirm, based on theory, that ingredient XYZ is better than ABC or that Laboratory I yields better results than Laboratory II. And yes, I could build a great model with laboratory, time, and ingredient terms. However, I learned that such a model has a limited life, because things change and the model had no terms that captured the essence of the phenomena.

Ingredient, time, laboratory, region, engine, test tube, and the like are all “proxy” variables. They stand in for what is really happening but are not themselves the root causes, such as molecular structure, physical properties, temperature, speed, load, and pressure. I found that the more success I had in replacing proxy variables with the lurking variables behind them, the more robust my models and the better my predictions. I have also learned that proxy variables are sometimes good enough, either for the sake of time or when the variable is not essential to the solution.
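A small simulation can make the limited life of a proxy model concrete. In the hypothetical setup below, the response truly depends on temperature, and Laboratory I simply happens to run hotter than Laboratory II. A model with only a laboratory term fits the historical data well, but when Laboratory II later starts running at Laboratory I's temperature, only the temperature (root-cause) model keeps predicting well. All numbers, the seed, and the linear form are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(2014)


def run_tests(temps):
    """Hypothetical process: the response depends on temperature, not the lab."""
    return 2.0 + 0.5 * temps + rng.normal(0.0, 0.5, size=temps.shape)


def fit(x, y):
    """Ordinary least squares for y = b0 + b1 * x."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def predict(coef, x):
    return coef[0] + coef[1] * x


def rmse(pred, actual):
    return float(np.sqrt(np.mean((pred - actual) ** 2)))


# Historical data: Lab I happens to run hot (about 80), Lab II cooler (about 60).
n = 40
lab = np.repeat([0.0, 1.0], n)  # 0 = Lab I, 1 = Lab II (the proxy variable)
temp = np.where(lab == 0, 80.0, 60.0) + rng.normal(0.0, 2.0, size=2 * n)
y = run_tests(temp)

proxy_model = fit(lab, y)    # laboratory term only
cause_model = fit(temp, y)   # temperature (root-cause) term only

# Later, Lab II upgrades its equipment and now also runs at about 80.
new_lab = np.ones(n)
new_temp = 80.0 + rng.normal(0.0, 2.0, size=n)
new_y = run_tests(new_temp)

proxy_rmse = rmse(predict(proxy_model, new_lab), new_y)
cause_rmse = rmse(predict(cause_model, new_temp), new_y)
print(f"proxy (laboratory) model RMSE:       {proxy_rmse:.2f}")
print(f"root-cause (temperature) model RMSE: {cause_rmse:.2f}")
```

On the historical data the two models are nearly indistinguishable, which is exactly why the proxy model is tempting; the difference only shows up once the relationship between the proxy and the real cause changes.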

Optimization is valuable. Finding the best solution in one dimension is rarely useful. Typically, products need to perform well across a variety of areas at the lowest cost, so it is important to learn to develop solutions for the entire system rather than for one particular area.
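One standard way to optimize a whole system rather than one response is the desirability approach of Derringer and Suich: map each response to a desirability between 0 (unacceptable) and 1 (fully acceptable) and maximize their geometric mean. The sketch below assumes three hypothetical fitted models (wear protection, deposits, and cost) in two scaled formulation factors and uses a simple grid search; the models, limits, and grid are invented for illustration and are not taken from the article.

```python
import numpy as np


# Hypothetical fitted models for two formulation factors scaled to [0, 1].
def wear_protection(x1, x2):  # larger is better
    return 60 + 25 * x1 + 10 * x2 - 8 * x1 * x2


def deposits(x1, x2):         # smaller is better
    return 30 - 5 * x1 + 15 * x2


def cost(x1, x2):             # smaller is better
    return 1 + 3 * x1 + 1 * x2


def d_larger(y, low, high):
    """0 at or below `low`, 1 at or above `high`, linear in between."""
    return np.clip((y - low) / (high - low), 0.0, 1.0)


def d_smaller(y, low, high):
    """1 at or below `low`, 0 at or above `high`, linear in between."""
    return np.clip((high - y) / (high - low), 0.0, 1.0)


def overall_desirability(x1, x2):
    """Geometric mean of the individual desirabilities: any unacceptable
    response (d = 0) makes the whole formulation unacceptable."""
    d1 = d_larger(wear_protection(x1, x2), 70, 90)
    d2 = d_smaller(deposits(x1, x2), 20, 40)
    d3 = d_smaller(cost(x1, x2), 1, 5)
    return (d1 * d2 * d3) ** (1 / 3)


grid = np.linspace(0.0, 1.0, 101)
X1, X2 = np.meshgrid(grid, grid)
D = overall_desirability(X1, X2)
best = np.unravel_index(np.argmax(D), D.shape)

# Maximizing wear protection alone pushes both factors to their upper limits...
wear_only_D = float(overall_desirability(1.0, 1.0))
print(f"wear-only optimum: overall desirability = {wear_only_D:.2f}")
# ...whereas the system-level optimum trades the responses off against cost.
print(f"system optimum: x1 = {X1[best]:.2f}, x2 = {X2[best]:.2f}, D = {D[best]:.2f}")
```

The geometric mean is a deliberate choice: unlike an average, it cannot hide an unacceptable response behind excellent performance elsewhere.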

Solutions need to make sense, be easy to use, and be accessible. If the customer cannot access and use your solution easily, then it is not much of a solution. Your solution needs to make sense to both you and your customer, so think through the problem and the solution until the whole thing is logical. In addition, the better your customer understands your solution, the more likely it is to be used. Solutions should also include documentation and an audit trail so that improvements can be made without much rework.

Keep learning. Keep up to date on the latest philosophies, analysis techniques, and information system and software tools. Over the years, I have improved my effectiveness by learning about supersaturated designs, Bayesian model averaging, regression trees, and statistical engineering. I could do a better job of learning new methods on the data side.

Stop being such a statistician. I do not know how else to say this. Basically, stay away from being the stereotypical statistician. Not everything in a design or analysis has to be perfect to be useful. We can afford correlation among the variables. We can afford to build user experience into our models without hard data. We must not rely on traditional methods, such as p-values and stepwise regression, that do not work well with large, messy, correlated data. We can afford to use proxy variables so we can reduce variability in the data and discover critical root causes. We can analyze data with residuals that are not independent. We can implement an imperfect solution for the sake of time. And we can recognize that we should not fall in love with our models and solutions.
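One reason p-value screening and stepwise selection break down on wide data is multiplicity. The sketch below assumes 200 candidate predictors and a response that are all pure noise, with only 50 observations, and computes the simple-regression t statistic for each predictor, which is what the first step of forward selection ranks on. A handful of noise predictors still look "significant" at the nominal 5% level, and the best of them looks convincing. The sample sizes, seed, and cutoff are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 50, 200
X = rng.normal(size=(n, p))   # 200 candidate predictors, all pure noise
y = rng.normal(size=n)        # response unrelated to every predictor

# Simple-regression t statistic for each predictor, via its correlation with y.
Xc = X - X.mean(axis=0)
yc = y - y.mean()
r = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
t = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)

T_CRIT = 2.0106  # two-sided 5% critical value of t with 48 degrees of freedom
n_significant = int(np.sum(np.abs(t) > T_CRIT))
print(f"'significant' noise predictors: {n_significant} of {p}")
print(f"largest |t| among pure noise:   {np.abs(t).max():.2f}")
```

With correlated real predictors the problem compounds, because which of several related variables enters a stepwise model can flip with small changes in the data.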

What About Big Data?

I often work with extremely wide data (hundreds or even thousands of potential predictor variables). However, I have not yet needed to work with truly Big Data in my industry. One promising area we are exploring at Lubrizol is text mining of emails and scientific reports; I think this information could be useful for setting priors for the analysis of empirical data. For the sciences in general, I think that when Big Data lets us get at the root causes, it should be treated more like empirical data; when all it offers is correlation, however, it should be treated as prior information, with a more designed approach used to test the prior belief. Of course, in other areas where large volumes of data stream in and timely answers are critical, I can see relying on unproven trends as long as those models are updated continuously.
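Treating correlational data as prior information and then testing it with a designed experiment can be sketched with the simplest Bayesian update: a normal prior for a mean, combined with normally distributed experimental results whose standard deviation is assumed known. Precisions (inverse variances) add, and the posterior mean is the precision-weighted average of the prior mean and the sample mean. The numbers below (a weak prior, as might be distilled from mined reports, and an 8-run experiment) are hypothetical, and this conjugate model is a generic illustration rather than the approach used at Lubrizol.

```python
import math


def normal_update(prior_mean, prior_sd, sample_mean, sigma, n):
    """Conjugate normal update for a mean, assuming the measurement SD
    `sigma` is known. Returns the posterior mean and posterior SD."""
    prior_prec = 1.0 / prior_sd ** 2
    data_prec = n / sigma ** 2
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * sample_mean) / post_prec
    return post_mean, math.sqrt(1.0 / post_prec)


# Hypothetical: weak prior from historical, correlational data, then a small
# designed experiment of 8 runs.
post_mean, post_sd = normal_update(prior_mean=5.0, prior_sd=2.0,
                                   sample_mean=3.2, sigma=1.5, n=8)
print(f"posterior mean = {post_mean:.2f}, posterior SD = {post_sd:.2f}")
```

Because the prior here is weak relative to the experiment, the designed data dominate and the posterior lands near the experimental mean; a prior backed by genuine root-cause knowledge would deserve a smaller prior SD and correspondingly more weight.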

I think Big Data has its place and is potentially very useful, but I do not believe in creating and working on Big Data for Big Data’s sake. Do not fall into the trap of creating difficult problems, models, and solutions when and where easier ones are available or suffice. Also keep in mind that Big Data and Big Data analysis are not substitutes for sound logic in defining a problem, using prior information, interpreting results, verifying solutions, and developing fundamental knowledge.

Summary

If you like to solve problems, if you enjoy working with data, if you like science, and if you enjoy creating sustainable solutions for difficult problems, then a career in statistics and analytics in industry may be a good fit for you. Keep in mind that a career in this area is a marathon, not a sprint. First, you must learn the necessary technical tools and understand key concepts about data. The difficult part is gaining the experience necessary to be effective. That experience will come with practice, a willingness to listen to the customer, a willingness to learn additional tools and techniques, and a desire to deliver improved results for your organization. I think this potential is within all of us. Good luck!

