In 2014, Fox News’ CEO Roger Ailes was hit with a slew of workplace sexual harassment claims from a number of his female anchors. The biggest accusations (apart from the obvious) included creating a culture that made it hard for women to succeed and climb the corporate ladder. His departure (or termination) in 2016 was followed by a promise from the network to do better, and they embarked on one of their first documented People Analytics programmes.*
I build People Analytics models every day. Broadly, the models I build fall into two categories, and for the purpose of this blog let me call them:
1. Employee Performance Models. These models involve using employee performance metrics, either as independent variables or as response variables – be it to predict overall employee performance or to help predict something else. Examples include: what do my high performers do differently, what educational backgrounds do we need to look for in prospective hires, who do we include in next year’s HiPo program, etc.
2. Employee Behavioral Models. These models involve using data to predict, explain or understand non-performance related phenomena. Examples include: quantifying engagement, predicting flight risk, understanding leading contributors to employee burnout and identifying the best employees to lead change.
So how are they different? Both can be predictive. Both can solve business questions. Both can involve machine learning algorithms. However, performance models require a variable that is a proxy to employee performance and this makes building them incredibly complex and risky. The validity of the model depends entirely on the practitioner’s definition of employee performance and the reliability of the associated data points.
Let me use Fox News as an example of how tricky building Performance Models can be. To counter their hitherto terrible hiring practices, they introduced a machine-learning algorithm that sought out the most promising candidates to promote. The model would comb through years of historical data to identify the common characteristics of Fox News employees who succeeded, which would then inform future promotion decisions. That sounds like great news, right? The model would remove human bias against gender and race and choose candidates purely on merit.
To build the model, their scientists defined success for an employee as “staying there for five years and being promoted twice”. It’s the first step in an Employee Performance model. They then trained their model to identify what these “successful” employees had in common, which would in turn be applied to the current pool of applicants to find the best people to promote. It all went awfully wrong. The model built to help eliminate gender bias ended up systematically removing women from its selection process. Because many women in the past were not allowed to climb the corporate ladder, they never made it into the “successful employee” training data set. The model failed the moment its builders chose the wrong proxy for performance. This, for me, is a sublime example of how tricky building Performance Models can be.
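To see the mechanism concretely, here is a minimal, entirely hypothetical simulation (Python with NumPy and scikit-learn; all the numbers and variable names are made up): merit is distributed identically across two groups, but the historical promotion label was systematically withheld from one group, and a model trained on that label inherits the bias.

```python
# Hypothetical sketch of proxy-label bias: true "merit" is identical
# across groups, but the historical promotion label was harder for
# group 1 to obtain. All data here is simulated, not real Fox News data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
group = rng.integers(0, 2, n)        # 0 = historically favoured, 1 = not
merit = rng.normal(size=n)           # true ability, same for both groups

# Biased proxy label: promotion depended on merit AND on group membership.
promoted = (merit + np.where(group == 1, -1.5, 0.0)
            + rng.normal(scale=0.5, size=n)) > 0

model = LogisticRegression().fit(np.column_stack([merit, group]), promoted)

# Score a fresh applicant pool with identical merit in both groups.
pool_merit = rng.normal(size=1000)
rate_g0 = model.predict(np.column_stack([pool_merit, np.zeros(1000)])).mean()
rate_g1 = model.predict(np.column_stack([pool_merit, np.ones(1000)])).mean()
print(rate_g0, rate_g1)  # group 1 is selected far less, despite equal merit
```

The model is doing exactly what it was asked to do; the failure is entirely in the choice of label.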
Practitioners often choose from the following four categories (examples of each can be found below) of employee performance metrics when building performance models. Remember, Fox News used a tenure of 5 years as a way of defining performance.
1. Work Quality: Subjective appraisal by managers, 360-degree feedback, number of clicks on content, product defects, customer feedback, etc.
2. Work Quantity: Number of: units sold / produced, calls answered, handling time, blogs written, cold calls, active sales leads, revenue brought in, packages packed, etc.
3. Work Efficiency: A combination of quantity and quality. Number of leads per dollar spent, production units per month, etc.
4. Organization level employee performance metrics: Absenteeism rate, overtime days, % pay rise in the last two years, number of promotions in the last two years, training hours requested, etc.
I have to admit – choosing reliable proxies from data that companies usually collect on performance is hard. In my experience, I have had to deal with:
- Bias. Qualitative feedback is the most prone to bias – be it manager appraisals or team reviews. Apart from the obvious contrast, halo, leniency, recency and personal biases, most managers are not trained to deliver good performance reviews. Most of the assessment that managers complete focuses on “the person,” including characterizations of their personal “traits” (e.g. commitment), knowledge (e.g. technical knowledge) or behaviors (e.g. attendance). While these factors may contribute to performance, they are not measures of actual output.
- Incomplete pictures: Quantitative feedback is usually much easier to work with, but it sometimes doesn’t reveal the full picture. Take, for example, the role of social capital (a variable that is rarely built into models) in employee performance. The team in figure 1 below was force-ranked according to quarterly billings. According to that quantitative metric, the employee in red performs quite poorly. However, she serves as a hub of expertise and advice for the rest of her team. She helps her colleagues close deals, offers them advice on what to do better and brings positive energy to the team. I might even argue that her team members perform well because of her. Two views – two pictures.
- Recency: Most performance data is collected only once or twice a year. Employee performance is dynamic and nuanced over time; a single score at the end of the year cannot capture that.
- Inconsistent Rating: Inter-rater reliability is generally very low between managers at any organization. What one manager considers to be “acceptable” performance, another may consider “not meeting expectations.” This can be a challenge for any organization and is made more of a challenge in situations where the criteria used are subjective and not based on any measurable performance outcomes.
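The social capital point above can be quantified rather than left anecdotal. Here is a minimal sketch, assuming the networkx library and a hypothetical advice network (who asks whom for help) – the names and edges are invented for illustration:

```python
# Sketch: measure social capital as in-degree centrality in an advice
# network. All names and edges are hypothetical.
import networkx as nx

# Each tuple is (advice seeker, helper).
advice = [("ann", "eve"), ("bob", "eve"), ("cat", "eve"),
          ("dan", "eve"), ("bob", "cat"), ("dan", "ann")]
G = nx.DiGraph(advice)

# In-degree centrality: how often a person is sought out by colleagues.
hubs = nx.in_degree_centrality(G)
print(sorted(hubs.items(), key=lambda kv: -kv[1]))
# "eve" surfaces as the hub even if her own billings rank last.
```

A centrality score like this can sit alongside quarterly billings as a second, complementary view of the same team.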
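Inter-rater reliability is also measurable. A quick sketch using Cohen’s kappa from scikit-learn, with hypothetical ratings from two managers scoring the same ten employees on a 1–3 scale:

```python
# Hypothetical inter-rater reliability check: Cohen's kappa between two
# managers rating the same ten employees (1 = low, 3 = high).
from sklearn.metrics import cohen_kappa_score

manager_a = [3, 2, 3, 1, 2, 3, 2, 1, 3, 2]
manager_b = [2, 2, 3, 1, 1, 2, 2, 1, 3, 3]

# Kappa corrects raw agreement for agreement expected by chance.
kappa = cohen_kappa_score(manager_a, manager_b)
print(round(kappa, 2))  # → 0.39, usually read as only "fair" agreement
```

If a check like this comes back low, ratings from different managers should not be pooled as if they were on the same scale.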
It’s not all doom and gloom. Building good performance models is not impossible. Depending on which side of the model the performance metric sits, here are some things you can do:
1. Do not define “performance” using one metric. When building models where performance is used to predict something else, always remember – it is impossible to capture true performance in one single employee metric. Combine a bunch of them. Go from “a high performing employee received positive manager feedback” to “a high performing employee received positive manager feedback, produced high quality output, requested training, acted as an informal innovator and received a 20% salary increase over the last year”. This will reduce the impact of contamination from any one unreliable variable.
2. Include as many explanatory variables as you can. When building models to predict or understand overall employee performance, always ask yourself – what confounding variables have you not taken account of? Is an employee’s task performance (measured in terms of the number of tasks employees perform correctly per hour) in a packing factory only dependent on employee tiredness and motivation? Or could other factors, such as pleasant background music, play a part as well? Ensemble learning, for example, can help immensely in choosing the important variables from a larger set.
3. Reverse causality and simultaneity. A lot of people-related phenomena are circular, so it’s important to test the direction of your causal models. Does X cause Y, or Y cause X? Or does X cause Y, which in turn causes X? I once read a study on employee engagement that made this claim: highly engaged employees are less likely to take sick leave. However, further research showed that employee health and engagement have a two-way, self-reinforcing relationship: the health of an employee has a significant impact on their level of engagement.
4. Speak to managers. Rome wasn’t built in a day. Neither are good People Analytics programs. Speak to managers to understand how they define performance and build a framework accordingly. I always urge companies to collect data on social capital as it always reveals key data points on performance. Read more on social capital here.
5. Be aware of statistical bias. Do research to understand any possible points of bias contamination in your data. Note that statistical error is not the same as bias. Statistical bias is the systematic favoritism of certain individuals or certain responses in a study and it’s a data scientist’s worst nightmare. For example, don’t measure employee engagement during a crisis.
6. Don’t be afraid to say no. There will be times when clients (or your own boss) might ask you to just go ahead and build a model on data you are not comfortable with. Do not do it. No model is better than one you know is badly built. After all, these models impact real people decisions.
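On point 1, one common way to combine several metrics is to standardise each one and average the z-scores, so that no single scale dominates. A minimal sketch, assuming pandas and hypothetical column names and values:

```python
# Sketch of a composite performance score: z-score each metric, then
# average. Column names and numbers are hypothetical.
import pandas as pd

df = pd.DataFrame({
    "manager_feedback": [4.1, 3.2, 4.8, 2.9],   # 1-5 appraisal score
    "output_quality":   [0.92, 0.75, 0.88, 0.70],  # defect-free rate
    "training_hours":   [12, 4, 20, 2],
    "salary_increase":  [0.20, 0.05, 0.15, 0.02],
})

# Standardise each metric so differing scales contribute equally.
z = (df - df.mean()) / df.std()
df["performance"] = z.mean(axis=1)
print(df["performance"])
```

In practice you would weight the components by how reliable you believe each source to be, rather than averaging them equally.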
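On point 2, here is what the ensemble idea might look like in practice: a random forest ranking candidate explanatory variables for the packing-factory example. The data-generating process and feature names below are hypothetical, constructed so that one variable is pure noise:

```python
# Sketch: use a random forest's impurity-based feature importances to
# rank candidate explanatory variables. All data is simulated.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
n = 2000
tiredness  = rng.normal(size=n)
motivation = rng.normal(size=n)
music      = rng.integers(0, 2, n)   # pleasant background music on/off
noise_var  = rng.normal(size=n)      # deliberately irrelevant variable

# Hypothetical outcome: tasks performed correctly per hour.
tasks_per_hour = (10 - 2 * tiredness + 3 * motivation + 1.5 * music
                  + rng.normal(scale=0.5, size=n))

X = np.column_stack([tiredness, motivation, music, noise_var])
rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X, tasks_per_hour)

for name, imp in zip(["tiredness", "motivation", "music", "noise"],
                     rf.feature_importances_):
    print(f"{name}: {imp:.2f}")
# motivation and tiredness should dominate; "noise" should rank near zero
```

The ranking does not prove causation, but it is a cheap way to decide which variables deserve a closer look.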
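On point 3, one simple (and admittedly crude) way to probe direction is a cross-lagged comparison: does X this month correlate with Y next month more strongly than the reverse? A toy simulation, where by construction health drives engagement and not vice versa:

```python
# Toy cross-lagged check of causal direction on simulated monthly data.
import numpy as np

rng = np.random.default_rng(1)
months = 200
health = np.zeros(months)
engagement = np.zeros(months)
for t in range(1, months):
    # Simulated ground truth: health drives engagement, not the reverse.
    health[t] = 0.8 * health[t - 1] + rng.normal(scale=0.5)
    engagement[t] = (0.6 * engagement[t - 1] + 0.5 * health[t - 1]
                     + rng.normal(scale=0.5))

# Cross-lagged correlations: X_t vs Y_{t+1}, in both directions.
h_to_e = np.corrcoef(health[:-1], engagement[1:])[0, 1]
e_to_h = np.corrcoef(engagement[:-1], health[1:])[0, 1]
print(h_to_e, e_to_h)  # h_to_e should come out the larger of the two
```

Both correlations are positive (the variables are entangled), but the asymmetry points at the direction of influence – which is exactly the kind of check the engagement-and-sick-leave claim needed.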
So how would you estimate employee performance? Have you built any models, and how did you navigate the issues I mentioned in this blog? What advice could you offer practitioners in the space? Comment!
*The methods used by the Fox people analytics program are described in Cathy O’Neil’s book, Weapons of Math Destruction.
Originally published on the TrustSphere blog