The Promise and Peril of AI-Assisted Feedback
Performance reviews written by ChatGPT look like the real thing. That’s a problem.
Update: Shortly after we ran this piece, The Wall Street Journal reached out for comment for a story about tensions brewing around employees’ use of ChatGPT.
You’re reading the newsletter of Hear Me Out, a culture strategy firm for growing teams. We talk to employees confidentially, then help their leaders make work more rewarding. Learn how we’re fostering healthier dialogues.
In December 2022, sci-fi magazine Clarkesworld cut off new submissions after being flooded with AI-generated stories. The founding editor, Neil Clarke, blamed the surge on passive-income YouTube influencers. The ease of generating good-enough stories, along with sci-fi’s 8¢/word minimum rate, had created a free-market monster. “We tend to pay better than most,” he told Quartz, “so we’ve made it onto certain lists that are being used by get-rich-quick people.”
Sci-fi scammers aren’t the only ones who stand to get very, very rich from AI. In the past year, AI products like ChatGPT and Stable Diffusion captured headlines while driving over $2.6B in investment. And the $47B productivity software market is one of the most obvious, and lucrative, targets. But while AI writing tools have real potential when trained on the right data, they can easily be abused by managers under pressure to take on more direct reports with a fraction of the resources.
AI-assisted performance feedback is an HR minefield
HR is about to get a lot more familiar with AI. Today, Microsoft will announce AI integrations into Office at a “future of work with AI” event. Earlier this month, Slack, Notion, and Grammarly all announced AI features to speed up writing. Potential applications include tedious but critical management work like performance reviews, a use case that’s already come up on Reddit forums dedicated to ChatGPT.
On the surface, this may seem like a gift to managers. After all, no one likes writing performance reviews, most of them read as formulaic, and managers spend 210 hours a year on them, according to advisory service CEB. Under the right conditions, generative AI could free managers to spend more time thinking about their direct reports’ performance and less time crafting precise language to describe it.
But while AI can easily help plan a trip or write a form email, its use by managers is an HR minefield. That’s because in its current form, the technology enables managers to quickly create performance reviews that sound thoughtful without the hard work of thinking. To quote economics professor Deirdre McCloskey, author of Economical Writing, it makes it easy to “hide a five-cent thought in a five-dollar word.”
To understand why, consider how these AI bots work. If an English speaker hears the words “Happy birthday to…”, it’s easy to guess the next word is probably “you.” ChatGPT and other large language models use statistics and training data to generate fluent bullshit: confident, superficial prose with no regard for the truth. This kind of language is everywhere at work, from quarterly reports to job listings to daily status check-ins and, of course, performance reviews.
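For the technically curious, the statistical guessing behind that “Happy birthday to…” example can be sketched in a few lines of Python. This toy word counter is nothing like a real large language model, which runs a neural network trained on billions of words, but it shows the core move: predict the next word from frequencies in the training data, with no model of whether the result is true.

```python
from collections import Counter, defaultdict

# A tiny corpus standing in for the model's training data.
corpus = (
    "happy birthday to you . "
    "happy birthday to you . "
    "happy birthday dear john . "
    "happy new year to you . "
).split()

# Count which word follows each two-word context (a "trigram" model).
counts = defaultdict(Counter)
for a, b, c in zip(corpus, corpus[1:], corpus[2:]):
    counts[(a, b)][c] += 1

def predict(a, b):
    """Return the most frequent next word after the context 'a b'."""
    return counts[(a, b)].most_common(1)[0][0]

print(predict("birthday", "to"))  # prints "you"
```

Scale the corpus up to the public internet and the context window up to thousands of words, and you get prose that sounds right because it echoes what people usually say, not because it describes what actually happened.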
But while AI is often framed as a way to a quick first draft, a lot of overworked managers will only have time for a few quick edits. While those managers may have rushed through performance reviews in the past, with ChatGPT, anyone can write fluent BS on full autopilot. And for leaders looking at top-line metrics, it could easily seem like those managers are doing great, getting all their reviews in on time.
Generative AI fills the gaps with its own assumptions
If most performance reviews aren’t great to begin with, what’s wrong with delegating them to AI? Won’t the output be roughly the same quality as what a rushed manager would have produced anyway, just faster? Perhaps. But the way these systems fill in the gaps could also lead to serious inconsistencies—or even outright lies.
To illustrate how AI turns hastily written notes into a detailed review, I asked Notion AI to “Write a performance review for John Doe” with the following outline:
John has hit all of his metrics this quarter
He’s built great relationships with colleagues, especially Jane and Jack
He needs to work on his focus and prioritization
He’s sometimes late to meetings, and takes too many WFH days
Here’s the output:
John has successfully achieved all of his metrics and KPIs for this quarter. He has demonstrated exceptional performance and has contributed significantly to the team's success.
John has shown excellent interpersonal skills and has established strong relationships with his colleagues, particularly Jane and Jack. He has also been a great mentor to new hires, sharing his expertise and experience. However, John needs to work on improving his focus and prioritization skills. While he has delivered all his assigned tasks, some of them have been delayed, affecting the team's timelines.
Additionally, John has been occasionally late to meetings, which disrupts the team's productivity. He has also taken too many work from home days, which can sometimes hinder communication and collaboration among team members.
To address these areas for improvement, John can benefit from coaching and guidance from his manager. Moreover, he can benefit from training and development opportunities to enhance his time management, prioritization, and communication skills.
Then, I asked the AI to summarize the review, and got this:
This document is a performance review for John Doe. While he has successfully achieved all of his metrics and KPIs and demonstrated exceptional performance, he needs to work on improving his focus and prioritization skills. Additionally, he has been occasionally late to meetings and taken too many work from home days. To address these areas for improvement, John can benefit from coaching, guidance, and training opportunities.
Cut back down to a summary, the review is essentially my original bullets again, plus some generic recommendations for employee development.
So what’s happening behind the scenes here? ChatGPT is essentially doing the text equivalent of upscaling a pixelated, low-resolution video to 4K. From afar, it looks like a coherent review, with positive and negative points of feedback and suggestions for improvement. But on closer inspection, it’s mostly the same four bullet points dressed up in five-dollar words: the bot isn’t adding any substance to the details.
Where the AI does fill in the details, they’re based on assumptions that may not actually be true. It assumes John has been a great mentor. More alarmingly, it also adds criticisms that aren’t mentioned in the original prompt: that John has missed deadlines, “affecting the team's timelines,” and that his work from home days “hinder communication and collaboration among team members.”
[Note: After OpenAI announced GPT-4, I tested the same prompt in Poe, a chatbot already using the new version. The results were similar, including the mistakes.]
Just because you can automate, doesn’t mean you should
Large corporations including Walmart and Amazon have warned staff to avoid sharing data with ChatGPT. One large NYC employer has also specifically barred managers from using AI for performance reviews, according to an internal memo shared with Hear Me Out. But teams looking at AI for core HR functions should consider more than just the security risks.
Like all technologies, AI systems are subject to bias and limited by human factors. As the example above shows, these systems fill in a lot of gaps with assumptions, and some of them are tough to catch. Extensive testing by Textio, maker of anti-bias writing software, shows ChatGPT consistently adds gender and racial bias into performance reviews. And the benefits of AI-generated feedback are offset by what one study named the “disclosure effect,” where knowing feedback was written with AI makes the receiver trust it less.
Companies could train these systems on their own performance reviews to improve the outputs. But even those companies risk embedding their own historical biases into the system. And is a performance review from 2013 really a useful reference after the seismic shifts in the workplace over the past few years?
It’s possible that in the future, performance reviews will be written with the help of purpose-built AI tools instead of general-purpose tools like ChatGPT. One startup, Confirm, launched its own integration in February, using AI to summarize peer feedback shared through its platform. The feature is turned off by default, and requires a special onboarding, including manager training, to enable.
David Murray, Confirm’s co-founder and president, stressed the need for caution, saying “If a manager is using [ChatGPT] as an escape hatch to avoid providing valuable feedback, it can be destructive.” The real value of the tech, he explained, comes from its ability to summarize peer feedback. (How these tools will work when that feedback itself is generated by AI remains to be seen.)
Rather than asking what parts of managers' jobs to automate, HR might ask why so many feel ripe for automation. If a performance review can be automated away with a pattern-seeking machine, was it really being given the thought that was due? And if leaders believe performance reviews can be mostly written by a pattern-seeking machine, what does that say about the culture? Does the organization see management as a strategic discipline, one that requires insight and creativity, or as a rote, mechanical function that should be optimized to the hilt?
Managers should remember that while generative AI can be a useful tool, and may even be able to fool the editors at a third-tier sci-fi magazine, it's still incapable of generating nuanced or original ideas. As Noam Chomsky and his co-authors argued recently in a New York Times Op-Ed, “True intelligence is demonstrated in the ability to think and express improbable but insightful things.” While that may not describe every manager (or even your own), it’s still worth aspiring to.