In recent decades, what I call “metric fixation” has engulfed an ever-widening range of institutions: businesses, government, health care, K-12 education, colleges and universities, and nonprofit organizations. It comes with its own vocabulary and master terms. It affects the way that people talk and think about the world and how they act in it. And it is often profoundly wrongheaded and counterproductive.
Metric fixation consists of a set of interconnected beliefs. The first is that it is possible and desirable to replace judgment with numerical indicators of comparative performance based on standardized data. The second is that making such metrics public (transparency) assures that institutions are actually carrying out their purposes (accountability). Finally, there is the belief that people are best motivated by attaching rewards and penalties to their measured performance, rewards that are either monetary (pay for performance) or reputational (rankings).
But not everything that is important is measurable, and much that is measurable is unimportant. Most organizations have multiple purposes, and that which is measured and rewarded tends to become the focus of attention, at the expense of other essential goals. Similarly, many jobs have multiple facets, and measuring only a few of them creates incentives to neglect the rest. Almost inevitably, people become adept at manipulating performance indicators. They fudge the data. They deal only with cases that will improve performance indicators. In extreme cases, they fabricate the evidence.
It’s not that measurement is useless or intrinsically pernicious. The challenge is to specify when performance metrics are genuinely useful—that is, how to have metrics without the malady of metric fixation.
Should you find yourself in a position to set policy, here are some questions that you should ask, and the factors that you should keep in mind, in considering whether to use measured performance, and if so, how to use it.
What kind of information do you wish to measure? The more the object to be measured resembles inanimate matter, the more likely it is to be measurable: that is why measurement is indispensable in the natural sciences and in engineering. Measurement becomes less reliable the more its object is human activity, because people are self-conscious and capable of reacting to the process of being measured. And the more rewards and punishments are attached to the results, the more likely people are to react in ways that skew the measurement's validity.
How useful is the information? The fact that some activity is measurable does not make it worth measuring. Indeed, the ease of measuring may be inversely proportional to the significance of what is measured. To put it another way, ask yourself: Is what you are measuring a proxy for what you really want to know? If the information is not very useful or not a good proxy for what you're really aiming at, you're probably better off not measuring it.
Are alternative measurements available? Are there other sources of information about performance, based on the judgment and experience of clients, patients or parents of students? In a school setting, for example, the degree to which parents request a particular teacher for their children is probably a useful indicator that the teacher is doing something right, whether or not the results show up on standardized tests. In the case of charities, it may be most useful to allow the beneficiaries to judge the results.
Tools of measurement are most useful for internal analysis by practitioners rather than for external evaluation by the public, which may fail to understand their limits. Such measurement can be used to inform practitioners of their performance relative to their peers, offering recognition to those who have excelled and offering assistance to those who have fallen behind. To the extent that they are used to determine continuing employment and pay, they will be subject to gaming the statistics or outright fraud.
What are the costs of getting the data? Information is never free, and often it is expensive in ways that rarely occur to those who demand more of it. Collecting, processing and analyzing data take time, and a large part of their expense lies in the opportunity costs of the time put into them. Every moment that you or your colleagues or employees devote to producing metrics is time not devoted to the activities being measured. If you're a data analyst, of course, producing metrics is your primary activity. For everyone else, it's a distraction. Even if the performance measurements are worth having, their worth may be less than the costs of obtaining them.
Who develops the measurement? Accountability metrics are less likely to be effective when they are imposed from above, using standardized formulas developed by those far from active engagement with the activity being measured. Measurements are more likely to be meaningful when they are developed from the bottom up, with input from teachers, nurses and the cop on the beat.
This means asking those with the tacit knowledge that comes from direct experience to provide suggestions about how to develop appropriate performance standards. Try to involve a representative group of those who will have a stake in the outcomes. In the best case, they should continue to be part of the process of evaluating the measured data. A system of measured performance will work to the extent that the people being measured believe in its worth.
Does the measurement create perverse incentives? Insofar as individuals are agents out to maximize their own interests, there are inevitable drawbacks to all schemes of measured reward. If doctors are remunerated based on the procedures they perform, it creates an incentive for them to perform too many procedures that have high costs but may produce low benefits. If doctors are paid based on the number of patients they see, they have an incentive to see as many patients as possible and to skimp on procedures that are time-consuming but potentially useful. If they are compensated based on successful patient outcomes, they are more likely to take the easiest cases, avoiding problematic patients.
Just because performance measures often have some negative outcomes doesn't mean that they should be abandoned. They may still be worth using, despite their foreseeable problems. It's a matter of trade-offs, and that too is a matter of judgment.
With measurement as with everything else, recognizing limits is often the beginning of wisdom. Not all problems are soluble, and even fewer are soluble by metrics. It’s not true, as too many people now believe, that everything can be improved by measurement, or that everything that can be measured can be improved.
—Dr. Muller is a professor of history at the Catholic University of America in Washington, D.C. This essay is adapted from his new book, “The Tyranny of Metrics,” published by Princeton University Press.