I’m still working on that follow-up post on regression to the mean, but in the meantime I wanted to put up a post comparing various player rating systems. For the most part this will be a subjective rather than objective evaluation of the metrics, along the lines of Dean Oliver’s “laugh test” (as in, “a rating system that thinks Dennis Rodman was better than Michael Jordan doesn’t pass the laugh test”). I think looking at how players are rated differently in various systems can tell us a lot about both those players and those rating systems.
The Player Ratings
I took a look at seven popular player ratings. Two are basic linear-weights metrics based on boxscore stats: John Hollinger’s Player Efficiency Rating (PER) and Dave Berri’s Wins Produced (WP). Two are built on Dean Oliver’s individual offensive and defensive ratings: Justin Kubatko’s Win Shares (WS) and Davis21wylie’s Wins Above Replacement Player (WARP). And three are plus/minus metrics based on the team’s point differential while the player is on the court: Roland Beech’s Net Plus/Minus (Net +/-), Dan Rosenbaum’s Adjusted Plus/Minus (Adj +/-), and Dan Rosenbaum’s Statistical Plus/Minus (Stat +/-). For the purposes of comparison I looked at the per-minute (or per-possession) versions of all these metrics (e.g. WP48 instead of WP, WSAA/48 instead of WSAA, WARPr instead of WARP).
Using data from Basketball-Reference and Doug’s Stats, I calculated PER and Wins Produced on my own, so the values may differ slightly from those you’ve seen elsewhere. (I should note here that Wins Produced has a position adjustment that sets the average guard’s rating equal to the average big man’s rating, a feature (or bug?) not present in any of the other systems.) For Win Shares and WARP I took this year’s ratings from Basketball-Reference and this APBRmetrics thread, respectively. For Win Shares I converted Win Shares Above Average (WSAA) to a per-48-minute rating (I was able to duplicate the Win Shares calculations on my own, but I wasn’t sure how to calculate Loss Shares). For Net +/- and Adjusted +/- I used data from BasketballValue, and I calculated Statistical +/- on my own. For all metrics other than Adjusted +/-, players who played for multiple teams during the season did not have their stats combined; instead, each stint was looked at separately.
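The per-48 conversion is presumably just a minutes normalization. Here is a minimal sketch of how I’d express it; the exact formula and the inputs are my assumptions for illustration, not taken from Basketball-Reference:

```python
# Hypothetical WSAA per-48-minute conversion: scale the season total
# by 48 minutes over minutes played. The inputs here are invented.
def wsaa_per_48(wsaa, minutes):
    return wsaa * 48.0 / minutes

# e.g. 4.5 WSAA over 2400 minutes
print(round(wsaa_per_48(4.5, 2400), 3))
```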
First, the top and bottom 10 in each boxscore-based rating system among players who played at least 500 minutes in 07-08:
Next, the top and bottom 10 in each plus/minus-based rating system among players who played at least 500 minutes in 07-08:
Averaging how each player was ranked by all seven metrics, here is a consensus top and bottom ten, along with each player’s rank in each metric:
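A minimal sketch of the consensus calculation, using invented player names and rankings (the actual table averages each player’s rank across all seven systems):

```python
# Consensus ranking sketch: average each player's rank across the
# seven metrics, then sort ascending (lower average rank = better).
# Player names and ranks below are made up for illustration.
ranks = {
    # player: ranks in [PER, WP48, WS, WARP, Net, Adj, Stat] order
    "Player A": [1, 2, 1, 1, 3, 154, 2],
    "Player B": [5, 1, 4, 6, 2, 3, 5],
    "Player C": [300, 310, 295, 305, 290, 315, 300],
}

avg_rank = {p: sum(r) / len(r) for p, r in ranks.items()}
consensus = sorted(avg_rank, key=avg_rank.get)
for p in consensus:
    print(p, round(avg_rank[p], 1))
```

Note how a single bad Adjusted +/- rank (the 154 here mirrors the Chris Paul case) drags an otherwise dominant player below a steadier one.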
One thing that jumps out is that despite being rated the first or second best player in the league by five of the seven rating systems, Chris Paul is not among the consensus top ten. He dropped to 14th overall due to his very mediocre Adjusted Plus/Minus ranking (154th out of 329). Exactly why he rated so low in this metric has been the topic of some recent debate. Amare Stoudemire was another player who ranked much lower in Adjusted Plus/Minus (194th) than in the other metrics.
Another eye-popper is seeing Amir Johnson, the 21-year-old Detroit power forward who’s been riding the pine in the playoffs, ranked first in the league in Adjusted Plus/Minus. This actually isn’t as great an anomaly as might be expected - Johnson rated rather well across the board. His consensus ranking was 15th. He was rated lowest by PER (64th), but he ranked 11th in Win Shares and 20th in Statistical Plus/Minus. Obviously one has to use some caution considering he played under 800 minutes on the season, but the fact that he rated well in several metrics could be a good sign for the future.
Where the Rating Systems Differ
To further examine the rating systems, for each one I wanted to see which players it liked better (or worse) than the other systems did. To do so I found the difference between each player’s ranking in that system and his average ranking in the six other systems. In the chart below, if a player is ranked very high in the given metric but much lower in the other six, he will appear near the top of the list, as a player whom that metric “likes” more than the other metrics do (e.g. PER likes Kevin Durant a lot more than other systems do, but doesn’t like Shane Battier as much as other systems do). This could be seen as a list of players that the metric overrates (or underrates, if you’re looking at the bottom of the list) relative to other rating systems. Or, if you think the players at the top of a list tend to be statistically underrated, then maybe that metric is the one for you.
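That comparison can be sketched in a few lines. All the names and rankings below are hypothetical; a negative difference means the metric ranks the player higher (better) than the other six systems do:

```python
# "Likes" comparison sketch: subtract a player's average rank in the
# other six metrics from his rank in the given metric.
# Metric abbreviations match the post; all ranks are invented.
METRICS = ["PER", "WP48", "WS", "WARP", "Net", "Adj", "Stat"]

ranks = {
    "Player A": {"PER": 10, "WP48": 40, "WS": 35, "WARP": 30,
                 "Net": 45, "Adj": 50, "Stat": 38},
    "Player B": {"PER": 60, "WP48": 20, "WS": 25, "WARP": 22,
                 "Net": 18, "Adj": 15, "Stat": 21},
}

def rank_diff(player, metric):
    others = [ranks[player][m] for m in METRICS if m != metric]
    return ranks[player][metric] - sum(others) / len(others)

print(rank_diff("Player A", "PER"))  # negative: PER "likes" Player A
print(rank_diff("Player B", "PER"))  # positive: PER "dislikes" Player B
```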
Win Shares is closely tied to team wins, and that can be seen clearly in how highly it rates role players from great teams and how poorly it rates stars from awful teams. One can make corresponding diagnoses for the other metrics based on these lists, in terms of which systems over- or under-rate usage, rebounding, scoring, etc.
Next, expanding on the Chris Paul example from above, here are the players that the rating systems are either in greatest agreement on, or in greatest dispute over. To calculate this I just took the standard deviation of the players’ rankings in the seven metrics. All the rating systems agree that Manu Ginobili is pretty good and Acie Law is pretty bad, but they can’t agree on whether Al Jefferson is one of the best or one of the worst players in the league.
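As a quick sketch of the agreement measure, with invented rankings, it is just the standard deviation of one player’s seven ranks:

```python
# Agreement vs. dispute sketch: low standard deviation of a player's
# seven rankings means the systems agree; high means they disagree.
# Both rank lists are invented for illustration.
from statistics import pstdev

agreed = [5, 7, 4, 6, 5, 8, 6]           # metrics cluster tightly
disputed = [3, 12, 8, 290, 310, 275, 6]  # metrics wildly split

print(round(pstdev(agreed), 1))
print(round(pstdev(disputed), 1))
```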
Correlating the Rating Systems
One thing that stands out in the last chart is that some of the metrics seem to group together in their evaluation of players. Net Plus/Minus and Adjusted Plus/Minus both rated Casey Jacobsen much better than the other five metrics did, and they both rated Al Jefferson much worse. To quantify how much each rating system agrees with each of the others, I calculated the correlation coefficient between every pair of metrics for all players who played at least 500 minutes last season (here again it should be noted that Adjusted Plus/Minus was calculated using season totals even for players who changed teams during the season, unlike the other metrics, which split those players’ stints up).
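The pairwise comparison is an ordinary Pearson correlation over the qualifying players. A toy sketch, with invented per-minute ratings for five hypothetical players:

```python
# Pearson correlation sketch between rating systems, computed on
# made-up ratings. The numbers are chosen so the two "boxscore"
# columns track each other and the "plus/minus" column does not.
from math import sqrt

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

per = [25.1, 18.3, 12.0, 22.4, 9.8]    # boxscore metric
warp = [0.30, 0.18, 0.05, 0.26, 0.02]  # another boxscore metric
adj = [1.0, 5.5, -2.0, 0.5, 4.0]       # a noisier plus/minus-style metric

print(round(pearson(per, warp), 2))  # boxscore metrics track each other
print(round(pearson(per, adj), 2))   # plus/minus diverges in this toy data
```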
Here we can see that Net +/- and Adjusted +/- are similar to one another (correlation of 0.73) but very different from the boxscore-based ratings. Statistical +/-, which is meant to be a boxscore-based estimate of Adjusted +/-, does estimate it better than the other boxscore metrics do (correlation of 0.49), but it also correlates pretty strongly with those boxscore metrics. WARP is, somewhat surprisingly, very highly correlated with PER (0.93), perhaps due to the weight both place on usage.
I’ll try to put together a spreadsheet soon so anyone can download all this data. Until then, I’d be interested to hear any interpretations people have of these charts.