There has been a lot of debate recently about comprehensive player ratings such as John Hollinger’s PER, Dave Berri’s Wins Produced, and Dan Rosenbaum’s Adjusted Plus/Minus. Is one of these rating systems better than the others? What methods can be used to make such an assessment? One approach is to analyze and critique the theory behind each measure - does the way it was constructed make basketball (and statistical) sense? An alternative approach is to analyze them empirically - what happens when we actually start applying the ratings to players? Dean Oliver, the author of Basketball on Paper, has suggested two such empirical methods by which to evaluate player ratings:
Method 1: Do the individual player ratings sum to team wins?
If you add up all the ratings of the players on a team in a season (weighted by minutes if it’s a per-minute rating), you should get a figure that is close to the team’s win total or point differential for the season (of course some conversions might be necessary to get things on the right scale). The theoretical assumption behind this method is that if Player A helps his team win more games than Player B, he should rate higher in a player rating system.
Plus/minus (the non-adjusted kind) meets this criterion perfectly, because it in a sense works directly backwards from the starting point of team point differential. Take each player on a team’s total on-court plus/minus for the season (which can be found at 82games or BasketballValue), sum the results, divide by five, and you will get the team’s season point differential exactly.
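To make the arithmetic concrete, here is a minimal sketch with invented numbers showing how summed on-court plus/minus, divided by five, recovers the team’s point differential (five players share the court at all times, so each point of differential is counted five times across the roster):

```python
# Hypothetical season totals of on-court plus/minus for a nine-man
# rotation (numbers are made up for illustration).
plus_minus = {
    "A": 420, "B": 380, "C": 310, "D": 250, "E": 200,
    "F": 150, "G": 100, "H": 60, "I": 40,
}

# Every point the team scores or allows is experienced by exactly five
# players at once, so dividing the roster sum by five undoes the
# quintuple-counting and yields the team's season point differential.
team_point_differential = sum(plus_minus.values()) / 5
print(team_point_differential)  # 382.0
```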
There has been a lot of discussion about this method of evaluation, especially as it relates to team adjustments (which are add-ons to player ratings that allocate some team statistics to each individual player on the team). I don’t have anything to add to that debate at this time, other than to just point out that there are many ways of constructing player ratings that come close to summing to team wins. For instance, one could take a team’s point differential for the season and assign a fraction of it to each player based simply on how many minutes they played (in this system the player who played the most minutes for the team with the best point differential would be rated as the best player in the league). So clearly some additional test is needed to differentiate between rating systems.
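The degenerate minutes-based rating described above can be sketched in a few lines (the player names, minutes, and point differential are hypothetical):

```python
def minutes_rating(minutes, team_diff):
    """Assign each player a share of team point differential
    in proportion to minutes played."""
    total = sum(minutes.values())
    return {player: team_diff * m / total for player, m in minutes.items()}

# Invented roster: the rating "sums to team success" by construction,
# yet says nothing about how well anyone actually played.
minutes = {"Starter": 3000, "Sixth man": 2000, "Benchwarmer": 500}
ratings = minutes_rating(minutes, team_diff=550)
print(sum(ratings.values()))  # 550.0 -- recovers the team differential
```

This is why summing to team wins is a necessary but not sufficient test.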
One natural way to expand on this method is to see whether the player ratings from one season can be added together to accurately predict team wins for the next season. But that gets us into the area of Oliver’s second method:
Method 2: Are the player ratings consistent from year to year?
The theoretical assumptions behind this method are that player production is similar from season to season, and that a measure that accurately tracks production will likewise be consistent from year to year. Or, to put it in the opposite form, a measure that is not consistent from year to year is not likely to be picking up true, context-independent player skill. I actually do not think this is a very useful method of evaluating player ratings, for a variety of reasons.
Year-to-year consistency is typically measured by looking at the year-to-year correlation of a particular stat. For each player, his rating in year one is compared to his rating in year two. If on average players’ ratings deviate a lot, there will be a low correlation coefficient (r), while if most players’ second-year ratings are close to their first-year ratings, there will be a high r. If every player’s rating in year two exactly matches his rating in year one, r will equal 1, while if there is absolutely no connection between the two years’ ratings r will equal 0 (r can also be negative, but that’s highly unlikely in the case of year-to-year correlations).
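For concreteness, here is how that correlation coefficient is computed, using a hypothetical set of ratings for five players:

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two paired samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented ratings (e.g. PER) for the same five players in consecutive
# seasons; because year two closely tracks year one, r comes out high.
year1 = [15.0, 18.5, 22.1, 12.3, 19.7]
year2 = [14.2, 19.0, 21.5, 13.1, 18.9]
print(round(pearson_r(year1, year2), 3))
```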
The first problem to note is that r is highly dependent on the number of trials or opportunities each player had for whatever stat you are looking at (a point that Tangotiger and MGL have hammered on over and over and over and over and over in the context of baseball). Randomness tends to even out over the long haul, but over shorter periods of time it can have large effects on correlation coefficients. This is compounded by the fact that players accumulate their opportunities in different stats at different rates (e.g. players take a lot more two-point shots than half-court shots in a season, so a straight comparison of the YTY r’s of those stats is going to be distorted even if one controls for minutes played). I’m going to sidestep how this issue affects player ratings for now, but it is an important consideration.
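Tangotiger’s point can be demonstrated with a small simulation (all parameters are invented): hold each player’s true skill fixed across both seasons, and the year-to-year r of the observed rate still rises with the number of attempts, because the noise around the true rate shrinks:

```python
import random
random.seed(0)

def observed_rate(skill, attempts):
    """Observed success rate over a season of `attempts` trials,
    given a fixed true skill (probability of success)."""
    return sum(random.random() < skill for _ in range(attempts)) / attempts

def yty_r(n_players, attempts):
    """Year-to-year correlation of the observed rate when true skill
    is perfectly stable -- any dropoff from 1 is pure sampling noise."""
    skills = [random.uniform(0.40, 0.55) for _ in range(n_players)]
    y1 = [observed_rate(s, attempts) for s in skills]
    y2 = [observed_rate(s, attempts) for s in skills]
    n = len(y1)
    m1, m2 = sum(y1) / n, sum(y2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(y1, y2))
    s1 = sum((a - m1) ** 2 for a in y1) ** 0.5
    s2 = sum((b - m2) ** 2 for b in y2) ** 0.5
    return cov / (s1 * s2)

r_low = yty_r(300, 50)     # few opportunities: observed rate is noisy
r_high = yty_r(300, 2000)  # many opportunities: observed rate stabilizes
print(r_low, r_high)       # r_high comes out well above r_low
```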
A general problem with using year-to-year correlations in basketball
In using year-to-year correlations to evaluate a stat, I think what people are trying to get at is how context-dependent that stat is. What percentage of the observed performance (i.e. how the statistic rates the player) comes from the player’s actual skill, the role he plays on his team, the skill of the teammates he plays with, the coaching system he’s playing under, random chance, etc.? A rating like plus/minus might sum perfectly to team point differential on the season, but is this just because it’s picking up the skill of the teammates (and opponents) the player is on the court with? If so, it’s unlikely to be useful in predicting how a player would perform in a different context, and this should be reflected by its having a lower year-to-year correlation, since many players will have their context change significantly between seasons (by changing teams or just by changing the players they play most of their minutes with).
The problem is that year-to-year correlations really only directly tell you about the random chance part of the equation, and not the context parts. They are useful in baseball because context isn’t as big a deal as it is in basketball. Sabermetricians already have context-neutral stats (controlling for park effects, opposing batter/pitcher strength, etc.) to start with, and those are what they plug in at the beginning when looking at year-to-year correlations. They’re not using those correlations to evaluate the usefulness of one stat as opposed to another, but rather to determine what part of the observed performance in a stat is determined by luck. That tells them that one stat might take longer into the season to stabilize, and suggests how far to regress to the mean in making projections.
An alternative method: same-team YTY correlations vs. different-team YTY correlations
So unlike in baseball, where year-to-year correlations have a limited and well-defined use, in basketball they are a pretty crude method that I don’t think gets at what people are hoping for (that is, how context-dependent is this stat?). One way to try to get around this is to tie the correlations more explicitly to the context changes we are concerned with. Instead of looking at the YTY r for all players in Wins Produced and comparing this to the YTY r for PER, we can first split the players into two groups - those who played on the same team in year one and year two, and those who played on different teams in year one and year two. Basically, we’re trying to control for team context before looking at the YTY correlations. Even if we don’t fully understand what it means when the YTY r of same-team players in Wins Produced is higher than the YTY r of same-team players in PER, we can avoid comparing those figures directly and instead look at how much each stat’s r changes for players who changed teams. In other words, for each stat the YTY r for the same-team players can be used as a baseline against which to measure the dropoff to the YTY r for different-team players.
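The split described above can be sketched as follows, using a hypothetical list of (player, year-one team, year-two team, year-one rating, year-two rating) records in place of real data:

```python
def pearson(pairs):
    """Pearson's r over a list of (x, y) pairs."""
    xs, ys = zip(*pairs)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def split_and_correlate(records):
    """Compute the YTY r separately for players who stayed on the
    same team and players who changed teams."""
    same = [(r1, r2) for _, t1, t2, r1, r2 in records if t1 == t2]
    diff = [(r1, r2) for _, t1, t2, r1, r2 in records if t1 != t2]
    return pearson(same), pearson(diff)

# Invented records: (player, team in year 1, team in year 2, ratings).
records = [
    ("A", "BOS", "BOS", 18.0, 17.5), ("B", "BOS", "BOS", 14.0, 14.8),
    ("C", "LAL", "LAL", 21.0, 20.2), ("D", "LAL", "CHI", 16.0, 12.0),
    ("E", "CHI", "NYK", 13.0, 16.5), ("F", "NYK", "NYK", 11.0, 11.6),
    ("G", "NYK", "BOS", 19.0, 15.0), ("H", "CHI", "CHI", 15.5, 15.1),
]
same_r, diff_r = split_and_correlate(records)
# The dropoff from same_r to diff_r is the quantity of interest.
```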
Here’s an example of this method. I looked at PER, Wins Produced, and Statistical Plus/Minus from the 1979-80 (when the three-pointer was introduced) to 2006-07 seasons (at some point I’d like to also look at raw plus/minus, but I haven’t compiled the data off of 82games, which would only be for the 02-03 to 06-07 seasons). I actually didn’t use true Wins Produced because I couldn’t do the position adjustments, so I used what Berri calls Adjusted P48 (since in calculating YTY correlations I’m comparing players to themselves one year later, I don’t think anything is lost in not first adjusting for position). Statistical Plus/Minus is an approximation of Adjusted Plus/Minus using a pace-adjusted linear-weights formula applied to boxscore stats. For more about it see this thread by Dan Rosenbaum, and this thread in which I described some of the ways in which my calculation of it differs from Rosenbaum’s.
For the different groups, I threw out all player-seasons where a player played for multiple teams. So the same-team group includes all players that played for one team all of season one and the same team all of season two (5082 player season-pairs), and the different-team group includes all players that played for one team all of season one and a different team all of season two (1829 player season-pairs). To control for some of the problems from the different number of opportunities, I’ve presented the results in minutes slices - the 0-499 row is just those players who played between 0 and 499 minutes in year one AND who played between 0 and 499 minutes in year two (the “Plyr” columns indicate that 271 players met this criterion in the same-team group and 311 met it in the different-team group). One downside of slicing things like this is that players whose minutes dramatically increased or decreased between seasons are excluded completely. At the bottom I’ve presented an alternative that just uses a 1000 minute cutoff in each season.
Year-to-year correlations (r), 1979-80 to 2006-07:

                                            Stat  Stat
             Same  Diff   PER   PER    WP    WP   +/-   +/-
MIN range    Plyr  Plyr  Same  Diff  Same  Diff  Same  Diff
---------    ----  ----  ----  ----  ----  ----  ----  ----
   0- 499     271   311   .12  -.03   .10   .12   .14   .03
 500- 999     176    70   .55   .57   .69   .69   .55   .68
1000-1499     180    66   .62   .50   .79   .83   .69   .63
1500-1999     254    61   .73   .66   .82   .76   .69   .57
2000-2499     266    52   .77   .66   .86   .80   .77   .60
2500-2999     437    50   .85   .70   .91   .87   .85   .74
3000-3499     136     9   .88   .32   .92   .86   .88   .69
1000+        3318   698   .84   .68   .86   .79   .81   .65
For players that stayed on the same team, Wins Produced has a slightly higher r than PER and Statistical Plus/Minus. And interestingly, this difference becomes much more pronounced when looking at players that changed teams. The r’s of PER, WP & Stat +/- all drop off for players that changed teams (which we would expect since those players’ contexts have changed a lot more than players who stayed on the same team), but the dropoffs in PER and Statistical Plus/Minus are much greater than in Wins Produced (using the 1000 minute cutoff, PER drops from .84 to .68 and Stat +/- drops from .81 to .65, while WP only drops from .86 to .79). These numbers seem to at least initially suggest that Wins Produced is less context-dependent than PER and Statistical Plus/Minus, and perhaps for this reason it is a more useful player rating (all other things being equal).
A specific problem with using year-to-year correlations with player ratings
Unfortunately, even this method of separating players into groups based on whether they changed teams prior to looking at year-to-year correlations has issues that crop up when using it to evaluate player ratings.
PER, Wins Produced, and Statistical Plus/Minus are all basically just linear weights formulas based on the same boxscore statistics but using different weights. So their YTY r’s are just going to be combinations of the YTY r’s of the various boxscore stats on which they are based. And the differences in their YTY r’s will depend on their different weightings of those boxscore stats.
Here’s an illustration. Say we create a basic pace-adjusted linear weights player rating. To keep things simple I’ll just add up the good stuff and ignore subtracting the bad stuff. Points, rebounds, assists, steals and blocks will all be weighted equally. Call this SIMPLE. I’ll also create two variations - DIMES, which is exactly like SIMPLE but weights assists twice as heavily as all the other stats, and BOARDS, which weights rebounds twice as heavily as all the other stats.
Formulas:

SIMPLE = PaceAdj*(PTS + TRB + AST + STL + BLK)/MIN
DIMES  = PaceAdj*(PTS + TRB + 2*AST + STL + BLK)/MIN
BOARDS = PaceAdj*(PTS + 2*TRB + AST + STL + BLK)/MIN

Year-to-year correlations (r), 1979-80 to 2006-07:

             Same  Diff  SIMP  SIMP  DIME  DIME  BRDS  BRDS
MIN range    Plyr  Plyr  Same  Diff  Same  Diff  Same  Diff
---------    ----  ----  ----  ----  ----  ----  ----  ----
   0- 499     271   311   .42   .08   .44   .07   .41   .22
 500- 999     176    70   .76   .67   .76   .71   .82   .72
1000-1499     180    66   .75   .59   .79   .61   .82   .72
1500-1999     254    61   .86   .75   .87   .70   .89   .84
2000-2499     266    52   .85   .77   .87   .73   .90   .86
2500-2999     437    50   .91   .77   .92   .74   .93   .84
3000-3499     136     9   .93   .30   .92   .57   .95   .21
1000+        3318   698   .90   .76   .90   .75   .91   .83
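The three toy formulas can be written as one parameterized function (the stat line and pace adjustment below are made up for illustration):

```python
def rating(stats, pace_adj, trb_w=1, ast_w=1):
    """Generic add-up-the-good-stuff linear weights rating per minute."""
    good = (stats["PTS"] + trb_w * stats["TRB"] + ast_w * stats["AST"]
            + stats["STL"] + stats["BLK"])
    return pace_adj * good / stats["MIN"]

def simple(stats, pace_adj):  # all stats weighted equally
    return rating(stats, pace_adj)

def dimes(stats, pace_adj):   # assists weighted double
    return rating(stats, pace_adj, ast_w=2)

def boards(stats, pace_adj):  # rebounds weighted double
    return rating(stats, pace_adj, trb_w=2)

# Hypothetical season totals for one player, with pace adjustment of 1.
line = {"PTS": 1500, "TRB": 600, "AST": 400, "STL": 100, "BLK": 50,
        "MIN": 2500}
print(simple(line, 1.0), dimes(line, 1.0), boards(line, 1.0))
# 1.06 1.22 1.3
```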
Looking at these numbers, the first thing that pops out is the .90+ r’s for all three ratings on players that stayed on the same team. This is a nice demonstration of how a high year-to-year correlation isn’t enough on its own to indicate a quality player rating. All the ratings’ correlations drop off for players who changed teams, with DIMES behaving very similarly to SIMPLE. But when we get to BOARDS, we see that it doesn’t drop off nearly as much for players who changed teams (.91 to .83 compared to .90 to .76 and .90 to .75). BOARDS appears to be much less context-dependent than the other two ratings. But does this mean that BOARDS is somehow a better rating system than SIMPLE - does it mean that rebounds really should be weighted more than other boxscore stats? Not really - all it says is that rebounds are less context-dependent than other boxscore stats (MAYBE - I have a feeling Tango’s point about the number of opportunities could be cropping up here), and the player ratings are reflecting this through their different weights.
Thus our conclusions about Wins Produced vs. PER and Statistical Plus/Minus need some revising, especially since Wins Produced is known for heavily weighting rebounds relative to the other boxscore stats. It’s no longer clear just what we can learn about player ratings by looking at their same-team vs. different-team year-to-year correlations. I think the next step is to go back to the basics and look at individual boxscore stats and try to find how context-dependent they are. But this post is long enough already, so that will have to wait until a later date.