Here’s a long Q and A with Bill James that’s well worth reading. This response in particular caught my eye:
Q: Generally, who should have a larger role in evaluating college and minor league players: scouts or stat guys?
A: Ninety-five percent scouts, five percent stats. The thing is that — with the exception of a very few players like Ryan Braun — college players are so far away from the major leagues that even the best of them will have to improve tremendously in order to survive as major league players — thus, the knowledge of who will improve is vastly more important than the knowledge of who is good. Stats can tell you who is good, but they’re almost 100 percent useless when it comes to who will improve.
In addition to that, college baseball is substantially different from pro baseball, because of the non-wooden bats and because of the scheduling of games. So … you have to pretty much let the scouts do that.
These issues seem to me to be important in basketball as well, and I think they are a good starting point for thinking about the statistical analysis of sports. Taking them in reverse order, here’s one way of framing James’ points:
Player production = player ability + context
James claims that the context of college baseball is so different from that of the majors that translating college production into projected pro production, by controlling for these context differences, is a very difficult task. Behind this claim is a model of statistical production in which a player’s stats are the result of his abilities and the context he plays in.
There’s actually an additional element at play: randomness. Tango, MGL and Andy Dolphin have done a fantastic job exploring this factor in baseball in great detail, both in “The Book” and on their blog. The basic idea is that, ignoring context for the moment, player production = player ability + randomness. Randomness has a larger effect when sample sizes are smaller and when players vary little in true ability. To control for it, one can regress statistics toward the mean, which starts from a player’s observed production and splits it into an estimate of his underlying ability and the portion attributable to luck.
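One common way to regress a rate statistic toward the mean is to add a fixed number k of league-average trials to a player’s record, so his own data gets weight n / (n + k). The Python below is a minimal sketch under stated assumptions, not the method from “The Book”: it assumes each observed rate is a true rate plus binomial noise, and the function names and sample numbers are hypothetical. Note that the estimated k grows as the true spread among players shrinks, which matches the point that randomness matters more when players vary little in ability.

```python
from statistics import pvariance


def regress_to_mean(successes, trials, league_rate, k):
    """Shrink an observed rate toward the league rate.

    Equivalent to adding k trials of league-average performance,
    so the player's own data gets weight trials / (trials + k).
    """
    return (successes + k * league_rate) / (trials + k)


def estimate_k(rates, trials):
    """Rough method-of-moments estimate of the regression constant k.

    Assumption (hypothetical model): observed rate = true rate + binomial
    noise, so var(observed) ~= var(true) + average of p(1 - p) / n.
    """
    league = sum(r * n for r, n in zip(rates, trials)) / sum(trials)
    per_trial_noise = league * (1 - league)  # binomial variance of one trial
    noise_var = sum(per_trial_noise / n for n in trials) / len(trials)
    # True-talent spread is what's left after removing expected noise;
    # clamp at a tiny positive value so k stays finite.
    true_var = max(pvariance(rates) - noise_var, 1e-12)
    return per_trial_noise / true_var


# Hypothetical example: same 80% observed rate, very different sample sizes.
league_rate, k = 0.75, 30
small = regress_to_mean(8, 10, league_rate, k)      # 0.7625
large = regress_to_mean(800, 1000, league_rate, k)  # ~0.7985
```

With k = 30, a player who goes 8 for 10 is pulled most of the way back toward .750, while one who goes 800 for 1000 keeps most of his edge.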
The fuller model would be player production = player ability + context + randomness. Because offense in baseball is largely an individual matter, the context effects on batting statistics aren’t that complex and are typically handled with standard adjustments for things like runners on base, pitcher quality and park effects (though James suggests that the context differences between college and the majors aren’t so easy to deal with). In basketball, however, almost every area of the game is affected by context in complicated ways because of the team nature of the sport. So while it’s important to try to control for randomness in basketball stats, I think understanding the effects of context is the more pressing issue. How will a player’s production change when he is put in a different role, when he plays with different teammates, or when he plays in a different coach’s system?
Many methods can be used to try to answer these questions and control for context. Measuring statistics per possession rather than per game, for example, controls for the context of differing tempos. More generally, in a previous post I outlined a way of dealing with the issue by looking at how players’ stats change when they change teams. I haven’t followed up on that method as promised, in part because I think there may be a better approach using multilevel modeling (which you can learn about from this book or this article). Eventually I hope to post some results of this approach.
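To make the per-possession idea concrete, here is a minimal Python sketch. It uses a widely used box-score approximation of possessions (field-goal attempts minus offensive rebounds plus turnovers plus 0.44 times free-throw attempts); the 0.44 weight is a convention rather than a fixed rule, and the function names, team totals and player numbers are hypothetical. Two players who each score 20 points look identical per game, but the one on the slower-paced team is scoring more per possession.

```python
def estimate_possessions(fga, orb, tov, fta, ft_weight=0.44):
    """Approximate a team's offensive possessions from box-score totals.

    ft_weight approximates the share of free-throw attempts that end a
    possession; 0.44 is a common convention, not an exact constant.
    """
    return fga - orb + tov + ft_weight * fta


def per_100(stat, possessions):
    """Express a counting stat per 100 team possessions."""
    return 100.0 * stat / possessions


# Hypothetical single-game team totals: one fast-paced, one slow-paced.
fast_poss = estimate_possessions(fga=90, orb=10, tov=14, fta=25)  # ~105
slow_poss = estimate_possessions(fga=75, orb=8, tov=12, fta=20)   # ~87.8

# Two hypothetical players who each score 20 points in those games.
fast_rate = per_100(20, fast_poss)  # ~19.0 per 100 possessions
slow_rate = per_100(20, slow_poss)  # ~22.8 per 100 possessions
```

The same rescaling applies to rebounds, assists or any other counting stat; it removes differences in tempo but none of the other context effects discussed above.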
Player ability is not constant over time
As if untangling skill from context weren’t hard enough, James’ first point emphasizes that skill itself can change in ways that are difficult to predict: some players improve more than others. And James suggests that in baseball, statistics are much more useful for measuring current ability than for predicting changes in ability.
In basketball I think this is a question for the future, since for now we still have a long way to go on measuring a player’s current skill. But it is important. Why do two players with similar college production go on to have greatly different pro careers? Are we just not looking at the right stats (e.g., maybe there are hidden indicator stats that do shed light on future improvement, such as a high free-throw percentage suggesting the potential for improved three-point shooting)? Or, as James suggests for baseball, do we have to look beyond on-court stats to predict player improvement? This could mean looking to objective (but off-court) measures like age, quickness, strength, vertical leap, and wingspan, or even getting into harder-to-define areas such as effort, intelligence, diligence, leadership, heart, and other “intangibles.” Or one can take the scouting approach and look to sub-skills players exhibit on the court that aren’t easily quantified but that suggest the potential for overall improvement with the right coaching (e.g., a player’s shooting form or how well he boxes out). At this point I’m not sure we can say just what the right mix of these approaches is.