Advanced Stats for Basketball

Below are all the posts from the "Stat Theory" category.

May 19, 2008

Regression to the Mean

Posted by Eli in Advanced Stats, Stat Theory

In my next few posts I’m going to cover the topic of regression to the mean, and how it applies to basketball statistics. This is a complex issue and this first post is pretty heavy on the math, but I plan on following it up with more practical examples showing how you can do the calculations in Excel and looking at what the results tell us about different areas of the game.

Most of the equations in this post are not my original work but instead were taken from various sources. I’ve tried to compile them all into one place and in a fairly logical order that can benefit both newcomers to the topic and those with more advanced knowledge looking for a refresher. The main sources are various posts and comments by Tangotiger, MGL, and others on The Book blog, Andy Dolphin’s appendix to The Book, and the Social Research Methods site. Throughout this post I will link to several specific pages that are of relevance. I would also recommend two excellent introductions to regression to the mean by Ed Küpfer and Sal Baxamusa.

True Score Theory

Regression to the mean is rooted in true score theory (aka classical test theory). The basic idea is that a player’s observed performance over some period of time (as measured by a statistic like field-goal percentage) is a function of [1] the player’s true ability or talent in that area and [2] a random error component. It should not be forgotten that this is a simplified model, and it leaves a lot of stuff out (team context, for one).

Observed measure = true ability + random error

A player’s true ability can never be known; it can only be estimated. A player’s observed rate is the typical estimate that is used (i.e. we assume a player with a 40% three-point percentage is a “40% three-point shooter”), but by using regression to the mean we can get a better estimate. This is done by combining what we know about how the individual fares in a particular metric with what we know about how players generally fare in that metric.

The first step is to convert the true score model from the individual level to the group level by looking at the spread (or variance) of the distribution of many players’ stats:

var(obs) = var(true + rand)
...but since the errors are by definition random, they aren't correlated with true ability, so...
var(obs) = var(true) + var(rand)

If you look at the field-goal percentages of a group of players, some of the variation would be from the differing shooting abilities among the players, and some would come from the differing amounts of random luck each player had. As the equation shows, the overall variance (the standard deviation squared) of players’ observed rates is equal to the sum of the variance of their true rates and the variance of the random errors.
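
The variance decomposition can be checked with a quick simulation. What follows is a minimal sketch, not taken from any of the sources mentioned above. It assumes a made-up league of 2,000 players whose true field-goal percentages are normally distributed around 45% with a standard deviation of 3 percentage points, each taking 400 shots. Every number and name in it is an illustrative assumption.

```python
import random
import statistics

def simulate_league(n_players=2000, n_shots=400, mean_true=0.45,
                    sd_true=0.03, seed=1):
    """Simulate true and observed shooting rates (all parameters are made up)."""
    rng = random.Random(seed)
    true_rates, observed_rates = [], []
    for _ in range(n_players):
        # True ability: drawn once per player, clipped to a valid probability.
        p = min(max(rng.gauss(mean_true, sd_true), 0.0), 1.0)
        # Observed rate: makes out of n_shots independent attempts at probability p.
        makes = sum(rng.random() < p for _ in range(n_shots))
        true_rates.append(p)
        observed_rates.append(makes / n_shots)
    return true_rates, observed_rates

true_rates, observed_rates = simulate_league()
# Random error is whatever is left after removing true ability.
errors = [obs - true for obs, true in zip(observed_rates, true_rates)]

var_obs = statistics.pvariance(observed_rates)
var_true = statistics.pvariance(true_rates)
var_rand = statistics.pvariance(errors)

print(f"var(obs)              = {var_obs:.6f}")
print(f"var(true) + var(rand) = {var_true + var_rand:.6f}")
```

The two printed values should agree closely. Any small gap is the sample covariance between ability and luck, which is zero only in expectation.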

(Read More…)

April 3, 2008

Thoughts from Bill James

Posted by Eli in Baseball, Stat Theory

Here’s a long Q and A with Bill James that’s well worth reading. This response in particular caught my eye:

Q: Generally, who should have a larger role in evaluating college and minor league players: scouts or stat guys?

A: Ninety-five percent scouts, five percent stats. The thing is that — with the exception of a very few players like Ryan Braun — college players are so far away from the major leagues that even the best of them will have to improve tremendously in order to survive as major league players — thus, the knowledge of who will improve is vastly more important than the knowledge of who is good. Stats can tell you who is good, but they’re almost 100 percent useless when it comes to who will improve.

In addition to that, college baseball is substantially different from pro baseball, because of the non-wooden bats and because of the scheduling of games. So … you have to pretty much let the scouts do that.

These issues seem to me to be important in basketball as well, and I think they are a good starting point for thinking about the statistical analysis of sports. Taking them in reverse order, here’s one way of framing James’ points:

(Read More…)

March 6, 2008

Diminishing Returns for Scoring - Usage vs. Efficiency

Posted by Eli in Stat Theory, Studies

In the wake of my last few posts on diminishing returns in rebounding, a lot of people have suggested looking at how diminishing returns applies to scoring. This is a more complex issue, but I think some of the same methods can be used to try to understand what’s going on in this part of the game of basketball. For rebounding, we were just looking at the relationship of player rebounding to team rebounding. For scoring, we have to look at the relationship of player efficiency and player usage to team efficiency. Diminishing returns for scoring is really just another way of framing the usage vs. efficiency debate which has been going on in the stats community for years. Does efficiency decrease as usage increases? By how much? What, if any, value should be placed on shot creation? Are coaches using anything near to the optimal strategies in distributing shot attempts among their players? Is Allen Iverson wildly overrated? Was Fred Hoiberg criminally underutilized? The big names in basketball stats like Dean Oliver, Bob Chaikin, John Hollinger, Dan Rosenbaum, and Dave Berri have all staked out positions in this debate. For some background, see here and here and here and here and here and so on and so on. A lot of words have been written on this topic.

The major difficulty in studying the usage vs. efficiency tradeoff is the chicken-and-egg problem - does a positive correlation between usage and efficiency mean that players’ efficiencies aren’t hurt as they attempt to up their usage, or just that in seasons/games/matchups where players are more efficient (for whatever reason) they use more possessions? For instance, if a player is facing a poor defender (which will increase his efficiency) he (or his coach) might increase his usage. But it could be that this positive correlation is drowning out the presence of a real diminishing returns effect. If players go from low-usage, low-efficiency against good defenders to high-usage, high-efficiency against poor defenders, it still could be the case that if they tried to increase their usage against average defenders their efficiency would decrease. Defender strength is just one of the factors that can cloud things - another confound comes from game-to-game or season-to-season variation in a player’s abilities (e.g. a player being “hot” or having an “off game”, a player being injured or tired, or a player using more possessions as his skills improve from year to year).

By using the method from my last study on diminishing returns for rebounding, it’s possible to largely avoid this chicken-and-egg problem. This method looks at situations in which some or all of the players on the court were forced to increase their usage (relative to their average usage on the season). And on the other side, it looks at lineups in which some or all of the players on the court were forced to decrease their typical usage. By looking at these forced cases the method minimizes the confounds from players increasing or decreasing their usage by choice in favorable situations.
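
As a rough illustration of what "forced" could mean here, the sketch below assumes usage is expressed as the share of team possessions a player uses while on the court, so that the five players in a lineup must combine for 100%. That definition, the function name, and the sample numbers are assumptions made for illustration, not details taken from the study.

```python
def forced_usage_change(season_usages):
    """Return how much a five-man lineup must change its combined usage.

    season_usages: each player's season-average usage as a fraction of team
    possessions used while on the court (an assumed definition).
    Positive result: the lineup must use more possessions than its players
    usually do. Negative result: it must use fewer.
    """
    if len(season_usages) != 5:
        raise ValueError("expected exactly five players")
    return 1.0 - sum(season_usages)

# Hypothetical lineups, not real players.
low_usage_lineup = [0.22, 0.18, 0.17, 0.15, 0.14]
high_usage_lineup = [0.30, 0.26, 0.22, 0.20, 0.18]

print(round(forced_usage_change(low_usage_lineup), 2))
print(round(forced_usage_change(high_usage_lineup), 2))
```

Under these assumptions the first lineup has to raise its combined usage by 14 percentage points, while the second has to cut it by 16.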

(Read More…)

February 23, 2008

More Diminishing Returns

Posted by Eli in Stat Theory, Studies

Following up on my last post, I’m going to look at the issue of diminishing returns for rebounding from a different angle. The new method I’m going to use has several advantages over the previous one (and some disadvantages). What I like best about it is that it does a great job of presenting the effect of diminishing returns visually, rather than just through a table of numbers.

The approach I will use was first suggested to me by Ben F. from the APBRmetrics forum. But before I got a chance to try it out, another poster, Cherokee_ACB, presented results of his own using a similar method. So this post can be seen as building on the ideas of both of these posters.

Instead of comparing individual players’ rebounding percentages to the rebounding percentages of the lineups they played in, this method takes into account the rebounding of all five players on the court for a team. Instead of just speculating about how well a team would rebound if it put five strong rebounders on the court together (or five poor rebounders), it looks at what has actually happened in such situations in the past.

(Read More…)

February 5, 2008

Diminishing Returns and the Value of Offensive and Defensive Rebounds

Posted by Eli in Advanced Stats, Stat Theory, Studies

There has been a lot of discussion in recent months about the importance of rebounding on the player level. Much of this debate has been in reaction to the high value that Dave Berri’s Wins Produced player rating puts on rebounds. On Berri’s blog there have been several posts with long, insightful debates in the comments about the issue (that is, if you ignore the unfortunate mudslinging often directed at those with differing points of view). In particular, I would recommend the comments sections of “The Best One-Two Punch in the Association”, “Chris Paul vs. Deron Williams, Again”, and “How Has Texas Survived the Loss of Kevin Durant?”. There have also been some good debates on the topic in the APBRmetrics threads, “Current season Win Scores/Wins Produced” and “Can some one explain the ‘possession cost’ scheme?”.

These are wide-ranging debates, involving such issues as the relative value of rebounding versus scoring and the apportioning of credit for a defensive stop between the defensive rebounder and his teammates. The issue that I want to pick up on is the extent to which the law of diminishing returns applies to rebounding.

(Read More…)

December 17, 2007

Does Good Pitching Beat Good Hitting in Basketball?

Posted by Eli in Advanced Stats, Stat Theory, Studies

It’s taking longer than I anticipated to compile and analyze the context-dependency of various player stats by the method I outlined in my last post, so in the meantime I would like to shift gears and introduce a method that uses team stats to try to understand whether the offensive or defensive team controls various aspects of the game.

There’s an old saying in baseball that “good pitching always beats good hitting.” I want to examine what a claim like this is trying to get at, look at a method that attempts to objectively analyze whether it’s true, and then apply that method to many areas of basketball and see what we can learn.

(Read More…)

December 10, 2007

The Reliability and Context-Dependency of Basic Stats: Methodology

Posted by Eli in Stat Theory, Studies

At the end of my recent post on evaluating player ratings I said that the next step would be to take a step back from comprehensive ratings and look at how the component stats they are built from change in different contexts. That is what I will begin to look at in this post.

The methodology I’m going to use is pretty complicated, so instead of just presenting the results I’m going to use this post to explain in a step-by-step manner the techniques I plan on using. I’m also going to try to point out what I see as potential problems, but in many ways I’m learning as I’m going so I may miss some things. I’d welcome any critiques or suggestions from anyone who knows what they’re doing (or anyone who pretends to know what they’re doing, like me).

(Read More…)

December 4, 2007

Evaluating Player Ratings: Year-to-Year Correlations

Posted by Eli in Advanced Stats, PER, Stat Theory

There has been a lot of debate recently about comprehensive player ratings such as John Hollinger’s PER, Dave Berri’s Wins Produced, and Dan Rosenbaum’s Adjusted Plus/Minus. Is one of these rating systems better than the others? What methods can be used to make such an assessment? One approach is to analyze and critique the theory behind each measure - does the way it was constructed make basketball (and statistical) sense? An alternative approach is to analyze them empirically - what happens when we actually start applying the ratings to players? Dean Oliver, the author of Basketball on Paper, has suggested two such empirical methods by which to evaluate player ratings:

(Read More…)

November 12, 2007

Assist Rates

Posted by Eli in Stat Theory

To follow up on my discussion of rate stats, I’m going to look at how this theoretical foundation can help evaluate passing stats created from the starting point of assists.

The basic assist-related player stats are assists per game and assist-to-turnover ratio. Assists per game is a time-period rate, while assist-to-turnover ratio is an opportunity rate (technically it’s an opportunity ratio of successes/failures, but it can easily be transformed into an opportunity rate of Ast/(Ast + TO)).
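
The ratio-to-rate transformation can be sketched in a few lines. The function name and the season totals below are hypothetical, chosen only to show the arithmetic.

```python
def ratio_to_rate(successes, failures):
    """Convert a success/failure ratio into a success rate.

    Raises ZeroDivisionError if failures is zero, since the ratio is
    undefined in that case.
    """
    ratio = successes / failures               # e.g. assist-to-turnover ratio
    rate = successes / (successes + failures)  # e.g. Ast / (Ast + TO)
    # The two are linked by rate = ratio / (1 + ratio).
    assert abs(rate - ratio / (1 + ratio)) < 1e-12
    return ratio, rate

# Hypothetical season totals: 300 assists, 120 turnovers.
ratio, rate = ratio_to_rate(successes=300, failures=120)
print(f"Ast/TO = {ratio:.2f}, Ast/(Ast + TO) = {rate:.3f}")
```

A 2.5 assist-to-turnover ratio corresponds to an opportunity rate of about 0.714, and the mapping is one-to-one, so nothing is lost by switching forms.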

A lot of the advanced stats in basketball are simply refinements to traditional stats to remove potential biases. So from Assists/Game we can instead shift to Assists/Minute, which controls for playing time, or to Assists/Team Possession, which controls for pace (and playing time). We can even go one step further and shift to Assists/Team Play, which also controls for offensive rebounding (possessions don’t keep track of the extra plays that result from offensive rebounds).
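
To make the progression of denominators concrete, here is a small sketch using a made-up season line. It assumes the team possession and team play totals are counted only while the player is on the court (which is what lets those rates control for playing time) and simply takes them as inputs rather than estimating them. The function name and all numbers are hypothetical.

```python
def assist_rates(assists, games, minutes, team_possessions, team_plays):
    """Express one assist total over progressively more refined denominators.

    team_possessions and team_plays are assumed to be the team's totals
    while the player was on the court; they are given, not estimated here.
    """
    return {
        "per_game": assists / games,
        "per_minute": assists / minutes,
        "per_team_possession": assists / team_possessions,
        "per_team_play": assists / team_plays,
    }

# Hypothetical season line, not real data. Plays exceed possessions because
# each offensive rebound starts a new play within the same possession.
rates = assist_rates(assists=400, games=80, minutes=2800,
                     team_possessions=5200, team_plays=5600)
for name, value in rates.items():
    print(f"{name}: {value:.4f}")
```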

If we turn to the opportunity rate of Ast/(Ast + TO), a flaw is noticeable. Some turnovers have nothing to do with passing - they may be the result of a player trying to score and traveling or committing an offensive foul. In other words, turnovers are not the corresponding failures to the successes of assists. So to make a better opportunity rate for assists, we first need to determine what constitutes an assist opportunity.

(Read More…)

November 11, 2007

An Introduction to Rate Stats

Posted by Eli in Stat Theory

One traditional way of categorizing sports statistics is to divide them into counting stats and rate stats. A counting stat measures the accumulation of successes (or failures) in some area. Total points, field-goal makes and misses, free-throw makes and misses, assists, rebounds, turnovers, blocks, steals, and fouls are counting stats. A rate stat measures the rate or frequency of the accumulation of successes (or failures). Baseball-Reference has a good summary of the difference in the context of baseball here.

I think it can be useful to split rate stats into two subcategories - opportunity rates and time-period rates.

(Read More…)