Advanced Stats for Basketball

Below are all the posts from May 2008. Click here to view all posts.

May 28, 2008

Comparing Player Ratings

Posted by Eli in Advanced Stats, PER, Plus/Minus

I’m still working on that follow-up post on regression to the mean, but in the meantime I wanted to put up a post comparing various player rating systems. For the most part this will be a subjective rather than objective evaluation of the metrics, along the lines of Dean Oliver’s “laugh test” (as in, “a rating system that thinks Dennis Rodman was better than Michael Jordan doesn’t pass the laugh test”). I think looking at how players are rated differently in various systems can tell us a lot about both those players and those rating systems.

The Player Ratings

I took a look at seven popular player ratings. Two basic linear weights metrics based on boxscore stats - John Hollinger’s Player Efficiency Rating (PER), and Dave Berri’s Wins Produced (WP). Two metrics built on Dean Oliver’s individual offensive and defensive ratings - Justin Kubatko’s Win Shares (WS), and Davis21wylie’s Wins Above Replacement Player (WARP). And three plus/minus metrics based on team point differential while the player is on the court - Roland Beech’s Net Plus/Minus (Net +/-), Dan Rosenbaum’s Adjusted Plus/Minus (Adj +/-), and Dan Rosenbaum’s Statistical Plus/Minus (Stat +/-). For the purposes of comparison I looked at the per-minute (or per-possession) versions of all these metrics (e.g. WP48 instead of WP, WSAA/48 instead of WSAA, WARPr instead of WARP).

(Read More…)

May 22, 2008

Google Charts

Posted by Eli in Other Sites

Jason from BallHype has a new post on Chris Paul with some really nice dynamic charts thanks to the Google Chart API.

May 19, 2008

Regression to the Mean

Posted by Eli in Advanced Stats, Stat Theory

In my next few posts I’m going to cover the topic of regression to the mean, and how it applies to basketball statistics. This is a complex issue and this first post is pretty heavy on the math, but I plan on following it up with more practical examples showing how you can do the calculations in Excel and looking at what the results tell us about different areas of the game.

Most of the equations in this post are not my original work but instead were taken from various sources. I’ve tried to compile them all into one place and in a fairly logical order that can benefit both a newcomer to the topic as well as those with more advanced knowledge looking for a refresher. The main sources are various posts and comments by Tangotiger, MGL, and others on The Book blog, Andy Dolphin’s appendix to The Book, and the Social Research Methods site. Throughout this post I will link to several specific pages that are of relevance. I would also recommend two excellent introductions to regression to the mean by Ed Küpfer and Sal Baxamusa.

True Score Theory

Regression to the mean is rooted in true score theory (aka classical test theory). The basic idea is that a player’s observed performance over some period of time (as measured by a statistic like field-goal percentage) is a function of [1] the player’s true ability or talent in that area and [2] a random error component. It should not be forgotten that this is a simplified model, and it leaves a lot of stuff out (team context, for one).

Observed measure = true ability + random error

A player’s true ability can never be known, it can only be estimated. A player’s observed rate is the typical estimate that is used (i.e. we assume a player with a 40% three-point percentage is a “40% three-point shooter”), but by using regression to the mean we can get a better estimate. This is done by combining what we know about how the individual fares in a particular metric with what we know about how players generally fare in that metric.

The first step is to convert the true score model from the individual level to the group level by looking at the spread (or variance) of the distribution of many players’ stats:

var(obs) = var(true + rand)
...but since the errors are by definition random, they aren't correlated with true ability, so...
var(obs) = var(true) + var(rand)

If you look at the field-goal percentages of a group of players, some of the variation would be from the differing shooting abilities among the players, and some would come from the differing amounts of random luck each player had. As the equation shows, the overall variance (the standard deviation squared) of players’ observed rates is equal to the sum of the variance of their true rates and the variance of the random errors.

(Read More…)