Advanced Stats for Basketball

February 5, 2008

Diminishing Returns and the Value of Offensive and Defensive Rebounds

Posted by Eli in Advanced Stats, Stat Theory, Studies

There has been a lot of discussion in recent months about the importance of rebounding on the player level. Much of this debate has been in reaction to the high value that Dave Berri’s Wins Produced player rating puts on rebounds. On Berri’s blog there have been several posts with long, insightful debates in the comments about the issue (that is, if you ignore the unfortunate mudslinging often directed at those with differing points of view). In particular, I would recommend the comments sections of “The Best One-Two Punch in the Association”, “Chris Paul vs. Deron Williams, Again”, and “How Has Texas Survived the Loss of Kevin Durant?”. There have also been some good debates on the topic in the APBRmetrics threads, “Current season Win Scores/Wins Produced” and “Can some one explain the ‘possession cost’ scheme?”.

These are wide-ranging debates, involving such issues as the relative value of rebounding versus scoring and the apportioning of credit for a defensive stop between the defensive rebounder and his teammates. The issue that I want to pick up on is the extent to which the law of diminishing returns applies to rebounding.

(Read More…)

January 18, 2008

Pretty Pictures from Google Spreadsheets

Posted by Eli in Raw Stats, Site Tips

Alex from the blog Mistake by the Lake recently had a post that utilized some of the tips I posted for scraping HotSpots/HotZones data in combination with some neat Google Spreadsheets features to display some charts about Larry Hughes that automatically update throughout the season. I thought this was a great idea, and I didn’t know it was even possible. In this post I’m going to create some charts of my own and explain how anyone can use these very cool features from Google Spreadsheets.

(Read More…)

January 5, 2008

Salary Data

Posted by Eli in CBA/Salary Cap

Sorry for the lack of posts over the holidays. Things should pick up soon.

I’ve updated the Salary Links page. Here are some tips for where to look for accurate player salaries.

Current season:

  • ESPN Trade Machine - accurate, updated throughout the season, includes contract lengths but no future season salary information (note that ESPN’s individual player pages also list players’ salaries for the current season)
  • Dallas Morning News - accurate (but rounded), not updated throughout season
  • SI - accurate but only has salaries from contracts signed before the 06-07 season

Future seasons:

  • SI - accurate but only has salaries from contracts signed before the 06-07 season
  • ShamSports - some figures are estimated, includes hard-to-find info on guarantees and bonuses
  • Storyteller - some figures are estimated, includes hard-to-find info on guarantees and bonuses
  • HoopsHype - some figures are estimated

Past seasons:

  • SI - accurate but only has 06-07 season
  • Dallas Morning News - accurate but only has 05-06 and 06-07 seasons
  • Patricia Bender - some figures are estimated, has data from 85-86 on
  • USA Today - some figures are estimated, has data from 01-02 on
  • Rodney Fort - some figures are estimated, collected from various sources, has data from as far back as 67-68 (also has team payrolls from as far back as 51-52)

As a last resort, there are a number of connected posters on the RealGM CBA forum that can answer specific questions.

December 17, 2007

Does Good Pitching Beat Good Hitting in Basketball?

Posted by Eli in Advanced Stats, Stat Theory, Studies

It’s taking longer than I anticipated to compile and analyze the context-dependency of various player stats by the method I outlined in my last post, so in the meantime I would like to shift gears and introduce a method that uses team stats to try to understand whether the offensive or defensive team controls various aspects of the game.

There’s an old saying in baseball that “good pitching always beats good hitting.” I want to examine what a claim like this is trying to get at, look at a method that attempts to objectively analyze whether it’s true, and then apply that method to many areas of basketball and see what we can learn.

(Read More…)

December 10, 2007

The Reliability and Context-Dependency of Basic Stats: Methodology

Posted by Eli in Stat Theory, Studies

At the end of my recent post on evaluating player ratings I said that the next step would be to take a step back from comprehensive ratings and look at how the component stats they are built from change in different contexts. That is what I will begin to look at in this post.

The methodology I’m going to use is pretty complicated, so instead of just presenting the results I’m going to use this post to explain in a step-by-step manner the techniques I plan on using. I’m also going to try to point out what I see as potential problems, but in many ways I’m learning as I’m going so I may miss some things. I’d welcome any critiques or suggestions from anyone who knows what they’re doing (or anyone who pretends to know what they’re doing, like me).

(Read More…)

December 6, 2007

More stats on the way?

Posted by Eli in Raw Stats

[…] has an article up on new basketball stats which includes some interesting quotes from Gregg Popovich on how he views stats. This part of the article caught my eye:

The good news is that, just like media options, stats are changing. A sterling example: At [the league’s site], box scores for the 2007-08 season now include plus/minus ratings for each player and a category labeled “BA” for blocks against. Even better news is that deflections and contested shots are being studied this season to see how trackable and reliable they would be, as two more stats worth adding.

It’s great that the league is making advancements in their statistical tracking. A lot of teams have been keeping track of stats like deflections for years, but that stays behind closed doors. When the league gets involved then the public can have access to the data and run with it, as has happened with MLB’s Enhanced Gameday and PITCHf/x data. And any additional defensive statistics would definitely be useful considering the current lack of stats on that side of the ball.

December 4, 2007

Evaluating Player Ratings: Year-to-Year Correlations

Posted by Eli in Advanced Stats, PER, Stat Theory

There has been a lot of debate recently about comprehensive player ratings such as John Hollinger’s PER, Dave Berri’s Wins Produced, and Dan Rosenbaum’s Adjusted Plus/Minus. Is one of these rating systems better than the others? What methods can be used to make such an assessment? One approach is to analyze and critique the theory behind each measure - does the way it was constructed make basketball (and statistical) sense? An alternative approach is to analyze them empirically - what happens when we actually start applying the ratings to players? Dean Oliver, the author of Basketball on Paper, has suggested two such empirical methods by which to evaluate player ratings:

(Read More…)

November 30, 2007

Rebounding and Height

Posted by Eli in Studies

To test out the height ordering measure I came up with, and to try some of the methods described in recent posts on the Sabermetric Research blog, I decided to run some correlations to look at the relationship between a player’s height and his rebounding performance.

For the 2006-07 season, I looked at all players who played at least 200 minutes (which came out to 397 players, counting stints with different teams separately). I chose 200 minutes as the cutoff because the correlations seemed to stabilize at that level (at lower cutoffs the correlations were lower because of fluky low minute guys, and at higher cutoffs the correlations were very similar to what they were at the 200 minute cutoff). The explanatory variables that I used were height (in inches) and height ordering (which is on a 1 to 5 scale, with 1 indicating that the player played all of his minutes as the shortest player on the court for his team). The response variables were defensive rebounding percentage and offensive rebounding percentage. DRB% is an opportunity rate measuring DRB/(DRB opportunities), or more specifically, DRB/(team DRB while the player was on the court + opponent ORB while the player was on the court). ORB% is similar but uses ORB opportunities. The actual formulas, which estimate the on-court part, are as follows:

DRB% = DRB/((5*MIN/tmMIN)*(tmDRB + oppORB))
ORB% = ORB/((5*MIN/tmMIN)*(tmORB + oppDRB))
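
As a rough sketch of how the two formulas above could be computed, the Python below takes box-score totals as plain numbers. The function and parameter names are made up for illustration, and the sample totals are invented rather than taken from any real player.

```python
def rebound_pct(reb, minutes, team_minutes, team_reb, opp_reb):
    """Estimate a rebounding percentage from box-score totals.

    For DRB%: reb = player DRB, team_reb = tmDRB, opp_reb = oppORB.
    For ORB%: reb = player ORB, team_reb = tmORB, opp_reb = oppDRB.
    """
    if minutes <= 0 or team_minutes <= 0:
        raise ValueError("minutes and team_minutes must be positive")
    # Estimated share of the team's games that the player was on the floor.
    on_court_share = 5 * minutes / team_minutes
    # Estimated rebound opportunities while the player was on the court.
    opportunities = on_court_share * (team_reb + opp_reb)
    return reb / opportunities


# Invented totals, for illustration only.
drb_pct = rebound_pct(reb=400, minutes=2000, team_minutes=20000,
                      team_reb=2500, opp_reb=900)
print(f"DRB% = {drb_pct:.1%}")
```

The same function covers both stats because the formulas differ only in which team and opponent rebound totals are plugged in.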

Last season, among players who played at least 200 minutes, Kevin Garnett led the league in DRB% at 30.7%. Earl Boykins finished last at 5.1% for his stint in Denver. For ORB%, Justin Williams was first at 17.6%, while Keith McLeod was last at 0.3%.
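
The correlation setup described above (apply a minutes cutoff, then correlate height with a rebounding rate) might be sketched as follows. This assumes a plain Pearson correlation, which is a guess, since the type of correlation isn't specified here. The rows are hypothetical and do not describe real players.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical rows: (minutes, height in inches, DRB%). Not real players.
players = [
    (1500, 73, 0.08),
    (2200, 78, 0.12),
    (900, 81, 0.19),
    (150, 84, 0.30),  # below the cutoff, so excluded
    (2600, 83, 0.24),
]

MIN_MINUTES = 200  # the cutoff described in the text
qualified = [p for p in players if p[0] >= MIN_MINUTES]
heights = [p[1] for p in qualified]
drb_pcts = [p[2] for p in qualified]
r = pearson(heights, drb_pcts)
print(f"players kept: {len(qualified)}, r = {r:.3f}")
```

Swapping the height column for the 1-to-5 height ordering value would give the second explanatory variable with no other changes.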

(Read More…)

November 26, 2007

Learning from Sabermetrics

Posted by Eli in Baseball, Books

Statistical analysis of baseball is far more advanced than its basketball counterpart. But we can use that to our advantage by learning from the work done in baseball and applying it to the context of basketball. Of course not everything transfers directly due to the differing natures of the games, but more often than not the ideas, theories and methods used to analyze baseball can be adapted to some use in basketball.

To that end, I’ve been reading a lot of sabermetric work recently, even though I really have no interest in learning in just which base/out states it makes sense to lay down a sacrifice bunt. I’d like to recommend some of the books and websites that I’ve found to be great sources of ideas.

(Read More…)

November 16, 2007

Positions and Height

Posted by Eli in Positions, Studies

Identifying a player’s position is useful for all sorts of statistical analysis of basketball, but unfortunately position in basketball is not nearly as well-defined as position in baseball. The traditional breakdown into point guard (1), shooting guard (2), small forward (3), power forward (4), and center (5) works some of the time, but breaks down at the edges. Some teams’ offensive systems don’t differentiate between the roles for the two wing positions (SG and SF), or between the two post positions (PF and C). Some players play one positional role in their team’s offense yet typically guard an opposing player that plays a different positional role in his offense (e.g. Kirk Hinrich, who plays PG for the Bulls offensively but often defends opposing SGs). Many players play different positions at different times in the same game depending on which teammates they are on the court with. For all these reasons and more, having a list saying Player X is a PG, Player Y is a PF, Player Z is a SF, etc. is bound to be lacking.

How can positions be assigned in a more objective and informative manner?

(Read More…)

November 13, 2007

Some HotZones data to work with

Posted by Eli in Raw Stats

OK, here’s some of the HotZones data I promised. I’ve uploaded the team by team data to Swivel - offensive data from 03-04 to 06-07, and defensive data from 03-04 to 06-07. You should be able to download each as a .CSV file and easily import them into a spreadsheet or database.

I’ll have more analysis later, but for now here’s a quick look at how frequently and how well teams shot from different distances.

League percent of FGA taken from each distance:

         0-8 ft  8-16 ft  16-24 ft  24+ ft
         ------  -------  --------  ------
2003-04   40.6%    18.0%     22.9%   18.5%
2004-05   40.5%    16.4%     23.7%   19.4%
2005-06   41.1%    15.2%     23.7%   20.0%
2006-07   41.3%    14.7%     23.0%   21.0%
-------  ------  -------  --------  ------
  Total   40.9%    16.1%     23.3%   19.7%

League field-goal percentage by distance:

         0-8 ft  8-16 ft  16-24 ft  24+ ft
         ------  -------  --------  ------
2003-04   53.7%    37.7%     38.8%   35.1%
2004-05   54.6%    38.3%     40.0%   36.0%
2005-06   55.3%    39.0%     40.5%   36.2%
2006-07   56.3%    39.6%     40.4%   36.2%
-------  ------  -------  --------  ------
  Total   55.0%    38.6%     39.9%   35.9%

The potential trends that pop out to me are the decline in shots being taken from 8-16 feet and the increase in shots from 0-8 feet and 24+ feet. As far as FG% by distance goes, teams seem to be shooting better from both 0-8 feet and 8-16 feet.
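
To put numbers on those trends, here is a small sketch that computes the percentage-point change between the first and last seasons in the two tables above. The values are copied from the tables; the variable and function names are just for illustration.

```python
# Values copied from the two tables above (percent), by distance bucket.
DISTANCES = ["0-8 ft", "8-16 ft", "16-24 ft", "24+ ft"]

share_of_fga = {
    "2003-04": [40.6, 18.0, 22.9, 18.5],
    "2006-07": [41.3, 14.7, 23.0, 21.0],
}
fg_pct = {
    "2003-04": [53.7, 37.7, 38.8, 35.1],
    "2006-07": [56.3, 39.6, 40.4, 36.2],
}


def change(table, start="2003-04", end="2006-07"):
    """Percentage-point change per distance bucket between two seasons."""
    return [round(e - s, 1) for s, e in zip(table[start], table[end])]


for label, table in [("Share of FGA", share_of_fga), ("FG%", fg_pct)]:
    print(label)
    for dist, delta in zip(DISTANCES, change(table)):
        print(f"  {dist:>8}: {delta:+.1f} points")
```

With only four seasons of data, these differences are descriptive and say nothing about whether the changes are statistically meaningful.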

You can see that a lot of shots are taken from 0-8 feet. This is where 82games’ distance breakdowns are useful, as their categories of dunks, tips, and close shots provide divisions within 8 feet (some of the shots they classify as jumpers are also within 8 feet).

Another thing to note is that 24+ feet FG% differs slightly from three-point percentage because the HotZones data excludes shots from beyond half-court.

For reference, here are league-wide boxscore stats by season from Basketball-Reference.

November 12, 2007

Assist Rates

Posted by Eli in Stat Theory

To follow up on my discussion of rate stats, I’m going to look at how this theoretical foundation can help evaluate passing stats created from the starting point of assists.

The basic assist-related player stats are assists per game and assist-to-turnover ratio. Assists per game is a time-period rate, while assist-to-turnover ratio is an opportunity rate (technically it’s an opportunity ratio of successes/failures, but it can easily be transformed into an opportunity rate of Ast/(Ast + TO)).
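
The transformation mentioned in the parenthetical is simple algebra: dividing the numerator and denominator of Ast/(Ast + TO) by TO gives r/(r + 1), where r is the assist-to-turnover ratio. A minimal sketch, with illustrative function names and invented totals:

```python
def ratio_to_rate(ast_to_ratio):
    """Convert an assist-to-turnover ratio into the rate Ast/(Ast + TO)."""
    if ast_to_ratio < 0:
        raise ValueError("ratio must be non-negative")
    return ast_to_ratio / (ast_to_ratio + 1)


def assist_rate(ast, to):
    """The same opportunity rate computed directly from the counts."""
    return ast / (ast + to)


# Invented season totals: 300 assists and 100 turnovers (a 3.0 ratio).
print(ratio_to_rate(300 / 100))
print(assist_rate(300, 100))  # same value as the line above
```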

A lot of the advanced stats in basketball are simply refinements to traditional stats to remove potential biases. So from Assists/Game we can instead shift to Assists/Minute, which controls for playing time, or to Assists/Team Possession, which controls for pace (and playing time). We can even go one step further and shift to Assists/Team Play, which also controls for offensive rebounding (possessions don’t keep track of the extra plays that result from offensive rebounds).
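
The progression of denominators described above can be sketched like this. The function name, the per-100 scaling, and the sample totals are illustrative choices, and the team possession and play counts are simply taken as inputs, since how they are estimated isn't covered here.

```python
def assist_rates(ast, games, minutes, team_poss, team_plays):
    """Return assists per game, per minute, per team possession, and per team play.

    team_poss and team_plays are assumed to be totals for the time the
    player was on the court; how they are estimated is left open here.
    """
    return {
        "per_game": ast / games,
        "per_minute": ast / minutes,
        "per_100_poss": 100 * ast / team_poss,
        "per_100_plays": 100 * ast / team_plays,
    }


# Invented totals, for illustration only.
rates = assist_rates(ast=400, games=80, minutes=2400,
                     team_poss=4800, team_plays=5000)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```

Because plays include the extra chances created by offensive rebounds, the play count is at least as large as the possession count, so the per-play rate will not exceed the per-possession rate.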

If we turn to the opportunity rate of Ast/(Ast + TO), a flaw is noticeable. Some turnovers have nothing to do with passing - they may be the result of a player trying to score and traveling or committing an offensive foul. In other words, turnovers are not the corresponding failures to the successes of assists. So to make a better opportunity rate for assists, we first need to determine what constitutes an assist opportunity.

(Read More…)

November 11, 2007

So much for HotZones

Posted by Eli in Raw Stats, Site Tips

Well, wouldn’t you know it, right after I published my long piece on the NBA site’s HotZones, they decided to change them. The HotZones page is still functional and linked to from the NBA site, but the latest incarnation is now called NBA Hot Spots. It doesn’t appear to have any new features, but the menus do allow you to access some of the data that the HotZones menus couldn’t reach (the 07-08 regular season and preseason, and the 06-07 playoffs). It looks like it’s just a new front-end, so all the tips from my previous post should still work, as the data is still stored in the same place on the server. Hat tip to Hoopinion for noticing the move.

An Introduction to Rate Stats

Posted by Eli in Stat Theory

One traditional way of categorizing sports statistics is to divide them into counting stats and rate stats. A counting stat measures the accumulation of successes (or failures) in some area. Total points, field-goal makes and misses, free-throw makes and misses, assists, rebounds, turnovers, blocks, steals, and fouls are counting stats. A rate stat measures the rate or frequency of the accumulation of successes (or failures). Baseball-Reference has a good summary of the difference in the context of baseball here.

I think it can be useful to split rate stats into two subcategories - opportunity rates and time-period rates.

(Read More…)

November 8, 2007

Hacking HotZones

Posted by Eli in Raw Stats, Site Tips

The folks at the official NBA site added a great feature a few years back called HotZones. It’s not on the level of MLB’s fantastic PITCHf/x data, but it’s useful nonetheless. It consists of season-level shot charts for every player and team, broken down into 14 zones. This is data that wasn’t previously available. ESPN has had game-by-game shot charts with their boxscores since 02-03, but in a basically unusable form. 82games has had shooting by distance since 02-03, but in addition to lacking side-to-side splits it groups shots into pretty large distance categories (what they label “Jump” shots includes some shots closer than 8 feet as well as everything beyond 8 feet).

So the NBA site’s HotZones offer a lot of valuable information, but unfortunately they are also presented in a very difficult-to-use format. They are embedded in a Flash application, which means easy linking as well as copying and pasting are out of the question. Though you can select any team or player and a variety of splits, there’s no way to see what the league average FG% is for a specific zone, or who the league leaders are. And because of errors in the Flash menus, you can’t even access a lot of the available data (as of this posting, data is present on the server but inaccessible through the Flash menus for players from past seasons who are no longer in the league, the 07-08 regular season, the 06-07 playoffs, and the 03-04 regular season). However, with a bit of digging, I was able to find some ways around these problems. What follows are instructions for how to link to HotZones pages (including those you can’t get to through the menus) and how to download HotZones data in a format that allows for easy manipulation in a spreadsheet.

(Read More…)
