Advanced Stats for Basketball

April 3, 2008

Thoughts from Bill James

Posted by Eli in Baseball, Stat Theory

Here’s a long Q and A with Bill James that’s well worth reading. This response in particular caught my eye:

Q: Generally, who should have a larger role in evaluating college and minor league players: scouts or stat guys?

A: Ninety-five percent scouts, five percent stats. The thing is that — with the exception of a very few players like Ryan Braun — college players are so far away from the major leagues that even the best of them will have to improve tremendously in order to survive as major league players — thus, the knowledge of who will improve is vastly more important than the knowledge of who is good. Stats can tell you who is good, but they’re almost 100 percent useless when it comes to who will improve.

In addition to that, college baseball is substantially different from pro baseball, because of the non-wooden bats and because of the scheduling of games. So … you have to pretty much let the scouts do that.

These issues seem to me to be important in basketball as well, and I think they are a good starting point for thinking about the statistical analysis of sports. Taking them in reverse order, here’s one way of framing James’ points:

Player production = player ability + context

James claims that the context of college baseball is so different from the majors that translating from college production to projected pro production by controlling for these context differences is a very difficult task. Behind this claim is a model of player statistical production wherein a player’s stats are the result of his abilities and the context in which he plays.

There’s actually an additional element at play, which is randomness. Tango, MGL and Andy Dolphin have done a fantastic job of exploring this factor in baseball in great detail in “The Book” and on their blog. The basic idea is that, ignoring context for the moment, player production = player ability + randomness. Randomness has a larger effect when sample sizes are smaller and when there is little variation among players in ability. To control for this one can regress statistics to the mean, which is a way of starting with player production and separating it out into ability and luck.
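
To make the regression idea concrete, here’s a minimal Python sketch. The function form and the regression constant k = 200 below are illustrative assumptions on my part, not values taken from “The Book”:

```python
def regress_to_mean(observed_rate, n, league_mean, k):
    """Shrink an observed rate toward the league mean.

    k is the regression constant: the number of "phantom"
    league-average trials mixed into the sample. Small samples get
    pulled strongly toward the mean; large samples barely move.
    """
    return (n * observed_rate + k * league_mean) / (n + k)

# A 45% three-point shooter over only 40 attempts regresses most of
# the way back to a 36% league average...
print(regress_to_mean(0.45, 40, 0.36, 200))   # 0.375
# ...while the same rate over 400 attempts keeps more of the signal.
print(regress_to_mean(0.45, 400, 0.36, 200))  # 0.42
```

The output is the estimate of ability; the gap between it and the observed rate is the piece attributed to luck.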

The fuller model would be player production = player ability + context + randomness. Because of the individual nature of (the offensive side of) baseball, the context effects on batting statistics aren’t that complex and are typically controlled for in advance (by adjusting for things like runners on base, pitcher quality and park effects, though James suggests that the context differences between college and the majors aren’t so easy to deal with). However, in basketball, almost all areas of the game are affected by context in complicated ways as a result of the team nature of the sport. So while it’s important to try to control for randomness in basketball stats, I think understanding the effects of context is the more pressing issue. How will a player’s production change when put in a different role, when playing with different teammates, or when playing in a different coach’s system?

To try to answer these questions and control for context, there are a lot of methods that can be used. Measuring statistics per possession rather than per game is a way of controlling for the context of differing tempos. More generally, in a previous post I outlined a way of dealing with the issue by looking at how players’ stats change when they change teams. I haven’t followed up on that method as promised, in part because I think there may be a better way to approach things by using multilevel modeling (which you can learn about from this book or this article). Eventually I hope to post some results of this approach.
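
As a concrete example of the per-possession adjustment, here’s a short Python sketch using one common possession estimate (the 0.44 free-throw coefficient is the conventional approximation, and the box score numbers below are made up for illustration):

```python
def possessions(fga, orb, tov, fta):
    # One common estimate: shot attempts, minus offensive rebounds
    # (which extend a possession), plus turnovers, plus the fraction
    # of free throw attempts that end a possession.
    return fga - orb + tov + 0.44 * fta

def pts_per_100(points, fga, orb, tov, fta):
    return 100.0 * points / possessions(fga, orb, tov, fta)

# Two teams scoring the same 95 points per game look very different
# once tempo is removed:
fast = pts_per_100(95, fga=80, orb=10, tov=14, fta=25)  # 95 possessions
slow = pts_per_100(95, fga=72, orb=10, tov=12, fta=20)  # ~83 possessions
print(round(fast, 1), round(slow, 1))  # 100.0 114.7
```

Same per-game scoring, but the slower team is far more efficient per possession; that is the kind of context difference raw per-game stats hide.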

Player ability is not constant over time

As if untangling skill from context weren’t hard enough, James’ first point emphasizes that skill itself can change in ways that are difficult to predict: some players improve more than others. And James suggests that in baseball, statistics are much more useful for measuring ability than for predicting change in ability.

In basketball I think this is a question for the future, since for now we still have a long way to go on measuring a player’s current skill. But it is important. Why do two players with similar college production go on to have greatly different pro careers? Are we just not looking at the right stats (e.g. maybe there are hidden indicator stats that do shed light on future improvement, such as a high free-throw percentage suggesting the potential for improved three-point shooting)? Or, like James is suggesting for baseball, do we have to look outside on-court stats to try to predict player improvement? This could mean looking to objective (but off-court) measures like age, quickness, strength, vertical leap, and wingspan, or even getting into harder to define areas such as effort, intelligence, diligence, leadership, heart, and other “intangibles.” Or one can take the scouting approach and look to sub-skills players exhibit on the court that aren’t easily quantifiable statistically but that suggest the potential for overall improvement with the right coaching (e.g. a player’s shooting form or how well they box out). At this point I’m not sure we can say just what the right mix is of these varying approaches.
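
The “hidden indicator” idea is at least testable in principle: collect the indicator stat for a cohort of players and correlate it with their later improvement. Here’s a toy Python sketch; every number in it is invented purely to show the mechanics, and real data would be far noisier:

```python
def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient, no libraries needed."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical players: college FT% and the change in 3P% over their
# first few pro seasons.
college_ft = [0.62, 0.71, 0.78, 0.84, 0.90]
pro_3p_gain = [-0.01, 0.00, 0.02, 0.03, 0.05]

r = pearson_r(college_ft, pro_3p_gain)  # strongly positive here by construction
```

A strongly positive r across a real, large sample (with the gains themselves regressed for sample size) would be evidence that free-throw shooting flags future three-point improvement; a near-zero r would push us toward the off-court measures James favors.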


  1. James: “Stats can tell you who is good, but they’re almost 100 percent useless when it comes to who will improve.”

    Meh. James has been pushing this for a while now, but I don’t buy it. He hasn’t shown that stats are useless when it comes to predicting improvement, nor has he shown that scouts can do a better job than a purely statistical forecast. Moreover, since scouts themselves use stats, it’s hard to really find a good point of comparison — if scouts use stats to predict improvement, then wouldn’t a scout with a better grasp of the statistical subtleties involved do even better? How about a scout/geek crimefighting duo?

    Personally, I think James is simply rebelling against the statistical revolution he unleashed. Some stats nerd must have run his dog over or something.

    Comment by edk — April 3, 2008

  2. 100% is surely overstating it, but I agree with him that it’s very difficult to use stats to predict improvement, since basically what you’re trying to do is predict which of two players with the same stats in the past will do better in the future. I’m not familiar enough with sabermetrics to know exactly what progress has been made on those fronts. I do know people have come up with various types of minor league stat translations and general aging curves, but that’s still a far cry from being able to say which of two players who are the same age and who have produced the same minor league stats will end up having the better career.

    As for what scouts can do, I do think there’s value to doing things like looking at swing mechanics to predict which players will best be able to hit big league curve balls, etc. That’s not a purely subjective form of analysis, but I wouldn’t call it scouts using stats either. It falls in the middle ground of using objective measures that aren’t on-field (or on-court) statistics, and I think that kind of approach probably is very useful for trying to predict improvement.

    To go further towards the subjective, if we believe that a player who locks himself in the gym in the offseason will improve more than one who doesn’t, then non-statistical analysis of which players have the most drive or commitment would be very valuable as well. This kind of thing is harder to measure, but it’s worth trying to get at it in some way, whether it be through interviews, background research, psychological testing, Brain Doctoring, or whatever.

    Comment by Eli — April 3, 2008

  3. I’d adjust a statement above and instead say, for the moment and for discussion: measuring statistics per possession rather than per game is “a simple way of comparing” play at differing tempos.

    Do different tempos league-wide produce meaningfully different team and player stats? I haven’t seen a league-wide aggregation (by team pace band or by actual game pace) or player data aggregated either way. But I’d like to, with actual game pace being much more labor intensive but much more on point.

    Using a split of teams by “pace band” (1+ possession above league average: 9 teams; middle: 14 teams; 1+ possession below league average: 7 teams), I see a pretty significant difference in eFG% between these bands (+1.5% above league average for the top band and -1% below average for the low pace band), and in offensive rating, since eFG% is its largest component. On most of the other 4 factors, offense and defense, the top pace band and the bottom one are more similar to each other than to the middle. That doesn’t totally surprise me, but it needs more investigation. The real study needs to be done based on actual game pace.

    Comment by Mountain — April 3, 2008

  4. Do teams with better shooters play faster (run n gun or just shoot with confidence earlier in the shot clock) and does that cause the FG% spread by pace or does pace contribute to the FG% spread? Or is it some of both? Is pace significant context? At this point I’d guess it is for shooting stats and that GMs and scouts should consider that within the NBA context (and from college level too).

    Comment by Mountain — April 3, 2008

  5. I’ve actually been working on a post about measuring pace. I was delaying it until I got play-by-play data with shot clock info, but that data is proving to be pretty noisy, so I might go ahead and post it without that.

    Comment by Eli — April 3, 2008

  6. Look forward to seeing what you do with it.

    I don’t know if it necessarily needs saying, but while league-level data may show a trend one way or another, of course some teams will buck the trend. There are many factors and more than one way to succeed.

    Stats split simply by pace played is one cut. Whether the trends change when a team gets a game pace that matches the direction of its average season pace bias is another cut that may help.

    True pace preference may be hard to get at, but perhaps a team’s average pace against weak teams at home in wins would be most suggestive? Just a stray thought.

    Comment by Mountain — April 3, 2008

  7. eFG% in the first 10 seconds of the shot clock tends to be higher than later, but this is influenced by fast break opportunities. If you have play-by-play data with shot clock info, it would be unique and helpful if you could separate fast breaks from other early-shot-clock offense. I can’t see calling a shot after more than 6 seconds a fast break; 4 or 5 seconds may be a gray area. I wonder how strong early offense is without fast breaks, and if fast breaks are virtually the entire difference compared to other parts of the shot clock.

    I tend to think of pace in game terms but certainly play level can yield additional insights.

    Comment by Mountain — April 4, 2008

  8. This may be overly simplistic, but in general isn’t offensive efficiency well correlated with consistency in pace? I’d imagine that the more poorly a team plays offensively, the lower its likelihood of establishing the tempo for that particular game. Another interesting question would be: is pace more easily established by good offensive efficiency or by good defensive efficiency?

    Comment by atthehive — April 4, 2008

  9. The spread of offensive efficiency for teams 1+ possessions above league average compared to teams 1+ possessions below league average is nearly 5 points. The same spread on defensive efficiency is only 1 point top to bottom, but with a 2.5-3 pt dip in the middle.

    The correlations between pace and offensive efficiency, and between pace and eFG%, are pretty modest and positive, as I suggested previously, with the correlation of pace and eFG% being twice as strong as that of pace and overall offensive efficiency.

    The correlation of pace and overall defensive efficiency is stronger than on offense, but the correlation of pace and eFG% allowed is weaker than on the offensive side. On defense, defensive rebounding matters a lot.

    “Pace control” remains a topic for a deeper game data study.

    Comment by Mountain — April 5, 2008

  10. In post 3 I said:

    “On most of the other 4 factors offense and defense the top pace band and the bottom one are more similar to each other than the middle.”

    Strike that. Moving quickly I got my columns wrong. Most of them show expected spread top to bottom.

    Comment by Mountain — April 5, 2008

  11. The relationship between pace and efficiency definitely merits further investigation. I’ve done a small study on that but I couldn’t find much of a relationship. I basically was looking at whether, for a given game, a team fares better when the pace of the game is closer to its average pace for the season than to its opponent’s average pace for the season (controlling for the quality of each team as measured by their season efficiency differential). Like I said, I didn’t find a relationship, but I was using only one season’s worth of game-by-game data, and I was just looking for a league-wide trend rather than looking at individual teams separately. Basketball-Reference just added team game logs (going back to 86-87), so now it should be a lot easier to compile the data necessary to do a larger study.

    Comment by Eli — April 6, 2008

  12. Sorry for my misstatements but to clean up the mess further …

    I said in post 9: “The spread of offensive efficiency for teams 1+ possessions above league average compared to teams 1+ possessions below league average is nearly 5 points. The same spread on defensive efficiency is only 1 point top to bottom but with a 2.5-3 pt dip in the middle.”

    Actually the spread is only about 1.5 points on offensive efficiency, with a dip in efficiency of several points in the middle. There are more efficient teams at the top and bottom of the pace range than in the middle, hence the pretty low correlation between pace and efficiency. Pace control is probably involved in some fashion.

    On defense, the middle pace teams are indeed the worst on overall defensive efficiency, by 1.5 to 3 pts compared to the top and bottom.

    But all this is confusing. Not sure if anyone cares at this point, but here is the data (as of a few days ago), which I probably should have posted originally.


    Offense (columns look like pace, rating, eFG%, TOV%, REB%, FT rate):

    Top 9 teams 1+ possession above league pace
    94.91 110.79 51.06 15.64 25.76 23.99

    Middle 7 near league average pace
    90.66 106.04 48.43 15.99 26.34 23.67

    Bottom 14 1+ possessions below league pace
    88.62 109.37 49.48 15.16 27.57 22.66

    League average
    90.98 109.02 49.71 15.50 26.74 23.29


    Defense (same columns):

    Top 9 teams 1+ possession above league pace
    94.91 109.94 50.20 15.76 27.56 23.47

    Middle 7 near league average pace
    90.66 110.60 50.13 15.13 26.81 24.60

    Bottom 14 1+ possessions below league pace
    88.62 107.60 49.14 15.51 26.18 22.58

    League average
    90.98 109.00 49.69 15.49 26.74 23.32

    Correlation with pace

    offense 0.15 0.30 0.16 -0.26 0.18
    defense 0.23 0.24 0.19 0.39 0.06

    These correlations are weak. Obviously they should be checked multi-year before using them even a little for real, but I am stopping here for now at first cut impressions.

    The team shot charts of high, middle and low pace and high, mid and low efficiency teams, and the combinations of these factors, might be interesting views and comparisons to each other.

    Comment by Mountain — April 6, 2008

  13. Thanks for the numbers, Mountain. One thing that should be done (that I might get to doing myself if time allows) is calculating variance in pace, team by team, and then graphing that versus pace. Intuitively, wouldn’t it make sense that tempo-setting teams (whether high or low) are more efficient? I might be way off the mark with that. However, it would explain the efficiency dropoff of middle-of-the-pack pace teams.

    Eli, is it possible your study didn’t show a relationship because it doesn’t matter whether mediocre teams play at their “average” pace or not? If the variance in mid-tempo team averages is indeed larger, then their individual average paces really offer them no advantage on the court. It’s the pace they end up playing, but not the pace they would ideally play, if that makes sense.

    Comment by atthehive — April 7, 2008

  14. at the hive, variance in pace vs. average pace by team would be an interesting next step.

    Even better would be variance at the team level between their average pace and the difference of actual game pace from “predicted pace” (set as the average of the two teams’ paces, or weighted more heavily toward the better and/or home team, or really whatever function best predicts actual pace). That would account for the variability among teams in the average pace of their schedules of opponents, affected by conference and division, and the relation of their pace level to that of others.

    It comes down to investing the time. Maybe later.

    Comment by Mountain — April 7, 2008

  15. Pace is a complex thing. Even if you have a general preference, it can change within a game based on what’s working or not in terms of shots, matchups, quality of D, how the refs call it, the way the ball bounces off the rim to the competing rebounders (results a function of positioning and randomness), turnovers, who has the lead and by how much, the impact of what happens in garbage time, etc.

    Pace might be worthy of some attention, but it is not a first concern and might not be that high on the list of concerns. But the more you study it, the better you can judge how important it is.

    It may matter more / less for particular players.

    Comment by Mountain — April 7, 2008

  16. All the current western conference seeds are high or low on pace (i.e. outside the middle third). The story isn’t quite as clearcut in the east, where 3 teams are barely in the middle third at 18, 19 and 20 on average pace, but it is a pretty strong surface case that something about pace is worth understanding better. The odds of this being completely random are pretty small, right? Multi-year data again would help, of course.

    I haven’t worked up the averages yet, but it seems like mid-paced teams are getting pounded this season.

    Comment by Mountain — April 8, 2008

  17. Using W-L profiles at 82games, top 10 pace teams beat other top pace teams 50% of the time. They beat mid-pace teams 65% of the time. They beat low pace teams just 42% of the time.

    Mid 10 pace teams beat top pace teams 43% of the time. They beat other mid-pace teams 52% of the time. They beat low pace teams just 25% of the time.

    Low 10 pace teams beat top pace teams 54% of the time. They beat mid-pace teams 72% of the time. They beat other low pace teams 43% of the time.

    There are many other cuts you could prepare and look at, but this one suggests low pace teams are winning more often against all 3 pace levels, though the gap over high pace teams is modest, especially against high and low pace opponents. Pace control is one thing, but ultimately you want pace clash wins.

    Comment by Mountain — April 8, 2008

  18. Yeah, that last concept has intrigued me all season. I’m a New Orleans Hornets fan, and the “losing to lower pace teams more often” result seemed counterintuitive, since the Hornets play among the slowest styles in the league (and one would therefore assume that a slow pace would be a plus). But as you suggest, low pace teams seem to perform far better than their medium and fast paced counterparts, and quite likely there’s a strength of schedule aspect totally ignored by simple pace vs. pace analysis. Even still, do you think there’s an inherent advantage slow teams have over faster ones just because of the way the game is played today? If there were W-L pace profiles for the early ’90s, the ’80s, etc., it would be fascinating to see the way the game has changed.

    Comment by atthehive — April 8, 2008

  19. The Hornets beat low paced teams 52% of the time vs. 43% for all low pace teams, so maybe in that light you could feel better about the near-.500 performance, because it is above average for the best performing group against this type of opponent. Only 6 teams in the entire league are better on this, actually: the Spurs, Pistons, Celtics, Rockets, Magic and Jazz. But maybe a tie for 7th isn’t enough given current performance and ambitions?

    A schedule heavy with western conference opponents is tougher on offensive and defensive efficiency than the east (more so on offense), but I haven’t compared it for just low paced squads.

    I agree long term study would be interesting.

    Comment by Mountain — April 8, 2008

  20. OK, I checked: the 5 low pace teams in the west are indeed about 2.5 pts tougher on net efficiency than those in the east.

    Good luck in the playoffs.

    Comment by Mountain — April 8, 2008

  21. With the low correlation of the efficiencies and the 4 factors with pace (and of winning and pace too), there may not be much “inherent” that aids low pace teams. Historical study is needed before making any firm conclusions.

    One season could be influenced by the league’s particular mix of philosophies at the time and the quality of the coaching by philosophy. Pace and all the correlations could be further worked with the added variables of coaching experience and win %. I wonder how well GMs follow the relative success of high- and low-pace teams through the decades, and how quickly and far they go to copy the more successful pace.

    Rules like handcheck enforcement might be playing a role in low pace edge. I could see how that might help patient teams maintain possession and eventually find the shot they like or get fouled but that is just a guess as causation.

    Comment by Mountain — April 8, 2008

  22. guess “on” causation

    Whether or not coaches adapt to pace success trends (if there are any) could also be tracked and the results charted. Does failure to adapt lead to more frequent exits, or do those with a firm stance beat the trend chasers? I imagine it is a mixed bag, but the details might be interesting. How commonly do coaches successfully change their spots? Riley is one.

    Comment by Mountain — April 8, 2008
