November 8, 2007

Hacking HotZones

Posted by Eli in Raw Stats, Site Tips

The folks at the official NBA site added a great feature a few years back called HotZones. It’s not on the level of MLB.com’s fantastic PITCHf/x data, but it’s useful nonetheless. It consists of season-level shot charts for every player and team, broken down into 14 zones. This is data that wasn’t previously available. ESPN has had game-by-game shot charts with its box scores since 02-03, but in a basically unusable form. 82games has had shooting by distance since 02-03, but in addition to lacking side-to-side splits, it groups shots into fairly large distance categories (what it labels “Jump” shots includes some shots closer than 8 feet as well as everything beyond 8 feet).

So NBA.com’s HotZones offer a lot of valuable information, but unfortunately they are also presented in a format that is very difficult to use. They are embedded in a Flash application, which rules out easy linking as well as copying and pasting. Though you can select any team or player and a variety of splits, there’s no way to see the league-average FG% for a specific zone, or who the league leaders are. And because of errors in the Flash menus, you can’t even access much of the available data (as of this posting, data is present on the server but inaccessible through the Flash menus for players from past seasons who are no longer in the league, the 07-08 regular season, the 06-07 playoffs, and the 03-04 regular season). However, with a bit of digging, I was able to find some ways around these problems. What follows are instructions for linking to HotZones pages (including those you can’t reach through the menus) and for downloading HotZones data in a format that is easy to manipulate in a spreadsheet.

Linking to the HotZones for a specific player or team

It is possible to link to specific HotZones pages by using a URL of this form:

The capital letters (“T”, “N”, “Y” and “S”) should be replaced depending on what you want to link to.

T (team nickname): The team’s nickname, all lowercase. Most are self-explanatory; for those that may be ambiguous, use “blazers”, “cavaliers”, “mavericks”, “sixers”, “sonics” and “timberwolves”.

N (player name): The player’s first name, followed by an underscore (“_”), followed by their last name, all lowercase. For the whole team, use “all”.

Y (year code):

22003  = 2003-04 regular season
22004  = 2004-05 regular season
22005  = 2005-06 regular season
22006  = 2006-07 regular season
22007  = 2007-08 regular season
42004  = 2004-05 playoffs all rounds (03-04 playoff data isn't available)
420041 = 2004-05 playoffs round one
420042 = 2004-05 playoffs round two (conference semis)
420043 = 2004-05 playoffs round three (conference finals)
420044 = 2004-05 playoffs round four (NBA finals)
...and so on for other playoff seasons (4, then the season's starting year, then the round number; leave off the round for all rounds combined)

S (split code): For the complete season/playoffs/series, don’t put anything after “split=”.

H = home
A = away
5 = last five games
10 = last ten games
blazers = vs. Blazers
...and so on for other opponents, using the team's nickname

Here are a few examples:

Kobe Bryant in the first round of the 05-06 playoffs:

The 04-05 Spurs in their regular season matchups against the Mavs:
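If you build a lot of these links, the rules for T, N, Y and S can be wrapped in a few small helpers. This is a minimal Python sketch, not anything NBA.com provides: the function names are hypothetical, the URL itself isn't assembled here (plug the returned values into the URL form above), and the handling of multi-word last names is an untested assumption. The usage lines reproduce the variable values for the two examples above.

```python
# Hypothetical helpers that compute the T, N, Y and S values described above.
# The mapping rules come from this post; the function names are made up.

SPLIT_CODES = {
    None: "",        # whole season/playoffs/series: nothing after "split="
    "home": "H",
    "away": "A",
    "last5": "5",
    "last10": "10",
}


def player_slug(full_name=None):
    """'Kobe Bryant' -> 'kobe_bryant'; no name (whole team) -> 'all'."""
    if not full_name:
        return "all"
    # Assumption: the first space separates first and last name. Multi-word
    # surnames and names with punctuation haven't been checked against the site.
    return "_".join(full_name.strip().lower().split(" ", 1))


def year_code(start_year, playoffs=False, playoff_round=None):
    """2003 -> '22003' (03-04 regular season); playoffs -> '4' + year [+ round]."""
    if not playoffs:
        return f"2{start_year}"
    code = f"4{start_year}"
    if playoff_round is not None:
        code += str(playoff_round)   # 1-4; omit for all rounds combined
    return code


def split_code(kind=None, opponent=None):
    """Opponent splits use the team nickname; other splits use SPLIT_CODES."""
    if opponent:
        return opponent.lower()
    return SPLIT_CODES[kind]


def hotzones_vars(team, start_year, player=None, playoffs=False,
                  playoff_round=None, split=None, opponent=None):
    """Return the four substitutions for the HotZones URL form."""
    return {
        "T": team.lower(),
        "N": player_slug(player),
        "Y": year_code(start_year, playoffs, playoff_round),
        "S": split_code(split, opponent),
    }


if __name__ == "__main__":
    # Kobe Bryant, first round of the 05-06 playoffs
    print(hotzones_vars("lakers", 2005, player="Kobe Bryant",
                        playoffs=True, playoff_round=1))
    # 04-05 Spurs, regular season vs. the Mavs
    print(hotzones_vars("spurs", 2004, opponent="mavericks"))
    # -> {'T': 'spurs', 'N': 'all', 'Y': '22004', 'S': 'mavericks'}
```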

Downloading HotZones data in text form

Finding URLs to view hidden HotZones pages is useful, but the real fun comes in finding the text-based data files that lie behind the Flash application. Once you do that you can start downloading the data in bulk, importing it into a spreadsheet, and manipulating it to find averages, leaders, and even team defensive stats.

HotZones data is stored in .jsp files, which return plain text. The URLs to access them follow a pattern similar to the links to the HotZones pages, using the same T, N, Y and S variables. There are three types of files: player listings (players.jsp), shot charts (shotChart.jsp), and zone game-by-game stats (zoneGameStats.jsp).

Player listings: These list the players on a team. As far as I can tell, these are only available for the current season (this isn’t a big problem, however, since a player’s past seasons can be accessed without specifying which team he played for, as detailed below). Note the use of “teamcode” instead of just “team”, and the “&league=0” tacked on at the end.

Shot charts: These files list the field-goal makes and attempts for each zone. The zones are labeled by side (left, left-center, center, right-center, right) and distance (0-8 ft, 8-16 ft, 16-24 ft, 24+ ft), and are coded as C_00_08, L_16_24, RC_24_Plus, etc. (the full listing can be found below). Each zone code is followed by an equals sign, the FGM in that zone, a pipe character (“|”), and the FGA in that zone. The zones are separated by ampersands (“&”). The beginning of the file lists either the team or the player, depending on the type of shot chart. There is no indication within the files of the season or of what split (if any) the data represents.
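The ZONE=FGM|FGA layout described above is simple enough to parse directly. Here is a minimal Python sketch under that description; since the exact format of the leading team/player entry isn't spelled out, the parser simply skips any segment whose key isn't a zone code. The function name and the sample string are made up for illustration, and the sample is not real HotZones data.

```python
import re

# Matches the zone codes described above: a side (L, LC, C, RC, R), then the
# start and end of the distance band, e.g. C_00_08 or RC_24_Plus.
ZONE_CODE = re.compile(r"(?:L|LC|C|RC|R)_\d{2}_(?:\d{2}|Plus)")


def parse_shot_chart(text):
    """Turn a shotChart.jsp body into {zone_code: (fgm, fga)}.

    Segments are split on '&'; each zone segment looks like ZONE=FGM|FGA.
    Anything else (such as the leading team/player entry) is skipped.
    """
    zones = {}
    for segment in text.strip().split("&"):
        key, sep, value = segment.partition("=")
        key = key.strip()
        if not sep or not ZONE_CODE.fullmatch(key):
            continue
        fgm, _, fga = value.partition("|")
        zones[key] = (int(fgm), int(fga))
    return zones


if __name__ == "__main__":
    # Made-up sample; the "player=..." header segment is only a guess.
    sample = "player=Some Player&C_00_08=5|9&L_16_24=2|7&RC_24_Plus=1|4"
    print(parse_shot_chart(sample))
    # -> {'C_00_08': (5, 9), 'L_16_24': (2, 7), 'RC_24_Plus': (1, 4)}
```

Because the file carries no season or split, whatever code calls this parser has to attach that information from the file name or URL it came from.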

Zone game-by-game stats: These hold the data behind the charts that appear when you click on any zone in a HotZones page. Each file breaks down the shooting in that zone by game, listing the date, game location and opponent, FGM, FGA, FG%, and PTS. There is no indication within the files of which team, player, zone, or split the data represents. These files are not available for the 03-04 season.

Note the new variable “Z”, which is the zone code:

 L_08_16   = left, 8-16 feet
 L_16_24   = left, 16-24 feet
 L_24_Plus = left, 24 feet to midcourt
LC_16_24   = left-center, 16-24 feet
LC_24_Plus = left-center, 24 feet to midcourt
 C_00_08   = center, 0-8 feet
 C_08_16   = center, 8-16 feet
 C_16_24   = center, 16-24 feet
 C_24_Plus = center, 24 feet to midcourt
RC_16_24   = right-center, 16-24 feet
RC_24_Plus = right-center, 24 feet to midcourt
 R_08_16   = right, 8-16 feet
 R_16_24   = right, 16-24 feet
 R_24_Plus = right, 24 feet to midcourt
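For scripting, it helps to have the 14 codes in one place along with a way to turn them back into readable labels. A small Python sketch follows; the names are illustrative. Note from the list that only the center has a 0-8 foot zone, and only the left, center and right have 8-16 foot zones.

```python
# The 14 zone codes listed above, plus helpers to label them and compute FG%.
SIDES = {"L": "left", "LC": "left-center", "C": "center",
         "RC": "right-center", "R": "right"}
DISTANCES = {"00_08": "0-8 feet", "08_16": "8-16 feet",
             "16_24": "16-24 feet", "24_Plus": "24 feet to midcourt"}

ZONES = [
    "L_08_16", "L_16_24", "L_24_Plus",
    "LC_16_24", "LC_24_Plus",
    "C_00_08", "C_08_16", "C_16_24", "C_24_Plus",
    "RC_16_24", "RC_24_Plus",
    "R_08_16", "R_16_24", "R_24_Plus",
]


def describe_zone(code):
    """'LC_24_Plus' -> 'left-center, 24 feet to midcourt'."""
    side, _, distance = code.partition("_")
    return f"{SIDES[side]}, {DISTANCES[distance]}"


def fg_pct(fgm, fga):
    """Field-goal percentage as a fraction, or None when there were no attempts."""
    return fgm / fga if fga else None
```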

So what can you do with this data? A whole lot. Here’s a neat trick to get team defensive zone data (which can tell you things like the 03-04 Spurs’ field-goal percentage allowed on shots from 0-8 feet, or how many shot attempts from 0-8 feet the 05-06 Rockets allowed).

There are two ways to get at defensive data. For the first method, pick a season and start with any team (e.g. the Hawks). For that team, download the shot charts for each of the 29 opponent splits (e.g. Hawks vs. Blazers, Hawks vs. Bulls, etc.; there are only 28 for 03-04, when the league had 29 teams). Remember, nothing in the shotChart.jsp files indicates what split or season the data is for, so be sure to label your files well. Repeat this for all the other teams in the league. If you then combine the data by opponent rather than by initial team (e.g. Hawks vs. Blazers, Sixers vs. Blazers, Suns vs. Blazers, etc.), you can find each team’s defensive stats for each zone.
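The regrouping step of the first method can be sketched in a few lines of Python, assuming each downloaded file has already been parsed into a {zone: (FGM, FGA)} dictionary and keyed by the (team, opponent) pair you labeled it with. The function name and the numbers in the usage lines are made up for illustration.

```python
from collections import defaultdict


def defense_by_opponent(charts):
    """Regroup offensive shot charts by the team that was defending.

    charts maps (offense_team, opponent) -> {zone: (fgm, fga)}, i.e. one parsed
    shotChart.jsp file per team-vs-opponent split. Returns
    {defending_team: {zone: (fgm_allowed, fga_allowed)}}.
    """
    totals = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for (_offense, defense), zones in charts.items():
        for zone, (fgm, fga) in zones.items():
            totals[defense][zone][0] += fgm
            totals[defense][zone][1] += fga
    # Convert the nested defaultdicts back to plain dicts of tuples.
    return {team: {zone: tuple(mf) for zone, mf in by_zone.items()}
            for team, by_zone in totals.items()}


if __name__ == "__main__":
    # Made-up numbers, not real HotZones data.
    charts = {
        ("hawks", "blazers"): {"C_00_08": (10, 20)},
        ("sixers", "blazers"): {"C_00_08": (5, 15)},
        ("hawks", "bulls"): {"C_00_08": (3, 4)},
    }
    print(defense_by_opponent(charts))
    # -> {'blazers': {'C_00_08': (15, 35)}, 'bulls': {'C_00_08': (3, 4)}}
```

Dividing each total gives the FG% allowed in that zone; the attempt totals are the shots allowed.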

The second method uses the zone game-by-game stats. First, pick a season and start with any team. For that team, download all 14 zoneGameStats.jsp files (one for each zone; remember, these don’t contain the names of the team and zone selected, so label the files with that info). Repeat for all the other teams in the league. Since each file contains the shooting stats game by game and lists each opponent, you can combine the data by opponent to get team defensive totals for each zone. Because these files aren’t available for 03-04, this method only works for 04-05 and later.
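Once the game rows are parsed, the second method is also a simple regrouping. The Python sketch below assumes you have already pulled the opponent, FGM and FGA out of each row (the exact line format isn't described here) and keyed each team's rows by the (team, zone) pair you labeled the file with. GameRow and the function name are hypothetical, and the sample rows are made up.

```python
from collections import defaultdict
from typing import NamedTuple


class GameRow(NamedTuple):
    """One already-parsed line of a zoneGameStats.jsp file."""
    date: str
    opponent: str   # team nickname, e.g. "mavericks"
    fgm: int
    fga: int


def defense_from_game_logs(logs):
    """logs maps (team, zone) -> list of GameRow for that team's shots in that zone.

    Returns {(opponent, zone): (fgm_allowed, fga_allowed)}, summed over every
    team that played that opponent.
    """
    totals = defaultdict(lambda: [0, 0])
    for (_team, zone), rows in logs.items():
        for row in rows:
            totals[(row.opponent, zone)][0] += row.fgm
            totals[(row.opponent, zone)][1] += row.fga
    return {key: tuple(mf) for key, mf in totals.items()}


if __name__ == "__main__":
    # Made-up rows, not real HotZones data.
    logs = {
        ("spurs", "C_00_08"): [GameRow("game 1", "mavericks", 6, 10),
                               GameRow("game 2", "rockets", 4, 9)],
        ("suns", "C_00_08"): [GameRow("game 3", "mavericks", 7, 12)],
    }
    print(defense_from_game_logs(logs))
```

Keeping the date on each row isn't needed for the totals, but it makes it easy to restrict the sums to a date range later.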

For either method, downloading the files and combining the data by hand takes a lot of work, but you can avoid that by writing a simple Perl script to automate the process.
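As an illustration of that kind of automation (in Python rather than Perl), here is a rough sketch of the download step for the first method. The URL form isn't reproduced here, so url_template is a placeholder you would fill in with the shotChart.jsp URL using {T}, {N}, {Y} and {S} fields; the function names, file-naming scheme and one-second delay are all assumptions. Since the files don't say which team, season or split they hold, that information goes into each file name.

```python
import time
import urllib.request
from pathlib import Path


def plan_shot_chart_downloads(url_template, season_code, teams, out_dir):
    """Yield (url, destination) pairs for every team-vs-opponent shot chart.

    url_template is a placeholder: the shotChart.jsp URL form with {T}, {N},
    {Y} and {S} fields. Team, season and split go into the file name because
    the files themselves don't record them.
    """
    out = Path(out_dir)
    for team in teams:
        for opponent in teams:
            if opponent == team:
                continue
            url = url_template.format(T=team, N="all", Y=season_code, S=opponent)
            yield url, out / f"shotChart_{season_code}_{team}_vs_{opponent}.txt"


def download_all(url_template, season_code, teams, out_dir, delay_seconds=1.0):
    """Fetch each planned file, skipping ones already saved, pausing between requests."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for url, dest in plan_shot_chart_downloads(url_template, season_code,
                                               teams, out_dir):
        if dest.exists():
            continue
        with urllib.request.urlopen(url) as response:
            dest.write_bytes(response.read())
        time.sleep(delay_seconds)
```

Splitting the planning from the fetching makes it easy to check the URLs and file names before hitting the server, and skipping existing files means an interrupted run can simply be restarted.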

Getting data from past seasons for players who have changed teams

This can be a little tricky. The Flash menus list only the current season’s roster, even when you select a past season, and the players.jsp files are also only for the current season. The trick to getting at the data through the Flash menus is to look not by the past team, but by the current team. To find Allen Iverson’s stats from his 05-06 season with the Sixers, go to the Nuggets’ 05-06 season and select Iverson. Though the team menu will say Nuggets, it will display his stats for the Sixers.

A similar trick works to link to HotZones pages or to find and download the correct data files. Here you can completely ignore the team variable “T”. As long as you have the season and player entered, the same page is linked to (or the same shot chart or zone game-by-game stats file will be downloaded) whether the URL says “team=nuggets”, “team=sixers”, “team=bucks”, or simply “team=” (for the file downloads it would be “teamcode=” rather than “team=”).

Other stuff

In future posts I may post some Perl scripts I’ve created to automate the process of downloading and compiling HotZones data, or I may just post some spreadsheets of already collected data (11/13/07 update: I’ve posted some of the data here). And, more interestingly, I will try to analyze some of the data to see what we can learn about different players and teams.

The inspiration for this post came from Joseph Adler’s terrific book, Baseball Hacks. It’s filled with tips and tricks teaching you how to use scripts, spreadsheets, databases, and statistics programs to find, download, compile, and analyze baseball stats from the web. I will probably talk more about it in a future post, but for now I’ll just say that even if you are only interested in basketball stats analysis rather than baseball, I would still highly recommend it, as many of the hacks in the book can be applied to hoops.


