Pretty Pictures from Google Spreadsheets
Alex from the blog Mistake by the Lake recently had a post that utilized some of the tips I posted for scraping HotSpots/HotZones data in combination with some neat Google Spreadsheets features to display some charts about Larry Hughes that automatically update throughout the season. I thought this was a great idea, and I didn’t know it was even possible. In this post I’m going to create some charts of my own and explain how anyone can use these very cool features from Google Spreadsheets.
Google Spreadsheets is basically a spreadsheet program like Excel that’s web-based. It’s slower than Excel and lacks many of its advanced features and functions, but one of its advantages is that it allows you to post spreadsheets for anyone to view free of charge (you don’t need your own website). It doesn’t work well with Opera, my browser of choice, but it seems fine in Internet Explorer and Firefox.
One great new feature that it has is the ability to import external data from elsewhere on the web into a spreadsheet (the functions for doing this are detailed here). Once you set up your spreadsheet using these functions to pull data, the spreadsheet will automatically update when the data source changes (I believe it checks for updates once every hour). This feature is great for collecting and presenting data in new ways that aren’t available elsewhere on the web.
One of the first things I posted on this blog were the details for how to access NBA.com’s HotZones data (now called HotSpots). Alex’s idea was to take this data and have it automatically imported in a Google Spreadsheet. Once it’s in a spreadsheet, there’s a lot you can do with it, such as combining the different zones by distance or side. Google Spreadsheets also has the ability to make charts from data; these images can be put directly into a blog post, and they will automatically update throughout the season.
I decided to look at the 07-08 HotSpots data by distance for each team. First I used the importData function to import the HotSpots .JSP text files for each team into the spreadsheet (I’ve split the function into multiple lines so that it doesn’t get cut-off):
=importData( "http://www.nba.com/hotzones/shotChart.jsp? teamcode=hawks&playercode=all&gameType=22007&splitType=&league=0")
Then I used a variety of text functions to split up the strings from the .JSP files into separate columns for each zone. Once this was done I could group the different zones together and calculate things like 16-24 ft FGM and FGA.
In addition to HotSpots data, I used another external data function (ImportHtml) to grab each team’s shooting fouls drawn from 82games (the ImportHTML function is great for grabbing tables from webpages). And I again used the importData function coupled with some text functions to get each team’s FTM and FTA from Doug’s Stats (this was actually kind of tricky given the formatting, but the advantage of getting the data from Doug’s site was that just one import function was required, which is important because Google limits the number allowed in each spreadsheet).
=index(ImportHtml("http://www.82games.com/0708/0708ATL3.HTM","table",7),4,7)
=importData("http://www.dougstats.com/07-08RD.Team.txt")
Once I had all that automatically updating 07-08 data in the spreadsheet, I decided to make some charts. First, I looked at where teams shoot from - what percentage of their shots have been taken from each distance. I used the 0-8 ft, 8-16 ft, 16-24 ft, and 24+ ft designations from HotSpots, and I added in the Shooting Fouls Drawn data from 82games to get a read on how many of each team’s shots resulted in trips to the foul line (this split isn’t perfect because and-one’s are double-counted as both HotSpots field-goal attempts and 82games shooting fouls drawn). Google Spreadsheet’s charting capabilities are pretty rudimentary, so ignore the fact that the chart seems to go up to 120%.
Next, I looked at how well teams are shooting from various distances (I had to make this one pretty tall for all the bars to be visible).
Finally, I combined where teams took their shots from with how well they shot to look at points scored by distance. I used total free-throws made because I didn’t know how many free-throws teams got or made from each shooting foul they drew, so free-throws from personal fouls and technical fouls are included. And obviously, made shots from 24+ ft were credited for three points while closer shots were credited for two.
If you’d prefer to see all the data in table form, you can view the spreadsheet here (the teams are listed in reverse alphabetical order because there’s a bug in Google Spreadsheets whereby that’s the only way to get the charts to present them in alphabetical order). Here you can quickly see how teams compare without having to go through each team’s HotSpots page separately, total up various zones, and keep track of all those totals (and of course start from scratch to repeat the process later in the season when more games have been played). And once again, all these chart images and the tables I linked to will automatically change throughout the season to reflect the latest data.
This is really just a start - there are lots of other interesting things that can be done using the external data functions of Google Spreadsheets. There are many more places one could compile data from, and there are a variety of ways to present the data (Google Spreadsheets has more charting options than just the bar charts that I used). One of the next things I might try to do is make the same HotSpots charts but for team defenses (though the limit on imports in a single spreadsheet might be an issue there considering the steps involved in compliling HotSpots defensive data). If anyone else comes up with a neat use of these features I’d love to hear about it either in the comments or by email. And thanks again to Alex for the inspiration.
Thanks for the spreadsheet. I was interested in seeing who uses the mid-range game the most / least so I combined 8-16 ft with 16-24. Golden St. uses it the least / Detroit the most. Boston below average, Dallas above. Spurs and Suns below but only by a little.
Comment by M — January 19, 2008
I did some quick correlations of FGA by distance to FG% by distance to see whether teams are “optimally” distributing their shots (i.e. the best three-point shooting teams take the most threes, etc.). Obviously it’s a lot more complicated than that, since if you believe in a usage/efficiency trade-off then you might think that a team that rarely shoots threes might shoot a higher percentage because they only are taking the most wide-open attempts. And of course it would be better to look at more than just half a season of data to draw any conclusions. That said, here are the correlations at this point in the season:
So it does look like to some extent, better three-point shooting teams have taken more threes, and worse three-point shooting teams have taken less. But the correlations drop as you go closer to the basket, to the point where there’s a negative correlation between how often a team draws shooting fouls and how well they shoot free throws. One possible explanation for that could be that big men tend to be worse foul shooters than guards, and the teams that get to the foul line more do so mainly because their big men draw more fouls than other teams’ big men. And one explanation for why in general none of these correlations are that high is that as much as every offense is trying to score efficiently, the defense they’re facing is trying to prevent them from doing so, often by forcing the offense to go away from its strengths and take low percentage shots.
Comment by Eli — January 19, 2008
Good stuff.
Comment by M — January 23, 2008