There have been a number of small multiple designs in the past year or so (see the list at the bottom of this post), and since I am always a big fan of them, I figured why not take a chance at building one myself. If you have read any of my previous blogs, you may have noticed that I like the word “dynamic”, so I tried to figure out a way to incorporate a dynamic aspect into this visualization based on player data from the last six NBA seasons.
Dynamic?
One of the tricks with small multiple design is that you have to lay out the graphs in a trellis panel (aka a grid). This can be accomplished by hard coding and sorting the partitioning dimension of your analysis; however, I challenged myself to calculate the location for each graph within Tableau on the fly. I wanted to be able to update the data, change between seasons, etc. without having to do any additional work on the trellis layout of the small multiple viz.
The Concept
With a number of small multiples already out there, I wanted to take a look at something slightly different. My thought was to look at the plus/minus statistic and visualize it against team win/loss. You would expect that teams with more wins will have players with higher (positive) plus/minus stats (like Stephen Curry) and those with more losses will have players with low (negative) plus/minus (like Nik Stauskas). I figured this would be the case, but also thought it would be interesting if we found someone (like Austin Rivers) on a good team with a negative plus/minus (or vice versa).
The Data
I obtained the data from stats.nba.com. This site is an amazing resource for NBA data. Here I am only using the player logs, but there is so much more available for us to visualize. To find the player logs I am using, navigate from the page to “player stats” and then “player game logs”. The data (screen shot from stats.nba.com below) contains every player’s stats from every game.
I did a little Chrome browser trick to grab the JSON file behind this page, then manually copied and pasted it into a text file and manipulated it a bit (cleaned the header, did a find/replace on some JSON formatting, etc.). I did this for each of the last six seasons and then used SQL to combine the files and also normalize the data, in order to account for games where a specific player may have sat out due to injury/illness, suspension or just rest. My result set actually has a record for every player and every game regardless of whether they actually played in the game (feedback/advice I got from Shine Pulikathara). This additional work allowed me to keep players and cumulative games in line with one another (even if a player missed 20 games due to injury). Using SQL to combine and normalize the data allows me to update the workbook in about 15 minutes when I have some extra time to do so throughout the season.
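To make the "a record for every player and every game" idea concrete, here is a minimal sketch in Python rather than the SQL I actually used. The field names (`player`, `game`, `plus_minus`) and the choice to fill a missed game with 0 are assumptions for illustration only, not the real column names from stats.nba.com.

```python
from itertools import product


def densify(game_logs, players, games):
    """Return one row per (player, game), even for games the player missed.

    Assumption: a missed game contributes 0 to the stat so that
    cumulative totals stay aligned with the team's game count.
    """
    played = {(row["player"], row["game"]): row["plus_minus"] for row in game_logs}
    rows = []
    for player, game in product(players, games):
        key = (player, game)
        rows.append({
            "player": player,
            "game": game,
            "played": key in played,
            "plus_minus": played.get(key, 0),
        })
    return rows


def running_total(rows, player):
    """Cumulative plus/minus for one player, in the order the rows appear."""
    total = 0
    out = []
    for row in rows:
        if row["player"] == player:
            total += row["plus_minus"]
            out.append((row["game"], total))
    return out


# Hypothetical logs: player A sat out game 2, player B only played game 2.
logs = [
    {"player": "A", "game": 1, "plus_minus": 5},
    {"player": "A", "game": 3, "plus_minus": -2},
    {"player": "B", "game": 2, "plus_minus": 4},
]
dense = densify(logs, ["A", "B"], [1, 2, 3])
```

With the gaps filled in, player A's running total carries straight through the missed game instead of skipping a point on the x-axis.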
The Viz
I wanted to build a cumulative total of plus/minus over the season on top of a whisker plot (which we have seen in a number of places; a couple are referenced below). This would allow the reader to roughly compare the team’s win/loss to its players’ plus/minus. The dashboard view I ended up publishing is a single sheet, which is fed the data shown above after the aforementioned SQL manipulations. The team’s placement and order in the trellis panel is dynamically defined in Tableau via the following calculations.
Teams are ordered based on their win/loss record from top left to bottom right of the panel. This is done by sorting the team field on the detail shelf by descending win/loss record.
Since we know that teams are in order of their records, the next thing I needed was an index (or rank really) value for each team. Best team (i.e., Warriors) has the value 1; worst team (i.e., 76ers) has the value 30. As much as I would like to tell you I came up with this equation on my own, I leveraged a trick I saw in one of Joe Mako’s workbooks online (no surprise there). Since my data is ordered, I can start at the top of my table with the value 1 and then add to it each time I hit the next team.
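The running-count idea can be sketched outside of Tableau as follows. This is a Python illustration of the logic only, not the table calculation from the workbook, and it assumes the rows are already sorted by win/loss record with each team's rows grouped together. The team abbreviations are just sample values.

```python
def team_numbers(sorted_teams):
    """Number teams 1..N in the order they first appear.

    Assumption: the input is already sorted so that all rows for a team
    are contiguous; the counter only steps up when the team changes.
    """
    numbers = {}
    current = 0
    previous = None
    for team in sorted_teams:
        if team != previous:
            current += 1
            numbers[team] = current
            previous = team
    return numbers


# One entry per data row, best record first.
ranks = team_numbers(["GSW", "GSW", "SAS", "PHI", "PHI"])
```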
Now that I have my team number (from 1 – 30 based on win/loss record) I just need to place the teams on rows and columns based on this value. One caveat to my “dynamic” word in this post: I hardcoded the fact that I wanted 5 teams on each row. This could ultimately be calculated or parameter driven as well, but I took the easy way out here.
For columns, I did modulo 5 (since I wanted 5 teams per row) and then set the result to 5 when it returned 0. Modulo returns results as shown here…
- 1%5 = 1
- 2%5 = 2
- 3%5 = 3
- 4%5 = 4
- 5%5 = 0 (I reset this to 5)
- 6%5 = 1
- Rinse and repeat
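The column rule above is small enough to write out as a sketch. This is Python rather than a Tableau calculated field, and the function and parameter names are made up for illustration.

```python
def trellis_column(team_number, per_row=5):
    """Column (1..per_row) for a 1-based team number."""
    remainder = team_number % per_row
    # A remainder of 0 means the team sits in the last column of its row.
    return per_row if remainder == 0 else remainder


columns = [trellis_column(n) for n in range(1, 11)]
```

The same result can be had without the reset by shifting before and after the modulo, `(team_number - 1) % per_row + 1`, but the reset version mirrors the steps listed above.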
For rows, I needed to mimic a round down function. I did this by comparing the team number divided by 5 (since I wanted 5 per row) against the same value rounded, and stepping the rounded value down by 1 whenever it was not less than the fraction (that is, whenever rounding went up or the division was exact).
- 6/5 = 1.2 (1 rounded)
- 7/5 = 1.4 (1 rounded)
- 8/5 = 1.6 (2 rounded, reset to 1)
- 9/5 = 1.8 (2 rounded, reset to 1)
- 10/5 = 2.0 (2 rounded, reset to 1)
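Here is the row rule as a Python sketch of the arithmetic (again, not the Tableau calculated field itself; the names are made up for illustration). Applying the same rule to teams 1 through 5 gives 0, so the row values run from 0 to 5 for 30 teams.

```python
def trellis_row(team_number, per_row=5):
    """Row index for a 1-based team number, using the round-and-adjust rule."""
    fraction = team_number / per_row
    rounded = round(fraction)
    # If rounding went up, or the division was exact, step down one row.
    return rounded - 1 if rounded >= fraction else rounded


rows = [trellis_row(n) for n in (6, 7, 8, 9, 10)]
```

In a language with integer division, the same values fall out of `(team_number - 1) // per_row`, which may be easier to read if a floor-style operation is available.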
The last piece is to adjust the table calculations for team number and trellis row/column so they compute off of the right combination of fields. When looking only at teams, they can just work off of the team field. However, when adding players and games into the sheet, we have to augment the table calculations accordingly. Here is how the dynamic trellis looks in raw form before we add the line and whisker visualizations to the sheet.
From here, I built the rest of the visualization, which you can dissect from the workbook embedded below. It was a little tricky to get both the line and whisker on top of one another in a single sheet, but since they are both based on the number of games in the selected season (x-axis), I was ultimately able to build it in. A couple of pieces that I added allow the user to select from a number of the player stats on stats.nba.com (e.g., plus/minus, 3-pointers made, turnovers, etc.), color by different aggregations of the data and change seasons. I used the standard method of a parameter coupled with a calculated field for the dynamic stats and coloring.
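For anyone who has not used the parameter plus calculated field pattern, the idea is simply that the parameter names a stat and the calculated field returns whichever column matches. Here is a rough Python sketch of that idea. The parameter labels and field names are hypothetical and are not taken from the workbook.

```python
def selected_stat(row, stat_parameter):
    """Return the value of whichever stat the parameter currently names."""
    # Hypothetical mapping from parameter label to data field.
    stat_fields = {
        "Plus/Minus": "plus_minus",
        "3-Pointers Made": "fg3m",
        "Turnovers": "tov",
    }
    return row[stat_fields[stat_parameter]]


game = {"plus_minus": 12, "fg3m": 4, "tov": 3}
value = selected_stat(game, "Turnovers")
```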
Lastly I added a second screen, which removes the small multiple and visualizes all players for the year on the same graph. I thought it would be interesting to look by team as well as all players at once. Here is an example of what that view looks like for 3-pointers made this season (through 2/28/2016).
As I mentioned before, here is a list of small multiples that I referenced and/or was inspired by when making this viz. This is by no means an exhaustive list, just those that I looked at.
- NFL 2015 Regular Season by Shine Pulikathara
- The History of the NFL by Matt Chambers
- Ballcode 2.0 by Peter Gilks
- Minute by Minute Point Differentials of 2015 NBA Games by Adam Pearce
- Edward Tufte's Beautiful Evidence p. 50-55 (excerpt here)
And here is the final product. Enjoy, and GO WARRIORS!