I am delighted to host a guest blog post by Chris DeMartini on Network Graphing and Hive Plots on the fly in Tableau! From Chris:
First off, I want to send a big THANK YOU to Anya for the generosity and help she has provided, really for no reason at all other than the fact that she is awesome and we are both San Francisco Giants fans.
I was researching network graphing, came across the hive plot and wondered: can we do this in Tableau? I figured if Noah could build an Enigma machine, we should be able to do this. One challenge I gave myself was to create as much of the data as possible within Tableau; e.g., I wanted Tableau to define the node coordinates rather than bringing in coordinates generated by other tools like NodeXL.
Before going further, my disclaimer: this doesn't cover all aspects of creating a hive plot. My goal was to create a hive plot that could be reused, with some tweaks, for various purposes. (If you want a foolproof hive plot, you may want to look further at using D3 here or HiveR here.)
Ok, disclaimer over; back to the task at hand. A quick note on the data used for the first two vizzes: here is an excerpt of what it looks like. It can be replaced with any standard relationship data in which Node A is related to Node B, Node B is related to Node C, and so on. You can also augment it with additional descriptive fields to drive color and size in the network graph. Notice that there are no (x,y) coordinates in this file.
For the hive plots I mimicked the above, but ended up using data from the Database of Interacting Proteins (DIP) here.
Trying to plot nodes dynamically led me to start with a circular network graph, as I had come across a few good examples of dynamically plotting points in a circle in Tableau. I also thought it would be good to compare the two types of network graphs in the finished product. So I started with Joe’s handoff-map; from there I mostly re-purposed the x and y calculations and added functionality that lets you plot points in the center of the graph and toggle that functionality on or off.
To create dynamic circular coordinates for nodes, you can follow these steps. I built on them to morph the handoff-map into a larger circular network graph, ordered by degree.
Create a PI field. NOTE: if you are using domain completion, as I do below, you should use WINDOW_MAX(PI()) to avoid null values.
Create an index field based on the points in the window (a good trick to keep handy). My data was sorted by degree, so by default the index is sorted by degree as well.
In Joe’s equation for plotting points in a circle, he added a “Spread Rotation” parameter, which is a really nice touch that allows you to rotate the nodes. Building on his equation, I added a few parameter-based functions:
An on/off toggle to plot points in the center of the circle and
A top N slider to increment/decrement the number of points shown in the center of the circle
Lastly, I made the graph more of an oval by adding a multiplier to the nodes' x coordinate.
The X and Y coordinates are the COS() and SIN() of the above calculated field, respectively, which plots the points in the circle shown below (obviously I have added some formatting to the view we see here).
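To make the circle math concrete, here is a minimal Python sketch of the same idea outside Tableau. It is not the workbook's calculation: the parameter names (spread_rotation, center_on, top_n, x_multiplier) are hypothetical stand-ins for the Tableau parameters, inner_radius is a hypothetical addition, the rotation is assumed to be in radians, and placing the top-N "center" nodes on a smaller inner ring is an assumption about how those points might be laid out. Nodes are assumed pre-sorted so that the first rows are the ones pulled into the center.

```python
import math


def circular_layout(n_nodes, spread_rotation=0.0, center_on=False, top_n=0,
                    inner_radius=0.3, x_multiplier=1.0):
    """Return an (x, y) pair for each node, in index order.

    All parameter names are hypothetical stand-ins for Tableau parameters.
    """
    n_center = max(0, min(top_n, n_nodes)) if center_on else 0
    n_outer = n_nodes - n_center
    coords = []
    for index in range(1, n_nodes + 1):  # 1-based, like Tableau's INDEX()
        if index <= n_center:
            # Top-N nodes share a smaller inner ring (an assumed layout).
            position, count, radius = index - 1, n_center, inner_radius
        else:
            # Remaining nodes are spaced evenly around the outer ring.
            position, count, radius = index - n_center - 1, n_outer, 1.0
        # Fraction of a full turn, plus the rotation offset (radians).
        angle = 2 * math.pi * position / count + spread_rotation
        x = radius * math.cos(angle) * x_multiplier  # multiplier > 1 gives an oval
        y = radius * math.sin(angle)
        coords.append((x, y))
    return coords


# Usage: 12 nodes, the top 3 pulled into the center, stretched into an oval.
layout = circular_layout(12, center_on=True, top_n=3, x_multiplier=1.5)
```

Scaling only the x coordinate is what turns the circle into an oval; the rotation parameter simply shifts every angle by the same amount.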
Next I used a dual axis and put the Node on the Path shelf and the relationship ID on the Detail shelf to create the lines between nodes in the circle for each relationship. The result of all that looks a little something like this…
And then you can toggle the parameters to see more or fewer points in the center of the circular network. Twenty-five points in the center…
Lastly, I added highlighting functionality that lets you focus your review on the relationships of a single node of interest…
The next thing I wanted to figure out was curved lines, but to do so I needed start and end points for the curves. In the first attempt, I took the easy way out: I simply used the X and Y axes to plot the nodes along a “line”. For example, in the “Single Axis (X)” configuration I set Y to 0 and X to Node Number, so all points are plotted on the X axis like so…
We can toggle the axis to plot them all on Y, or we can split the nodes up by a dimension, plotting some on the X axis and others on the Y axis. Here is where I landed for my initial test: outward nodes on Y and inward nodes on X. In this version, one node can be represented on both the X and the Y axis.
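For illustration only, the placement rules for these configurations can be sketched as a small Python function. The mode names and the has_out/has_in flags are hypothetical; in the workbook this is driven by a parameter and calculated fields rather than a function.

```python
def axis_positions(node_number, has_out, has_in, mode="split"):
    """Return the point(s) at which one node is drawn.

    Hypothetical mode names for the configurations described above:
      "single_x": every node on the X axis at (node_number, 0)
      "single_y": every node on the Y axis at (0, node_number)
      "split":    out-degree nodes on Y, in-degree nodes on X; a node with
                  both kinds of edges is drawn on both axes
    """
    if mode == "single_x":
        return [(node_number, 0)]
    if mode == "single_y":
        return [(0, node_number)]
    if mode == "split":
        points = []
        if has_out:
            points.append((0, node_number))  # outward node on the Y axis
        if has_in:
            points.append((node_number, 0))  # inward node on the X axis
        return points
    raise ValueError(f"unknown mode: {mode}")
```

A node with both outward and inward relationships returns two points in the split mode, which is why it appears on both axes.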
Finally, I can get to my curved lines. I was able to understand (aka copy) the math for the quadratic Bézier curve, so I went in that direction. I am sure there are other curve equations that would work; feel free to substitute whichever you see fit. Here is the curve equation I used, from Wikipedia…
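For reference, the standard quadratic Bézier curve, with start point P_0, control point P_1 and end point P_2, is shown below, followed by its x and y components using the (x1,y1), (x2,y2), (x3,y3) naming used for the control-point steps later in this post:

$$
B(t) = (1-t)^2 P_0 + 2(1-t)t\,P_1 + t^2 P_2, \qquad 0 \le t \le 1
$$

$$
x(t) = (1-t)^2 x_1 + 2(1-t)t\,x_2 + t^2 x_3, \qquad
y(t) = (1-t)^2 y_1 + 2(1-t)t\,y_2 + t^2 y_3
$$

At t = 0 the curve sits on the start point and at t = 1 on the end point; the control point only pulls the curve toward itself in between.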
To use the above, I had to figure out how to get “t” to go from 0 to 1 in steps of .01. I really didn’t want to create 101 points per relationship outside of Tableau, so here I leaned heavily on Noah’s domain completion trick. I already had the data in two rows per relationship, as shown in the data excerpt above; this is what let me use the Path shelf for the circular network. Now I just needed a value that would let me start at 0 and go to 100, which I created with a calculated field. Next I needed to use this field to create 101 points from 0 to 100. I did this by creating a bin with 1 as the “size of bin” and then making sure “show missing values” was selected, as described by Noah and shown below.
Now I have a way to generate 101 points from two values (start and end), all within Tableau. I then created an index table calculation, computed along the bin field and restarting on every relationship ID. Dividing this index field by 100 gives us the value of “t” from 0 to 1 in steps of .01. As I have provided the workbook on Tableau Public, I won't go deeper into the curve calculations; you can reference the published workbook.
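Outside Tableau, the same densify-then-evaluate idea can be sketched in a few lines of Python. This is an illustration, not the workbook's calculated fields: the function names are hypothetical, and because Tableau's INDEX() starts at 1, the sketch uses a zero-based index (the equivalent of INDEX()-1) so that t runs exactly from 0 to 1. The control point is passed in directly here; finding it is covered in the control-point steps later in the post.

```python
def bezier_path(start, control, end, steps=100):
    """Densify one relationship into steps + 1 points on a quadratic Bézier.

    Mirrors the idea above: bins 0..steps (101 rows for steps=100),
    a zero-based index along the bins, t = index / steps, then the formula.
    """
    (x1, y1), (x2, y2), (x3, y3) = start, control, end
    points = []
    for index in range(steps + 1):  # zero-based, i.e. INDEX() - 1 in Tableau terms
        t = index / steps
        u = 1 - t
        x = u * u * x1 + 2 * u * t * x2 + t * t * x3
        y = u * u * y1 + 2 * u * t * y2 + t * t * y3
        points.append((x, y))
    return points


def densify(relationships, steps=100):
    """Map {relationship_id: (start, control, end)} to {relationship_id: points}.

    Each relationship gets its own t sequence, like an index that restarts
    on every relationship ID.
    """
    return {rid: bezier_path(s, c, e, steps)
            for rid, (s, c, e) in relationships.items()}


# Usage: one relationship from (0, 0) to (2, 0), pulled upward by (1, 2).
paths = densify({"r1": ((0, 0), (1, 2), (2, 0))})
```

Each relationship yields 101 points, the first equal to its start and the last equal to its end, which is exactly what the Path shelf needs to draw a smooth line.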
So I am all set, I am excited, I go to create my curved lines and I see this…
Not quite what I was hoping for. What happened to the points I created? A quick test (moving the bin to the Rows shelf) shows that I am only seeing points 0 and 100 (I think this is because I used a new tab). I toggle the “show missing values” property for the bin field again (still on the Rows shelf), drag it back to the Path shelf, and now we are off and running!
From this point I went ahead and created a few variations to allow the user to move the points to different axes and plot a subset on one axis versus the other. I added some formatting to match the circular network graph above, and my “Single Axis (X)” setting now looks like this…
Lastly, the initial two-axis view that I started with (out-degree nodes on the Y axis and in-degree nodes on the X axis)…
So we have examples of node plotting, and we have curved lines calculated in Tableau. Can we make the hive plot happen? I wanted to test this out using the data from the DIP. The first thing I did was place the nodes on three lines instead of using the two-axis method described above:
- Out-degree only at (0, Node Number)
- In- and out-degree at (Node Number, -Node Number)
- In-degree only at (-Node Number, -Node Number)
One other note on this step: I did go back into Excel to count the nodes in total, but also to count them within their specific degree category (e.g., when counting relationships, a node may have been 10th overall but the 1st out-degree node). This second count was used to group the nodes without any gaps on each line. You could probably also do this in Tableau with window functions, but Excel was easier for me here.
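The within-category count and the three-line placement can be sketched together in Python. This is a hypothetical helper, not the Excel steps used for the workbook: it walks the nodes in their overall order and keeps a running count per category, which is the gap-free numbering a per-category window calculation in Tableau would also need to produce.

```python
def hive_positions(nodes):
    """Place nodes on the three hive-plot lines described above.

    nodes: list of (name, has_out, has_in) in overall order (e.g. by degree).
    Returns {name: (x, y)}, where n is the node's 1-based rank *within* its
    category, so each line has no gaps.
    """
    counters = {"out_only": 0, "in_and_out": 0, "in_only": 0}
    positions = {}
    for name, has_out, has_in in nodes:
        if has_out and has_in:
            category = "in_and_out"
        elif has_out:
            category = "out_only"
        elif has_in:
            category = "in_only"
        else:
            continue  # isolated node: no relationships, nothing to draw
        counters[category] += 1
        n = counters[category]  # running count within the category
        if category == "out_only":
            positions[name] = (0, n)
        elif category == "in_and_out":
            positions[name] = (n, -n)
        else:
            positions[name] = (-n, -n)
    return positions


# Usage: "d" is 4th overall but only the 2nd out-degree-only node.
pos = hive_positions([("a", True, False), ("b", True, True),
                      ("c", False, True), ("d", True, False)])
```

Using the per-category count instead of the overall rank is what keeps each line of the hive plot free of gaps.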
Now we just put curved lines on this and we are good to go, right? This is where I ran into the most trouble. I was able to get curved lines, but they were not curving correctly. Here is an example… at least it looks cool?!?!
The main issue for me was dynamically identifying the point P1 in the curve equation (known as the control point). I struggled with this for a while, and my search history proves it! Then Anya suggested that we ask Noah for some help. Noah found a bug in my formula and provided some additional guidance that got me a more meaningful result by playing around with cosine and sine a bit more. This control point is really the key: we already have the start and end point coordinates, so it is really just a question of how you want the curve to look, and this point determines exactly that. You can make the curve more skewed, more uniform, whatever you desire; you just need to know where (and how) to place this point to make that happen. I ended up using the following method to plot my control points. There may be better ways to do this, and if you know of any, please let me know! Assuming start point (x1,y1) and end point (x3,y3), I find control point (x2,y2) as follows:
Create two calculated fields for DeltaX (x3-x1) and DeltaY (y3-y1)
Calculate the length of the line from (x1,y1) to (x3,y3). I then treated this as the diameter of my circle and divided it by 2 to get the radius as well.
Find the midpoint of the same line; this will be treated as the center of the circle. I used (1/2*x1)+(1/2*x3) for x, and likewise for y.
Calculate the angle of the same line using the deltas created above and the arc tangent.
Then find the point P1 on the circumference of the circle using midpointx+radius*cos(a) for x and midpointy+radius*sin(a) for y.
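A Python sketch of these five steps follows. One caveat, stated as an observation about the geometry rather than about the workbook: if a is exactly the line's own angle, then the midpoint plus radius·(cos a, sin a) lands on one of the end points, so P1 would sit on the line and the curve would collapse to a straight segment. The post mentions playing with sine and cosine and flipping between them with IF statements; as an assumption, this sketch instead adds a hypothetical offset parameter (90 degrees by default) to the angle, which places P1 on the perpendicular bisector of the line. It also uses atan2 rather than a plain arc tangent so the quadrant is preserved (Tableau also offers an ATAN2 function).

```python
import math


def control_point(start, end, offset=math.pi / 2):
    """Return (x2, y2) for a quadratic Bézier from start (x1, y1) to end (x3, y3).

    offset is a hypothetical parameter: 0 reproduces the literal steps and
    returns the end point; pi/2 (the default) rotates onto the perpendicular
    bisector; -pi/2 bends the curve to the other side.
    """
    (x1, y1), (x3, y3) = start, end
    dx, dy = x3 - x1, y3 - y1                            # step 1: DeltaX, DeltaY
    radius = math.hypot(dx, dy) / 2                      # step 2: diameter / 2
    mx, my = 0.5 * x1 + 0.5 * x3, 0.5 * y1 + 0.5 * y3    # step 3: midpoint = center
    a = math.atan2(dy, dx) + offset                      # step 4: line angle + offset
    return mx + radius * math.cos(a), my + radius * math.sin(a)  # step 5: on circle


# Usage: for a horizontal line from (0, 0) to (2, 0) this gives roughly (1.0, 1.0).
x2, y2 = control_point((0, 0), (2, 0))
```

Because the control point sits one radius away from the midpoint, longer relationships get proportionally deeper curves, which keeps the overall shape consistent across the plot.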
Once I had the calculation for the control point working, the curves that were misbehaving before were now looking much better. From here I added some parameter capabilities which complicated my control point formula further, but I took the easy way out on this and just used an IF statement to flip between sine, cosine, etc. as needed. The result looks like this…
If you click through the various species, be patient with the top few species, as they have a large number of relationships and will take some time to load. Towards the bottom of the list (e.g., rat, mouse, mealworm) there are fewer relationships, and the hive plot will return more quickly.
Hopefully you find this helpful as one way to create network graphs in Tableau. The curved lines used in this example can be repurposed for anything that involves coordinates, for example a map. I have published the workbook in my Tableau Public space here.
Other good resources for network graphing:
Clearly and Simply is a really good start on the concept of network graphing in Tableau
Martin Krzywinski’s Hive plots
Anya (@datablick) – Blog space and awesome advice on the viz
Allan (@AllanWalkerIT) – Pointing me to Joe Mako’s handoff-map viz in the Tableau Forum
Wiki – Bézier curve equation
Database of Interacting Proteins (DIP)