Padawan Dojo: Weighted Averages and Medians in Tableau

Introduction

This is the first of two posts on weighted averages and medians, this one introduces a problem we've seen multiple times where reference lines aren't properly weighted. We need to use a different set of options in Tableau to get the desired results and are helped by an understanding of the different levels of detail that Tableau uses to aggregate measures.

When Reference Lines Go Bad...

Here's a sample of data set of 2077 student test scores:

Screen Shot 2017-06-09 at 1.36.08 PM.png

Here's the average and median of those scores:

Yet when I create a viz based on these scores for each school and add average & median reference lines the numbers don't line up. The average reference line is close, it's got a value of 79.9% vs. original average of 79.8%. However the median reference line is way off, it's got a value of 79.81% vs. 82.7%.

Now if you're like me the first time I ran into this issue you might scratch your head and go back to the old standby of MS Excel. Here I copied the data out of Tableau into Excel and wrote a couple of formulas and got the accurate 79.8% average and 82.7% median:

So the reference lines are broken...or are they? Read on for an explanation of why this is happening and how to fix it.

Introducing Levels of Detail

When we're working with Tableau we are able to do all sorts of aggregations such as medians and averages via point & click and that's fantastic. What isn't always obvious though is exactly what *data* those computations are using to get their results. In Tableau we call that set of data the "context" or "level of detail".

In the case of the first screenshot where we're using AVG(Score) and MEDIAN(Score) for the measures the level of detail is the entire data set. In other words we're using every record of data to calculate the average and median.

In the second screenshot with the median and average reference lines the level of detail is very different...in fact there are two levels of detail that we have told Tableau to use in this view:

1) The viz level of detail or vizLOD is based on the dimensions in the view and that is the School dimension. Therefore the AVG(Score) pill is computing the average score of all the records in each of the 10 schools in this data set. 

2) The level of detail for the reference lines is what gets called the "query results" or "result set" that is the 10 schools. In this case the default reference lines are computing the average and median *of the average score for each school*, so instead of an average or median across the whole data set we're getting the average and median of 10 values. 

When we're working with a tool like MS Excel or many reporting tools this change in level of detail is generally more obvious and we don't run into this kind of problem, it's a case where Tableau has made the tradeoff of hiding some complexity in order to get more speed in interactivity, and that works great right up until the point that it doesn't. Therefore a key to mastering Tableau is to keep in mind the hidden logic and in this case that means always being clear on the level of detail that we want for each of our computations.

Using Plain Language

This is where I always try to articulate what I'm trying to do in plain language, in particular the aggregation, the field(s) involved, and the level(s) of detail. In this case we want to:

Show the average scores for each school as circles and the overall (all population or weighted) average and median scores as reference lines.

Generating the Weighted Average and Weighted Median with a Total Reference Line

With this understanding we can get the desired results in Tableau in a few clicks. The average score per school is already accurate so no worries there. To get the average score weighted for every student we need to change how the reference line is aggregated, and to do that we can change the aggregation from Average to Total:

When using Tableau and seeing Total as a keyword–as in Grand Totals, the TOTAL() table calculation function, or the Total reference line aggregation here–instead of thinking of total as "a kind of sum or addition", think of total as "all". In other words by using the Total aggregation for the average reference line we're telling Tableau to calculate the AVG(Score) across all of the data and ignore the School dimension and that gets us the desired population average.

The Median reference line takes a smidgen more work. With the understanding that we want to get the weighted median across all the data when we try to add a reference line that is a total median there's no median aggregation there in the first place in the Value dropdown. So we can add MEDIAN(Score) as a pill to Detail in a couple of clicks and then add the Total reference line on the Median score and get the accurate result:


A Note on Weighting Aggregates and Table Calculations

In this case the weighted average and median values are being computed on the record-level values. In situations where the measure that you want to get the weighted average or median on is already an aggregation or a table calculation then a different method would be required, likely involving Level of Detail expressions or table calculations. Covering those is beyond the scope of this post, if you have questions on that fplease ask in the comments below.


The Other Weighted Average and Weighted Median

The example we've been using here is one where we have a column of record-level values–the Score measure–and we want to make sure that the average and median are properly weighted based on the underlying data and not at a coarser level. There's another kind of weighted average and weighted median that is common in survey and polling analyses where the survey is sampling part of the population and each response gets assigned a weight to adjust the results to reflect the entire population. In that case case there's a column of values (such as survey responses) and a second column for the weights and the calculations to generate the weighted average and weighted median are different because they need to combine the two measures of values and weights. Steve Wexler has done a great post on computing weighted averages at  Working with Weighted Survey Data. Weighted medians require a separate approach and we'll have a blog on that next week.

Conclusion

When we are working with Tableau and maintain awareness of the level of detail that we are telling Tableau to use as we move pills in and out of the view, add reference lines, write calculated fields, etc. we will increase our speed and accuracy while reducing frustration, and who doesn't want that?

You can view and download the weighted average and median reference lines workbook from Tableau Public.