
Eric T. Peterson has been working in web analytics for over ten years and has built up an incredibly rich body of knowledge about the subject, knowledge Mr. Peterson works to share every week here in his Web Analytics Demystified weblog. Whether you're new to the subject or the most experienced practitioner, you should join the thousands of people around the globe already subscribing to Peterson's blog and start reading today.


Section Header for Averages

While averages are conveniently generated for a number of important metrics, it pays to keep the definition of an average in mind when using the following key performance indicators. Wikipedia describes the average, or arithmetic mean, as follows:

The arithmetic mean is the standard “average”, often simply called the “mean”. It is used for many purposes and may be abused by using it to describe skewed distributions, with highly misleading results. A classic example is average income. The arithmetic mean may be used to imply that most people’s incomes are higher than is in fact the case. When presented with an “average” one may be led to believe that most people’s incomes are near this number. This “average” (arithmetic mean) income is higher than most people’s incomes, because high income outliers skew the result higher (in contrast, the median income “resists” such skew). However, this “average” says nothing about the number of people near the median income (nor does it say anything about the modal income that most people are near). Nevertheless, because one might carelessly relate “average” and “most people” one might incorrectly assume that most people’s incomes would be higher (nearer this inflated “average”) than they are. Consider the scores {1, 2, 2, 2, 3, 9}. The arithmetic mean is 3.17, but five out of six scores are below this!

(From http://en.wikipedia.org/wiki/Average.) The important thing to keep in mind when using average-based key performance indicators is that, as Wikipedia says, skewed distributions can lead to misleading results. This problem often arises when looking at average time spent on a page: the average looks ridiculously long or short, but nothing appears to be wrong with the data. When this happens, either try to calculate the median value (50 percent of the values are above, 50 percent are below) or simply do the best you can.
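
To make the difference between mean and median concrete, here is a minimal sketch in Python. The first list is the set of scores from the Wikipedia passage; the time-on-page values are made up for illustration, with one session assumed to have been left open for half an hour. The helper name `summarize` is hypothetical, not part of any analytics package.

```python
from statistics import mean, median

def summarize(values):
    """Return (mean, median) for a list of numbers."""
    return mean(values), median(values)

# Scores quoted in the Wikipedia passage.
scores = [1, 2, 2, 2, 3, 9]
score_mean, score_median = summarize(scores)

# Count how many scores fall below the arithmetic mean.
below_mean = sum(1 for s in scores if s < score_mean)
print(f"mean={score_mean:.2f} median={score_median} "
      f"below mean={below_mean} of {len(scores)}")
# prints: mean=3.17 median=2.0 below mean=5 of 6

# Hypothetical time-on-page samples in seconds (made-up values);
# the 1800 stands in for a single browser tab left open.
seconds_on_page = [12, 15, 18, 20, 25, 1800]
page_mean, page_median = summarize(seconds_on_page)
print(f"mean={page_mean:.0f}s median={page_median:.0f}s")
# prints: mean=315s median=19s
```

In the made-up time-on-page data a single long session pulls the mean up to more than five minutes, while the median stays at 19 seconds, much closer to what most of the sampled visits looked like.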

Another problem with averages is that there is really no such thing as an “average” visit or visitor; every person who comes to your web site will behave slightly differently. Some people argue that using averages to understand how people browse content often leads to misinterpretation, but I disagree. Used in the context of the following key performance indicators, thinking about the “average” visit or visitor will help you better understand the lowest common denominator: the habits and behaviors of people who are neither your best nor your worst visitors, only those who come in the largest numbers. You don’t necessarily want to make sweeping changes to your site based on the activities of “average” visitors, but you do want to keep a close eye on what the majority is doing. Sophisticated users who want to overcome this effect can segment their audience in meaningful ways and then build the following KPIs for each segment; the segmentation refines the measured behaviors into groups that you presumably understand better, which drives more specific actions based on the data.
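
As a rough illustration of the segmentation idea, the sketch below groups visits by segment before computing an average-based KPI. Everything in it is assumed for the example: the segment names, the page-view counts, and the helper `kpi_by_segment` are hypothetical and do not come from any particular analytics tool.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical visit records: (segment, pages viewed during the visit).
visits = [
    ("search", 2), ("search", 3), ("search", 1),
    ("email", 8), ("email", 10),
    ("direct", 4), ("direct", 5), ("direct", 6),
]

def kpi_by_segment(records):
    """Group values by segment and report count, mean, and median for each."""
    grouped = defaultdict(list)
    for segment, value in records:
        grouped[segment].append(value)
    return {
        segment: {
            "visits": len(values),
            "mean": mean(values),
            "median": median(values),
        }
        for segment, values in grouped.items()
    }

# Site-wide average page views per visit, ignoring segments.
overall = mean([pages for _, pages in visits])
segments = kpi_by_segment(visits)

print(f"overall mean: {overall}")
for name, stats in segments.items():
    print(name, stats)
```

With these made-up numbers the site-wide average is 4.875 page views per visit, a figure that describes none of the three groups well: the search segment averages 2, the direct segment 5, and the email segment 9.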

Post Date:
Wednesday, July 13th, 2005 at 11:24 am

Phil Aaronson added the following ...

CTR on many types of links is probably skewed in much the same way as time spent. It’s not unusual to find the top 1% of clicking visitors accounting for 50% or more of the total click activity on a link. And there’s a massive tail of zero clickers. In other words, it’s Zipf.

Zipf Still Rules [Bob Wyman]

Which begs the question, how useful IS an average on a Zipf distribution? Seems unstable. One robot or motivated visitor can toss it around.

