Visit durations normally distributed? Not likely.

As developers and marketers, a lot of what we do is aimed at getting people to spend a little longer on the site. In that context, average time on site is a vital measure of success. However, the metric is fraught with difficulties in both collection and interpretation.

I’m not going to talk here about issues with how the time on site is calculated. Google themselves and others have this covered. This post by Brian Cray is a good example, with a useful work-around.

What I want to look at is issues in presenting and interpreting the time on site metric.

The biggest problem with average time on site is that it is an average. A lot of the usefulness of measuring visit length is hidden in this single number.

As a simple illustration, suppose your site has had 3 visits (big day!) and an average visit length of 30s. This might have been:

  1. 3 x 30s visits, or
  2. 1 x 90s visit and 2 x 0s (bounce) visits, or
  3. 2 x 45s visits and 1 x 0s (bounce) visit

Your next move probably depends on which one of these it is, and the average doesn’t tell you.

Also, only in case (1) is any visit actually average. In case (2) most of the visits are below average and in case (3) most of the visits are above average.
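To make that concrete, here is a minimal Python sketch of the three scenarios above. The `summarise` helper is my own illustrative name, not part of any analytics tool; it simply reports the mean next to the median and counts how many visits sit below or above the mean.

```python
from statistics import mean, median

# The three scenarios from the list above (durations in seconds).
scenarios = {
    "1: three 30s visits": [30, 30, 30],
    "2: one 90s visit, two bounces": [90, 0, 0],
    "3: two 45s visits, one bounce": [45, 45, 0],
}

def summarise(durations):
    """Return the mean, the median, and how many visits fall below/above the mean."""
    avg = mean(durations)
    below = sum(1 for d in durations if d < avg)
    above = sum(1 for d in durations if d > avg)
    return {"mean": avg, "median": median(durations), "below": below, "above": above}

for label, durations in scenarios.items():
    print(label, summarise(durations))
```

All three scenarios share the same mean, but the median and the below/above counts differ, which is exactly the information the single average hides.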

Fortunately, Google does present a distribution graph, in the behaviour/engagement reports.

Time on site histogram from Google Analytics

Avinash Kaushik explains how this presentation is useful, and so it is. It gives us an idea of the distribution and, for most sites, if you ignore the 0-10s bucket, at first glance it looks just like a classic bell curve, i.e. a normal distribution, which might give us a lot more faith in the average as a meaningful number.

However, this graph should come with a warning – visit length is not generally normally distributed. In Google’s graph, the categories are divided in a way that makes it look like a bell curve (for many sites), but this appearance is actually generated by the uneven sizing of the buckets:

  • 11-30 is 20s wide,
  • 31-60 is 30s,
  • 61-180 is 120s,
  • 181-600 is 420s,
  • and so on.
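One way to see the effect of the bucket sizes is to divide each bucket's count by its width. The sketch below does that for the four buckets listed above. The visit counts are invented purely for illustration, and `visits_per_second` is a hypothetical helper; the width counts both ends of each range.

```python
# Hypothetical visit counts per bucket: (start_s, end_s, visits).
# The counts are made up for illustration; they are not from a real report.
buckets = [
    (11, 30, 200),
    (31, 60, 250),
    (61, 180, 500),
    (181, 600, 400),
]

def visits_per_second(start, end, visits):
    """Normalise a bucket's count by its width in seconds (bounds inclusive)."""
    width = end - start + 1
    return visits / width

densities = [visits_per_second(*bucket) for bucket in buckets]

for (start, end, visits), density in zip(buckets, densities):
    print(f"{start:>4}-{end:<4} visits={visits:<4} visits per second={density:.2f}")
```

With these made-up numbers the raw counts rise and then fall, like a bell curve, while the per-second figures only ever fall. The hump comes from the widening buckets, not from the visits.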

So what is the typical distribution of visit duration?

With the help of the API we can pull out visit counts for evenly sized (30s) buckets. Then we see…

Visit duration frequency distribution

Not an exact fit to a power law, but very close. Note that this does not include 0 length (bounce) visits. Could this be the negative Weibull distribution that Jakob Nielsen cites for time on page?

Anyway, what difference does this distribution make?

This should make us even more suspicious of the average time on site metric as a useful characterisation of visitor behaviour.

In the example above, fewer than 1 in 5 visits are anywhere near the average (about 17% fall within ±50% of the average). So the “average” is actually a small minority case.
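If you have raw visit durations to hand, this kind of check is easy to script. The following is a rough sketch only: `share_near_mean` is a hypothetical helper, and the list of durations is invented to have a long right tail, so it is not the data behind the graph.

```python
from statistics import mean

def share_near_mean(durations, tolerance=0.5):
    """Fraction of visits whose duration lies within +/- tolerance of the mean."""
    avg = mean(durations)
    low, high = avg * (1 - tolerance), avg * (1 + tolerance)
    near = sum(1 for d in durations if low <= d <= high)
    return near / len(durations)

# Invented non-bounce durations (seconds): many short visits, a few long ones.
durations = [5, 8, 12, 15, 20, 25, 40, 60, 90, 150, 300, 900]

print(f"mean: {mean(durations):.1f}s")
print(f"share within 50% of the mean: {share_near_mean(durations):.0%}")
```

The tolerance is a parameter, so you can tighten or loosen what counts as "near the average" and see how quickly the share changes.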

The average is being pulled in two directions by the distribution:

  • A large number of short visits
  • A small number of longer visits (sometimes much longer)

With that in mind, is there any obvious place to draw a line between short, medium and long visits?

Effect of segmentation on the distribution

One of the most powerful techniques for finding actionable information in metrics such as average time on site is segmentation. However, because of the nature of the distribution, care needs to be taken in comparing average time on site between different segments.

For example, if you segment on an aspect of user behaviour that eliminates bounces from your segment (e.g. a segment of visits that include an event), you eliminate the 0-length visits at the left end of the distribution. This will probably push the calculated average a long way up (thanks to the shape of the distribution), without any really meaningful difference in user behaviour apart from eliminating the bounces.

In this case, it is likely to be more useful to compare your segment with a segment of non-bounces; or, even better, look at the whole distribution for both segments.
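A toy example shows how large the effect can be. Everything in this sketch is invented: the durations, the "event" segment, and the `non_bounces` helper. The segment is deliberately built so that its non-bounce visits behave exactly like the non-bounce visits on the whole site.

```python
from statistics import mean

# Invented durations in seconds; 0 represents a bounce.
all_visits = [0, 0, 0, 0, 0, 0, 20, 40, 60, 80]

# Hypothetical "visits including an event" segment: no bounces by construction,
# otherwise identical to the non-bounce visits above.
event_segment = [20, 40, 60, 80]

def non_bounces(durations):
    """Drop 0-length (bounce) visits."""
    return [d for d in durations if d > 0]

print("all visits:      ", mean(all_visits))
print("event segment:   ", mean(event_segment))
print("non-bounces only:", mean(non_bounces(all_visits)))
```

Against all visits the segment looks far more engaged; against the non-bounce baseline there is no difference at all, which is the fairer comparison here.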

How to look at the visit duration distribution for your site

The graph above was arrived at using the spreadsheet function described in my earlier post to deliver visit counts for a site, for a whole year, using a series of dynamic segments like this: dynamic::ga:timeOnSite>=455;ga:timeOnSite<484

Setting up the dozens of requests needed, one for each row of the spreadsheet, is manageable because the cell that defines the segment on each row is itself the result of a calculation like =concatenate("dynamic::ga:timeOnSite>=", A20, ";ga:timeOnSite<", B20), where A20 and B20 in the example each simply add 30s to the value in the previous row.
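If you would rather generate the segment strings outside the spreadsheet, the same idea can be sketched in a few lines of Python. The `duration_segments` function and its parameters are my own naming; it only builds strings in the format shown above and does not call the API.

```python
def duration_segments(bucket_seconds=30, bucket_count=5, start=0):
    """Build one dynamic segment string per evenly sized duration bucket.

    Each bucket covers start <= timeOnSite < start + bucket_seconds, then the
    next bucket begins where the previous one ended.
    """
    segments = []
    for i in range(bucket_count):
        low = start + i * bucket_seconds
        high = low + bucket_seconds
        segments.append(f"dynamic::ga:timeOnSite>={low};ga:timeOnSite<{high}")
    return segments

for segment in duration_segments(bucket_seconds=30, bucket_count=3):
    print(segment)
```

The `start` parameter is there so you can decide for yourself where the first bucket begins, for instance to keep 0-length visits out of it.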

Once you’ve established the dynamic segments for each 30s time period, you can use the getGAData spreadsheet function to pull whatever metrics you like and see how they are distributed by visit duration – visits, pageviews, goalcompletions, and so on. You can then add spreadsheet functions to look at the distribution of, say, goal completions per visit at each visit duration.
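The per-bucket ratio is simple arithmetic, and the same calculation can be sketched outside the spreadsheet. The rows below are invented for illustration and `completions_per_visit` is a hypothetical helper; the only subtlety is guarding against buckets with no visits.

```python
# Invented rows: (bucket_start_seconds, visits, goal_completions).
rows = [
    (0, 400, 4),
    (30, 200, 6),
    (60, 100, 5),
    (90, 0, 0),
]

def completions_per_visit(rows):
    """Map each bucket start to goal completions per visit (None if no visits)."""
    result = {}
    for bucket_start, visits, completions in rows:
        result[bucket_start] = completions / visits if visits else None
    return result

for bucket_start, rate in completions_per_visit(rows).items():
    label = "n/a" if rate is None else f"{rate:.1%}"
    print(f"{bucket_start:>3}s+  {label}")
```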
