Measuring Website Usage With Google Analytics, Part I

Knowing where to get started with reporting website statistics can often provide new webmasters with something of a challenge. In this post, I’ll quickly review the guidance provided by the Central Office of Information on Measuring Website Usage which:

describes a common approach to measuring website traffic [for central government]. This enables departments to answer Parliamentary Questions and Freedom of Information Requests about website usage consistently and reliably

I’ll also start to explore how to generate reports that satisfy those guidelines using Google Analytics.

The proposed metrics “are defined according to industry standards set by the Joint Industry Committee for Web Standards (JICWEBS)” and specify the following minimal level of reporting (Measuring Website Usage – Reporting requirements):

  1. The following web metrics, as defined by the Joint Industry Committee for Web Standards (JICWEBS), must be measured for each and every publicly accessible website operated by an organisation:
    • Unique User/Browsers
    • Page Impressions
    • Visits
    • Visit Duration
  2. Central government departments must measure Unique User/Browsers, Page Impressions, Visits and Visit Duration starting from 1 April 2009 for every website open on 1 April 2010.
  3. Executive agencies and non-departmental public bodies (NDPBs) must measure Unique User/Browsers, Page Impressions, Visits and Visit Duration starting from 1 April 2010 for every website open on 1 April 2011.
  4. The following information must be provided to COI at the end of each quarter:
    • Number of monthly Unique User/Browsers
    • Number of monthly Page Impressions
    • Number of monthly Visits
    • Number of Visits of at least two Page Impressions
    • Total time in seconds for all Visits of at least two Page Impressions
  5. Each report should contain figures for each of the previous three months. This information should be provided in the format shown in the reporting template in Appendix A. COI Website usage reporting template: http://coi.gov.uk/guidance.php?page=237
  6. All figures should exclude internal web development activity, performance monitoring, automated broken link detection and other types of non-human activity (e.g. robots and spiders). Further details on what to exclude are found in the Page Impressions section.
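To make the five quarterly figures concrete, here is a minimal sketch of how they might be computed from a list of visit records. The `Visit` structure, its field names and the sample data are all my own assumptions for illustration; they are not part of the COI guidance or of any Google Analytics export format.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    # Hypothetical record structure, for illustration only.
    visitor_id: str          # e.g. a cookie or browser identifier
    page_impressions: int    # pages viewed during this visit
    duration_seconds: int    # measured visit duration
    is_human: bool = True    # False for robots, monitoring, internal testing


def monthly_figures(visits):
    """Compute the five figures listed in the reporting requirements."""
    # Requirement 6: exclude non-human and internal activity first.
    human = [v for v in visits if v.is_human]
    # Visits of at least two Page Impressions.
    multi = [v for v in human if v.page_impressions >= 2]
    return {
        "unique_users": len({v.visitor_id for v in human}),
        "page_impressions": sum(v.page_impressions for v in human),
        "visits": len(human),
        "visits_2plus": len(multi),
        "seconds_2plus": sum(v.duration_seconds for v in multi),
    }


# Made-up sample data: one bounce, two longer visits, one robot.
sample = [
    Visit("a", 1, 0),
    Visit("a", 3, 120),
    Visit("b", 2, 45),
    Visit("bot", 50, 10, is_human=False),
]
figures = monthly_figures(sample)
```

With this sample, the robot is dropped entirely and the single-page visit counts towards Visits and Page Impressions but not towards the two "at least two Page Impressions" figures.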

So what does Google Analytics offer “out of the box”?

Headline report - Google Analytics

The Visitors Overview repeats these figures and additionally provides an indication of the number of ‘unique’ visitors:

Visitors Overview

At face value then, it would appear that Google Analytics is providing at least some of the required stats (though we need to check that the numbers as recorded by Google Analytics conform to what the COI has in mind for those reports, as described in their guidance on the Minimum standard for web metrics!). But what does that guidance relating to “at least two web pages” mean?

To understand the emphasis on “at least two pages”, it’s worth reflecting on the notion of bounces and the bounce rate. Bounce rate refers to the proportion of visitors to a site who only visit one page on a website before leaving that site, and as such tend to leave no meaningful analytics behind.

According to the ClickTale blog (What Google Analytics Can’t Tell You – Part 1), Google Analytics “has no way of knowing how long a bounced visitor, who only visits one page, spent on your website”. That is, the time spent looking at a page appears not to be based on the difference between the time when a page has fully loaded (and generated a trackable onload event) and its unload event; instead, it is calculated as the time between loading one page and clicking through to, and loading, a second page on the same site.

Which is why the emphasis is on collecting stats from visits of at least two pages: given that the current crop of analytics tools struggle to do anything meaningful with single page visits, specifying a two page visit means not only that the visits reported are likely to be meaningful ones, but also that the reports are more likely to contain meaningful data. (There is an obvious problem here: if visitors visit two pages, and quickly click to the second from the first before exiting the site from the second page, the time spent on the second page won’t be captured. See for example Time on Site & Time on Page – Google Analytics metric mystery.)
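The timing behaviour described above can be sketched in a few lines. This is only my reading of how the calculation appears to work, not Google's actual implementation; the function names and the timestamps are made up for illustration.

```python
def time_on_pages(load_times):
    """Per-page durations for one visit, given page-load timestamps in seconds.

    Each page's time is the gap to the next page load; the exit page
    has no following load, so its duration is unknown (None).
    """
    if not load_times:
        return []
    gaps = [later - earlier for earlier, later in zip(load_times, load_times[1:])]
    return gaps + [None]


def time_on_site(load_times):
    """Measurable visit duration: first page load to last page load.

    For a bounce (a single page load) there is nothing to measure.
    """
    if len(load_times) < 2:
        return None
    return load_times[-1] - load_times[0]


three_page_visit = [0, 30, 90]   # hypothetical load times, in seconds
bounce = [10]

pages = time_on_pages(three_page_visit)
total = time_on_site(three_page_visit)
bounce_total = time_on_site(bounce)
```

Here the three page visit yields per-page times of 30 and 60 seconds plus an unknown exit page, and a measured time on site of 90 seconds, however long the visitor actually lingered on the last page.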

One of the nice things about Google Analytics is that it lets you create custom views, or “segments” of the data in which you can specify things such as the minimum number of pages visited when generating a particular report. In order to do this, you specify an “Advanced Segment”. Here’s what an Advanced Segment for a “minimum of two pages visited report” might look like:

GA Advanced segment - visited at least two pages

Applying this segment to the same data charted above gives these results:

Segmented goog stats

GA segmented view

So for example, in this version of the report we see that the average number of page views and the average time on site have gone up.

Something I don’t think Google Analytics reports is the total time on site. Bearing in mind the lack of data regarding the time spent on exit pages, the best we can do is multiply the number of visits by the average time on site to get an estimate of the total time on site.
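As a rough sketch of that calculation: the average time on site is displayed as hh:mm:ss, so it needs converting to seconds before multiplying. The helper names and the average time used in the example are invented for illustration; only the 104 visits figure relates to the segmented report discussed here.

```python
def hms_to_seconds(hms):
    """Convert an 'hh:mm:ss' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def estimated_total_seconds(visits, avg_time_on_site):
    """Estimate total time on site as visits x average time on site."""
    return visits * hms_to_seconds(avg_time_on_site)


# 104 visits in the segment; "00:03:20" is a made-up average for illustration.
estimate = estimated_total_seconds(104, "00:03:20")
```

Because the average is itself rounded to the nearest second, the result is only an estimate of the total.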

With just this single advanced segment, a simple calculation, and the out of the can reports from Google Analytics, I think we can deliver on the suggested stats based on a literal reading of the headings, though in a follow-up post I’ll check to see if the more detailed spec on the metrics matches the way that Google Analytics defines its metrics.

PS Unfortunately, the segmented report appears to have lost the number of absolute unique visitors (although I think the recommended report wanted the number of uniques, including bounces, to the site?) Anyway, let’s play: the number of visits gives the upper bound on the number of unique visitors, but can we also estimate the lower bound? One heuristic might be to look at the number of visits and uniques in the original report (176 uniques, 245 visits), see how many visits were lost in discounting the bounces (245-104 = 141), assume these were all unique and subtract these from the original number of uniques (176-141=35). I think this gives the lower bound on uniques as recorded by Google Analytics for non-bouncing visitors?
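The heuristic can be written down as a small function, which makes the assumptions a little easier to see. The function name is mine, and this is just the back-of-the-envelope reasoning above, not anything Google Analytics itself reports.

```python
def unique_bounds(total_uniques, total_visits, non_bounce_visits):
    """Estimate (lower, upper) bounds on unique non-bouncing visitors.

    Lower bound: assume every bounced visit came from a distinct visitor
    who never made a non-bounce visit, and discount them all.
    Upper bound: each non-bounce visit came from a different visitor,
    capped by the total number of uniques.
    """
    bounce_visits = total_visits - non_bounce_visits
    lower = max(0, total_uniques - bounce_visits)
    upper = min(total_uniques, non_bounce_visits)
    return lower, upper


# Figures from the reports above: 176 uniques, 245 visits, 104 non-bounce visits.
lower, upper = unique_bounds(176, 245, 104)
```

For the figures above this gives a lower bound of 35 and an upper bound of 104 unique non-bouncing visitors.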

Google Analytics, Feedburner and Google Reader

Over the last couple of weeks, it seems as if the Goog has been doing a bit of reconciliation on the old analytics front, in particular the ability to track traffic driven back to your website from links contained within a feed published from that site using Feedburner…

The first thing I’d noticed as being different was the appearance of Google Analytics tracking codes on Feedburner powered posts that I was reading in Google Reader – opening such a post in a new window seems to display it with a full-blown set of GA tracking attributes. So for example, opening a post from the Feedburnered OUseful.Info feed results in a URI like this:

http://ouseful.wordpress.com/2009/11/18/under-the-radar/?
utm_source=feedburner&utm_medium=feed
&utm_campaign=Feed%3A+ouseful+%28OUseful+Info%29&utm_content=Google+Reader

…and I’m pretty sure I didn’t put those tracking codes in there explicitly…

In “Campaign” Tracking With Google Analytics, I started sketching out how it might be possible to use Google Analytics campaign tracking codes to track the spread of referrer links to documents or document fragments hosted on WriteToReply or JISCPress, so let’s see how the Feedburner annotations are structured:

  • utm_source=feedburner (that is, the originator of the feed);
  • utm_medium=feed (that is, the means by which the content was transported/syndicated);
  • utm_campaign=Feed: ouseful (OUseful Info) (that is, the name of the Feedburner feed (I think: the feed URL is http://feedburner.com/ouseful), followed by the feed title (OUseful Info));
  • utm_content=Google Reader (that is, the place where I viewed the link).
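If you want to pull these values out of a tagged link yourself, the URL-encoded query string can be decoded with Python's standard library. This is just a sketch; the `campaign_params` helper is my own name, and the URL is the one shown above, joined back into a single line.

```python
from urllib.parse import urlsplit, parse_qs

url = (
    "http://ouseful.wordpress.com/2009/11/18/under-the-radar/?"
    "utm_source=feedburner&utm_medium=feed"
    "&utm_campaign=Feed%3A+ouseful+%28OUseful+Info%29&utm_content=Google+Reader"
)


def campaign_params(tagged_url):
    """Return the decoded utm_* parameters from a URL's query string."""
    query = parse_qs(urlsplit(tagged_url).query)
    # parse_qs returns a list of values per key; keep the first of each.
    return {key: values[0] for key, values in query.items() if key.startswith("utm_")}


params = campaign_params(url)
```

The decoding turns `Feed%3A+ouseful+%28OUseful+Info%29` back into `Feed: ouseful (OUseful Info)`, which is the readable form used in the list above.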

Compare this with the suggestion I made for annotating WriteToReply links:

  • utm_source=twitter.com (that is, the place a link was ‘launched’);
  • utm_medium=question (that is, the type of slug content used to qualify the link);
  • utm_campaign=jiscri (that is, the consultation document linked to, e.g. for the link http://writetoreply.org/jiscri/2009/03/11/rapid-innovation-projects/);
  • utm_content=slug3 (that is, a unique ID to identify the text used to qualify the syndicated link).

So how can you get Googalytics tracking codes on your Feedburner feeds? Details are still sketchy (e.g. see the original announcement on the Google Analytics blog here: An Integration With Feedburner, and the Google AdSense for Feeds blog here: “Afternoon, Frank.” “Hey howdy, George.”), but this Google FAQ post on How do I set up my FeedBurner feed to report feed clicks in Google Analytics? explains:

If you use Google Analytics to track web site visitors, you can see feed clicks originating from your FeedBurner feed by activating an option on the Analyze tab.

When someone clicks one of your feed items and ends up back on your web site, Google Analytics will track that activity and include it in the “Traffic Sources” section.

The post also tells you where you can set up the tracking details – from the Configure Stats menu option. And selecting that, I can now see why my feed links are annotated as they are:

(I’m not sure how the $distributionEndpoint is treated for non-Google properties?)

The Google AdSense for Feeds post suggests that:

By default, these analytics will show up in the “All Traffic Sources” and “Campaigns” views in Google Analytics. You can filter the results just to only the traffic that comes from Google FeedBurner by filtering on “feedburner” on the All Traffic Sources page or “Feed:” on the campaigns view. You can also use these sources in the Advanced Segments views.

which suggests that for sites like JISCPress/WriteToReply that use Google Analytics on the main site and Feedburner for the public/promoted feeds, the Feedburner integration will automatically annotate feed links with tracking codes that can be tracked from the site’s Google Analytics dashboard.
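The filtering suggested in the AdSense for Feeds post could also be applied to exported data. The sketch below assumes a hypothetical export in which each row is a dictionary with `source`, `medium`, `campaign` and `visits` keys; those names and the sample numbers are mine, not an actual Google Analytics export format.

```python
def is_feedburner_row(row):
    """True if a traffic row looks like it came via a FeedBurner-tagged link.

    Mirrors the two suggested filters: source "feedburner",
    or a campaign name beginning "Feed:".
    """
    return row.get("source") == "feedburner" or row.get("campaign", "").startswith("Feed:")


# Made-up rows for illustration only.
rows = [
    {"source": "feedburner", "medium": "feed",
     "campaign": "Feed: ouseful (OUseful Info)", "visits": 12},
    {"source": "twitter.com", "medium": "question", "campaign": "jiscri", "visits": 7},
    {"source": "google", "medium": "organic", "campaign": "(not set)", "visits": 30},
]

feed_rows = [row for row in rows if is_feedburner_row(row)]
feed_visits = sum(row["visits"] for row in feed_rows)
```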

“Campaign” Tracking With Google Analytics

Of the very many things that it’s possible to provide webstats reports about, such as tracking visitors arriving from organisational websites, one of the most useful is being able to track how much traffic has been driven back to your website from a particular link – such as a link included in a particular tweet, or in a particular email announcement, and so on.

If a link to a JISCPress document appears on a third party webpage, and somebody clicks on that link and then lands on the corresponding JISCPress page, Google Analytics will capture where that incoming visitor came from via the Referring Sites report. At the top level this is organised by domain:

Google Analytics - Referring sites

We can then tunnel down to the page level:

More referrers

This is all well and good, but sometimes we also might want to know where the person who posted the referring link on their web page got hold of it. Did they capture it from a tweet, for example, or via an email list? When we release a URI into the wild via some sort of marketing campaign, what sort of life does that URI have, and where will it end up sending traffic back from?

In the Google Analytics FAQ answer How do I tag my links?, a method is described for adding additional tags to a referral URL (that is, a URL that you publish and/or distribute more widely that refers back to your website) that Google Analytics can use to segment traffic referred from that URL. Five tags are available (as described in Understanding campaign variables: The five dimensions of campaign tracking):

Source: Every referral to a web site has an origin, or source. Examples of sources are the Google search engine, the AOL search engine, the name of a newsletter, or the name of a referring web site.
Medium: The medium helps to qualify the source; together, the source and medium provide specific information about the origin of a referral. For example, in the case of a Google search engine source, the medium might be “cost-per-click”, indicating a sponsored link for which the advertiser paid, or “organic”, indicating a link in the unpaid search engine results. In the case of a newsletter source, examples of medium include “email” and “print”.
Term: The term or keyword is the word or phrase that a user types into a search engine.
Content: The content dimension describes the version of an advertisement on which a visitor clicked. It is used in content-targeted advertising and Content (A/B) Testing to determine which version of an advertisement is most effective at attracting profitable leads.
Campaign: The campaign dimension differentiates product promotions such as “Spring Ski Sale” or slogan campaigns such as “Get Fit For Summer”.

(For an alternative description, see Google Analytics Campaign Tracking Pt. 1: Link Tagging.)

The recommendation is that campaign source, campaign medium, and campaign name should always be used.

Elsewhere, (Library Analytics (Part 7), from which elements of this post have been taken), I considered how these codes might be used to track course referrals to Library resources from a VLE (something I need to revisit, now I’ve had a little more time to consider the possible role(s) of these tracking codes). But it also seems to me to be reasonable to raise a few questions about how we might use these tracking codes in the context of a document on JISCPress or WriteToReply in order to track referrals back to the site from social media campaigns highlighting a particular document or section of a document.

So, what are sensible mappings/interpretations for the campaign variables? Remember, these tracking variables are parameters that we might add to a link that we have posted somewhere that is intended to drive traffic back to the site. The tracking variables are there to allow us to see how different links are performing. Thinking about how we might use these five tracking dimensions, whether or not we use them in the “intended” Google Analytics way, may also provide us with some ideas about how to use links to drive traffic back to our site.

To try and ground the exercise, consider this example: a new document is published on JISCPress and we want to see how well links posted on Facebook compare with links posted on Twitter for driving traffic back. For tracking to be most effective, we hope that if a link is rebroadcast or shared, the tracking variables are carried along with it. This means that if a link is posted to Twitter, and that link then gets shared onto Facebook and onto a blog, we can look at the traffic that comes back, and from where (via the Referral tracking described at the start of this post), for each of the separately released URIs.

A second example might relate to a campaign intended to drive traffic back to a particular section or paragraph of a document. This campaign might involve publishing a link back to the same paragraph in a series of separate posts or status updates, each with a different slug or call to action message. That is, each link+message may be published in the same place (and hence have the same referrer information), but at different times and with different link text, or contextual information.

A third example might be where there is more than one link back to the same document on a web page, and we want to track how effective each link is compared to the others.

Here are the supported variables again:

  • source: the obvious thing to use this variable for is the domain or URI of the page where the link is published to. So if we tweet a link, twitter.com might be sensible. If we blog it, actually might be best?
  • medium: this is intended to refer to the sort of link that has generated the traffic, such as a banner ad. In our case, we might clarify the intent with which the link was posted, such as announcement, or question;
  • term: this is an optional parameter, and I’m not sure how it should be used or whether it conflicts with other Google services. If we post something with a hashtag on twitter, or a set of tags on delicious, might we use those tags as terms?
  • content: the second optional variable, this is often used to discern A/B test ads. If we tweet the same link with different call to action/prompting questions, maybe this differential content should be uniquely identified with the content field?
  • campaign: typically used for tracking a promotion or campaign, this field might be used to identify a different document when, for example, a link to the top level JISCPress is referred to in an announcement about a particular document?

So for example, we might have something like:
http://writetoreply.org/?utm_campaign=ukgovurisets&utm_medium=announcement&utm_source=actually
appearing as the link for WriteToReply in an announcement about the hosting of the UK Government URI Sets document.

Or maybe a call to action on twitter relating to a particular part of a document:
What benefits would you like to see from #JISCRI calls? http://writetoreply.org/jiscri/2009/03/11/rapid-innovation-projects/?utm_campaign=jiscri&utm_medium=question&utm_term=JISCRI&utm_source=twitter.com&utm_content=slug3#3

To support the generation of tracking URIs, a URL Generator Tool (like the official Tool: URL Builder) that will accept a tweet, for example, along with a JISCPress/WriteToReply URL and then automatically create tracking variable values might be worth considering?
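As a starting point, here is a minimal sketch of what such a generator might look like, using only Python's standard library. The function names, and the idea of taking the term from the hashtags in a tweet, are my own assumptions rather than anything the official URL Builder does. Note that the query string is placed before any #fragment, as standard URL syntax expects.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def hashtags(text):
    """Return the hashtags found in a tweet, without the leading '#'."""
    return re.findall(r"#(\w+)", text)


def tag_url(url, source, medium, campaign, term=None, content=None):
    """Append utm_* tracking parameters to a URL.

    Existing query parameters are kept, and any fragment is preserved
    after the query string.
    """
    parts = urlsplit(url)
    params = parse_qsl(parts.query)
    params += [
        ("utm_source", source),
        ("utm_medium", medium),
        ("utm_campaign", campaign),
    ]
    if term:
        params.append(("utm_term", term))
    if content:
        params.append(("utm_content", content))
    return urlunsplit(parts._replace(query=urlencode(params)))


tweet = "What benefits would you like to see from #JISCRI calls?"
tagged = tag_url(
    "http://writetoreply.org/jiscri/2009/03/11/rapid-innovation-projects/#3",
    source="twitter.com",
    medium="question",
    campaign="jiscri",
    term=" ".join(hashtags(tweet)),
    content="slug3",
)
```

The source, medium and campaign arguments are required here because those are the three variables the recommendation says should always be used; term and content are optional.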