Website Optimization and Data Analytics
There is a lot of talk, obviously, about “Search Engine Optimization” and “Web Analytics” – yet it sometimes seems that there is not a lot of clear thinking about what is being “optimized” or how to go about it.
From my observation, “Web Analytics” (WA) does not appear to have taken full advantage of the analytics toolkit: as it currently stands, it appears to me to be mostly about counting things and doing very little analysis thereof. Yes, you can get server logfile entries; yes, you can get a bit more info from injecting cookies into each of the pages and using CGI query strings … but it is all a bit “univariate”.
Have a wander around Wikipedia’s entry on Web Analytics and you will see what I mean.
Web Analytics too often means “Web Metrics”: counting things, and leaving them flat.
Not much rocket science or pattern recognition there, although I do have a feeling that something more could be done with a better semantic analysis of the referrer strings which, in the case of a visit that comes by clicking on a link from a search engine, will include the text of the search term.
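As a first step towards that kind of analysis, here is a minimal sketch of pulling the search phrase out of a referrer string. The parameter names in QUERY_PARAMS and the example URLs are my assumptions for illustration; real search engines differ in how (and whether) they pass the query along.

```python
from urllib.parse import urlparse, parse_qs

# Assumption: the search phrase travels in one of these query parameters.
QUERY_PARAMS = ("q", "p", "query")

def search_term_from_referrer(referrer):
    """Return the search phrase embedded in a referrer URL, or None."""
    params = parse_qs(urlparse(referrer).query)
    for name in QUERY_PARAMS:
        values = params.get(name)
        if values:
            # parse_qs has already decoded "+" and "%20" into spaces.
            return values[0].strip().lower()
    return None

# Hypothetical referrer strings, as they might appear in a server log.
referrers = [
    "http://www.google.com/search?q=dog+boarding+kennels&hl=en",
    "http://search.example.com/results?p=Dog%20Kennels",
    "http://www.example.org/links.html",
]
terms = [search_term_from_referrer(r) for r in referrers]
```

The extracted phrases are only raw material. Note also that this crude version would pick up a "q" parameter from any referring site, search engine or not.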
And I guess it is not surprising that the analysis tends to be thin when the data is so thin: HTTP is a stateless protocol, and pretty much anonymous, so you don’t really know very much about your visitors and their visits.
Of course that is a “post hoc” view – a view that says that analytics starts with the data recorded at the site.
Perhaps web analytics would be richer if it started earlier in the process, perhaps using some of the tools of text mining.
It is a big enough phenomenon – so, potentially lots of data, no?
But, the web is obviously big business and it is very important to some companies to get it right.
Obviously.
It is a bit of a pity that WA and SEO (Search Engine Optimization) have a tendency (imho) to treat all sites as equal: the very small, locally focused site (think Dog Boarding Kennels), the very large corporate site and the professional services site are all treated as pretty much the same thing, i.e. “web sites”.
We “know” that there were some 3 million web sites as of June 2002, a figure smaller than one might have expected.
“Trends in the Evolution of the Public Web” estimated about 3 million sites and 1.4 billion pages (and a declining growth rate in sites, although no information is given about page growth rates). The methodology for estimating the number of web sites via random sampling of IP addresses seems sensible enough.
A Google search for the word “the” gives a page estimate of 14 billion; a search for “the OR of OR and OR a OR to OR in OR is” (the 7 most frequent words in English) gives around 24 billion pages.
A lot of rubberiness in the figures here, as expected.
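The random-sampling idea behind the site count can be sketched in a few lines: probe a random sample of IP addresses, see what fraction answer as web servers, and scale that fraction up to the whole address space. The sample size and hit count below are invented purely to show the arithmetic, and a real survey would need adjustments this sketch ignores (several sites sharing one address, for instance).

```python
import math

IPV4_ADDRESS_SPACE = 2 ** 32  # total number of IPv4 addresses

def estimate_site_count(sample_size, hits, address_space=IPV4_ADDRESS_SPACE):
    """Scale the sampled hit rate up to the whole address space.

    Returns (estimate, standard_error), using the usual binomial
    standard error for a sampled proportion.
    """
    p = hits / sample_size
    se = math.sqrt(p * (1 - p) / sample_size)
    return p * address_space, se * address_space

# Hypothetical survey: 1,000,000 random addresses probed, 700 web servers found.
estimate, std_error = estimate_site_count(sample_size=1_000_000, hits=700)
print(f"estimated sites: {estimate:,.0f} (+/- {std_error:,.0f})")
```

Even with a million probes, the standard error is a few percent of the estimate, which is one more source of rubberiness.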
So, lots of potential data points there. There are many times more pages than sites, and since search engines (Google anyway) index pages rather than sites, we might as well redefine the problem as one of optimizing web pages.
But this is very broad brush.
We might apply a bit of commonsense segmentation here and get some more insights into the issues: there are obvious differences between “credentials sites” (e.g. for professional services), “information sites” (to take the load off customer service, perhaps) and “retail” sites (with or without on-line transactions).
But what is it that we are optimizing?
Optimization generally implies some tradeoff or constraints, yet much of the discourse around “web optimization” is really about unbounded maximization.
Usually maximization of visits or visitors. For example, Wikipedia says about Organic Search:
The field of search engine optimization, (SEO), is concerned with maximizing the visibility of a web site by making its listings appear more frequently and more prominently in organic search results. Some businesses base their SEO strategies on the successful insertion of their web site listing(s) into organic search results for various popular keywords.
Search engine optimization (SEO) is a set of methods aimed at improving the ranking of a website in search engine listings, and could be considered a subset of search engine marketing
OK, that is maximization mostly.
It could start to be considered optimization, perhaps, if there were some set of constraints (e.g. on an advertising budget, or the total quantity of keywords, or a budget for content building, or time) or some tradeoffs (more emphasis on these keywords means lower search result positions for others, perhaps below the fold in a SERP, the Search Engine Results Page, and therefore believed to be of lower utility).
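To make the distinction concrete, here is a toy version of the constrained problem: a fixed budget to be split across keywords, each with diminishing returns. The logarithmic response curve, the keyword names and the scale figures are all assumptions of mine, not measured data.

```python
import math

def visits(scale, units):
    """Hypothetical response curve: visits grow with spend, with
    diminishing returns. The log shape is an assumption."""
    return scale * math.log1p(units)

def total_visits(keywords, spend):
    return sum(visits(keywords[k], spend[k]) for k in keywords)

def allocate_budget(keywords, budget_units):
    """Spend a fixed budget one unit at a time on the keyword whose
    next unit buys the most extra visits."""
    spend = {k: 0 for k in keywords}
    for _ in range(budget_units):
        best = max(
            keywords,
            key=lambda k: visits(keywords[k], spend[k] + 1)
            - visits(keywords[k], spend[k]),
        )
        spend[best] += 1
    return spend

# Invented keywords with invented "pulling power" per keyword.
keywords = {"dog boarding": 100.0, "kennels": 60.0, "pet hotel": 20.0}
spend = allocate_budget(keywords, budget_units=10)
print(spend, round(total_visits(keywords, spend), 1))
```

Take away the budget and the answer collapses to “spend more on everything”, which is the unbounded maximization described above. The one-unit-at-a-time rule works here only because the assumed curves flatten out as spend rises.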
Let us note in passing that there is a hidden piece of context, of conditionality here. Since the website owner does not know which queries potential visitors could use (and can never know this even by studying the referral strings on the visitation logs), then optimization of the above sort can only be done with respect to a query which is unknown or is cavalierly put forward as being “representative”. This can of worms cries out for some semantic analysis.
Real World Experimentation to Maximize Hits is too slow
Well, we can maybe maximize “hits”, but there is a problem lurking here. We won’t know the effects of any changes we make in pursuit of this goal until some (substantial) time later: the time that Google takes to update its index of our site (and, sometime later still, the published PageRank). And those changes may be insignificant in their effect compared to, for example, the effect of incoming links (and even changes in their content or their PageRank).
So this dataset, the data that results from self-initiated design changes, is likely to be small in number, slow in coming and very noisy.
Umm, what about the idea that we had millions of websites and billions of pages? All of a sudden we are down to a small dataset?
Well, yes. If we insist on doing natural experiments… a sort of poke it and see what happens approach.
Maybe we can use the web itself as a natural laboratory. Yes, good idea. We might finally get somewhere with this. We know what the Google rank is for any page, and we can experiment and find its SERP position vis-à-vis a given query. We can start to build a model.
What about maximizing profits?
OK. There are some serious issues:
- We need a revenue model. Given x visitors, what is the revenue per visitor? This is none too obvious, and it is not clear that we can justify taking a simple average of past sales per thousand visits and extrapolating it to new traffic. And for sites generating both support and sales traffic, the calculations could be problematic. However, possibly doable.
- We need a costs model. With luck this is just the cost of on-line advertising, which we know and can (partially) control and indeed can experiment with.
- We need a SERP and traffic model. Given x content (keywords, simplistically), what will be the expected traffic? And how does this vary with assumptions about the queries employed?
- We need a links influence model. Given N incoming links of average rank x, what will be the influence on the SERP position?
Lots of interesting data analytic work there.
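Just to fix ideas, the simplest possible version of the revenue and costs pieces might look like the sketch below. Every number is invented, and the flat conversion rate is exactly the kind of simple average that may not extrapolate to new traffic.

```python
from dataclasses import dataclass

@dataclass
class Scenario:
    """All fields are hypothetical inputs, not measured values."""
    visitors: float         # visits in the period
    conversion_rate: float  # fraction of visits that end in a sale
    margin_per_sale: float  # profit contribution per sale
    ad_spend: float         # on-line advertising cost in the period

def expected_profit(s):
    """Revenue model minus costs model, in their crudest form."""
    revenue = s.visitors * s.conversion_rate * s.margin_per_sale
    return revenue - s.ad_spend

# Two invented scenarios: B buys twice the traffic, but less qualified.
a = Scenario(visitors=10_000, conversion_rate=0.02, margin_per_sale=30.0, ad_spend=2_500.0)
b = Scenario(visitors=20_000, conversion_rate=0.01, margin_per_sale=30.0, ad_spend=5_000.0)
profit_a = expected_profit(a)
profit_b = expected_profit(b)
print(profit_a, profit_b)
```

In this made-up case the scenario with double the visitors earns less, which is the whole point of asking what is being optimized. The SERP, traffic and links models are where the hard work would be, and they are not attempted here.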
Site design and Visitor Retention
The search engine is not the (only) client.
Somewhat informally, it is obviously not a good idea for a qualified prospect to come to your site and then click away or go no further.
There is a fair amount of experimental design work one can do to address this sub-objective: maximizing the probability of a visitor staying on site until she finds what she was looking for.
We are obviously straying into the area of usability, and this may not be high payoff.
However, to generate useful data, think DHTML for alternate content presentations, CSS switching, possibly AJAX for continuous invisible experimentation, exit surveys and so on.
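Once two presentations are being served side by side, the analysis can be as simple as comparing two proportions. Below is a minimal sketch using a standard two-proportion z-test; the counts are invented, and “stayed” stands in for whatever measure of a successful visit one settles on.

```python
import math

def two_proportion_z_test(stayed_a, visits_a, stayed_b, visits_b):
    """Compare the stay rates of two page variants.

    Returns (z, p_value), where p_value is two-sided under the normal
    approximation with a pooled standard error.
    """
    p_a = stayed_a / visits_a
    p_b = stayed_b / visits_b
    pooled = (stayed_a + stayed_b) / (visits_a + visits_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / visits_a + 1 / visits_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return z, p_value

# Hypothetical experiment: layout A keeps 400 of 1000 visitors, layout B 450 of 1000.
z, p_value = two_proportion_z_test(400, 1000, 450, 1000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

This assumes visitors were assigned to the two layouts at random and that each visit is independent, neither of which is automatic on a real site.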
Another take on this: “Metrics Identify Problems, Not Solve Them”
Max Blumberg, of Max Blumberg’s Positioning Game, had this to say:
I’d like to suggest that the metrics available today are still in themselves insufficient to guide Web site designers into redesigning the site so as to meet the needs of different customer segments — i.e., just because I know that focus and yield, etc. are low in this section of my site is not sufficient to tell me “how” to fix it. However, this is not to say that being guided by any metric is better than having none at all, but I’d say we are a long way from being able to translate customer Web site behavior into useful metrics that can lead us to make the required changes to our Web site. I believe that today’s metrics can help only those who already have domain expertise in this regard
His suggestion? Apply Structural Equation Models (SEM) .. for the full article go here
Enough for today.
Please leave a comment
.. if the “leave a comment link” is not showing up on this page, it just means that Wordpress and I are still at odds, and you have to open up this topic on its own page to see that link ( sigh ).
And for those mainstreamers (the “bricks” people of the “bricks and clicks”) who are worried that I have gone all webby: don’t worry, be happy. I still have plenty of interest in the real (non-e) world, and plenty to say.
I’ll get out of your way now.
John Aitchison said,
March 21, 2007 @ 12:33 pm
I just came across Peter Norvig’s post “The Alexa Toolbar and the Problem of Experiment Design” at http://norvig.com/logs-alexa.html
His point is bias, that is, the extent to which (Alexa) data truly represents a random sample of internet users: “in fact it only represents those who have installed the Alexa toolbar, and that sample is not random. The samplees must be sophisticated enough to know how to install the toolbar, and they must have some reason to want it. It turns out that the toolbar tells you things about web sites, so it is useful to people in the SEO (Search Engine Optimization) industry, so it overrepresents those people.”
He goes on to talk about the size of the bias, and it appears that it can be huge.
So before we start building web analytics edifices, it might be a good idea to have yet another look at the quality of the data on which our analyses rest.
» getting a handle on Web Analytics [ Data Sciences Analytics ] said,
April 8, 2007 @ 1:09 pm
[…] It is not so clear what you can do with the data other than count it, and I have expressed some skepticism in my post “Website Optimization and Data Analytics”, but if you are interested the Immeria site gives an insight into what the commercial software Omniture does in this post Instances vs. Visits in Omniture […]