Search Engine Optimisation Analysis
Don’t be Fooled by the Keyword Myth
There are many myths about the use of keywords in search engine optimisation (SEO). Many are hangovers from the early days of the internet, when search engines were much cruder than they are today, but they have also been peddled by those who claim expertise in SEO without actually having it. Two of the biggest myths are, first, that search rankings depend on packing content with keywords and, second, that there is an optimum ‘keyword density’. Neither is remotely correct in today’s sophisticated world of search. Of course, content has to contain certain words for search engines to pick up, but what matters is the relevance of a document to a search, not how many keywords it contains or how densely they are packed.
Today’s search engines are nothing like their earlier counterparts. They use complex and sophisticated algorithms, effectively mathematical and statistical formulae, in which raw keyword counts and keyword density play no role. Indeed, putting too many keywords into your documents could have the opposite effect to the one intended and even result in them being treated as spam!
Given the importance of search and the money spent on SEO, it is clearly vital for organisations to understand how their marketing activities, from press releases to web copy, affect SEO. But if keywords and keyword density cannot show this, what can? Although search engine companies keep their algorithms a closely guarded secret, there are fundamental term-weighting measures on which they all rely in one form or another. The most important of these is ‘TF-IDF’, which stands for Term Frequency – Inverse Document Frequency. Put as simply as possible, this is a statistical measure that weighs how often a word appears in a document against how many documents in a wider body of documents contain it (in reality it is much more complicated, but this captures the essence of what it does). If you think about this for a moment, it is more logical than it might at first appear: without it, common words such as ‘and’, ‘of’, ‘in’ and ‘the’ would dominate search.
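To make the idea concrete, here is a minimal sketch of the textbook form of TF-IDF in Python. It is not how any particular search engine computes relevance; the three-sentence corpus and the function name tf_idf are invented purely for illustration.

```python
import math
from collections import Counter

def tf_idf(term, document, corpus):
    """Score one term in one document against a corpus (a list of token lists)."""
    # Term frequency: the share of the document's words that are this term.
    tf = Counter(document)[term] / len(document)
    # Document frequency: how many documents in the corpus contain the term.
    df = sum(1 for doc in corpus if term in doc)
    if df == 0:
        return 0.0
    # Inverse document frequency: the rarer the term across the corpus,
    # the larger this weight. A term found in every document gets log(1) = 0.
    idf = math.log(len(corpus) / df)
    return tf * idf

# Hypothetical three-document corpus, split into words on whitespace.
corpus = [
    "the supermarket celebrates the anniversary".split(),
    "the supermarket sells food".split(),
    "the weather is fine".split(),
]

doc = corpus[0]
scores = {term: tf_idf(term, doc, corpus) for term in set(doc)}
```

In this toy corpus ‘the’ appears in every document, so its score is zero even though it is the most frequent word in the first document, while ‘anniversary’, which appears in only one document, scores well above ‘supermarket’, which appears in two.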
Although TF-IDF is by no means the only factor affecting search rank, the following chart demonstrates just how important it is for SEO. It shows how TF-IDF increases dramatically with Google search ranking for three well-known brands: BMW, Tesco and Microsoft. Crucially, TF-IDF follows a power law, which, put simply, means it becomes increasingly important as search rankings get higher.
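As a rough illustration of what a power law looks like, the sketch below fits a curve of the form y = a * x^b to a handful of points by drawing a straight line through their logarithms. The numbers are synthetic and chosen only to show the method; they are not the BMW, Tesco or Microsoft figures, and the function name fit_power_law is our own.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = a * x**b by least squares on (log x, log y). Returns (a, b)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    # The slope of the straight line through the log-log points is the exponent b.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lx, ly))
    sxx = sum((x - mean_x) ** 2 for x in lx)
    b = sxy / sxx
    # The intercept of that line is log(a).
    a = math.exp(mean_y - b * mean_x)
    return a, b

# Synthetic data (an assumption for illustration only): search positions
# 1 to 10, where 1 is the top result, with a made-up TF-IDF value that
# falls away as the position gets worse.
positions = list(range(1, 11))
tfidf_values = [2.0 * p ** -1.5 for p in positions]

a, b = fit_power_law(positions, tfidf_values)
```

Because the points were generated from exactly that curve, the fit recovers a close to 2 and b close to -1.5. A negative exponent of this kind means the value climbs ever more steeply towards position 1, which is the pattern described above.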

Measuring and Using TF-IDF
Unlike keywords and keyword density, TF-IDF is not something you can measure by simply looking at a document, counting a few words and carrying out a few calculations, because it depends on a whole body of documents rather than on one document alone. This is pretty obvious, really, given that today’s search engines are virtually impossible to fool. Despite this, it is relatively easy, and very fast, for software designed specifically for the task to analyse documents and determine TF-IDF using what is known as ‘text mining’. The following shows a simple example of how this can be done.
In May 2009, Sainsbury’s launched a marketing campaign celebrating the grocery retailer’s 140th anniversary. The following chart covers a three-day period, tracking TF-IDF for the Sainsbury’s brand against social media coverage associating the company with its 140th anniversary and with ‘food’, the major product line for which it is best known. It is immediately clear from the chart that Sainsbury’s TF-IDF rises virtually in line with the 140th birthday coverage, while ‘food’ drops over the same period (the actual analysis would involve much more data over a longer period but has been simplified here for the sake of clarity).

What does this mean? In simple terms, it means that coverage of Sainsbury’s 140th birthday would improve the company’s search ranking, while coverage of food over the same period would not. In other words, one of the best ways for Sainsbury’s to improve its SEO would be to generate as much coverage of its 140th birthday as possible. In fact, by carrying out statistical analysis of many different factors affecting Sainsbury’s TF-IDF over a longer period, we have been able to show that although social media coverage of the company’s 140th birthday was much lower in volume than its coverage on food, it was 10 to 15 times more effective in raising the company’s relevance to search. This is a dramatic difference that would not be obvious from keyword analysis, which would intuitively rank ‘food’ higher than any other search term.
This example also shows that relevance to search changes, as in fact it does all the time. As well as enabling organisations to determine the best coverage to improve TF-IDF, and therefore their search engine ranking, Spectrum can track this change so that relevance can be optimised on a continuous basis. Last but not least, we can also track coverage that reduces TF-IDF, allowing organisations to amend their activities accordingly.
For more information go to Information Request