Wednesday, June 6, 2007

Supplemental and Supplemental Result

What is a Supplemental?

A supplemental result is defined by Google as follows:

A supplemental result is just like a regular web result, except that it’s pulled from our supplemental index. We’re able to place fewer restraints on sites that we crawl for this supplemental index than we do on sites that are crawled for our main index. For example, the number of parameters in a URL might exclude a site from being crawled for inclusion in our main index; however, it could still be crawled and added to our supplemental index.

So, translated into plain English, supplementals are those pages that Google considers not important enough to include in its main index, but not so useless that it won't bother indexing them at all.

How do I know if I have supplementals?

First, go to www.google.com and enter the search site:www.yourdomain.com (replacing yourdomain.com with your own domain). The site: in front of your URL is known as a search modifier. There are many different search modifiers, but in this case we're using site: to tell Google to return all the pages it has indexed from www.yourdomain.com.

There are a few misconceptions about what constitutes a supplemental result. Some people think that supplementals are what is returned when you click on the "repeat the search with the omitted results included" link at the end of a Google search. This is not the case.

That link actually shows 'similar' content that Google thinks might not be relevant to your search, and that content can be supplemental or non-supplemental in nature.

In fact, a supplemental result is one where the words "Supplemental Result" appear just under the 'snippet' (the short description of a site) in a Google search. Supplemental results usually appear in the later pages of a site: search, after the main indexed pages.

[Image: blogsupplemental.gif - screenshot of the "Supplemental Result" label beneath a snippet in a site: search]

Why Do I Have Supplemental Results?

Supplementals usually occur for one of the following reasons (in order of increasing likelihood):

Duplicate content from other sites - have you quoted content from other people’s websites? Does this content make up a large proportion of your page?

Google has sophisticated duplicate content filters that will detect this. Remember, it's fine to quote other sites, but make sure you also have enough good original content of your own that Google doesn't think you are simply plagiarizing.

A general rule is that no more than 50% of any given page should be quotes. If you are concerned that you may have too much duplicate content, head over to Copyscape (www.copyscape.com) and run your page through their tool.
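If you'd like a rough self-check of that ratio before reaching for an external tool, here's a sketch. It assumes your quoted material is wrapped in <blockquote> tags, and it simply compares the amount of text inside those tags to the page total - an illustration of the 50% rule of thumb above, not anything Google has published.

```python
# Rough sketch: estimate what fraction of a page's visible text sits inside
# <blockquote> tags. Assumes quoted material is marked up with <blockquote>.
from html.parser import HTMLParser

class QuoteRatioParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_quote = 0          # nesting depth of open <blockquote> tags
        self.quoted_chars = 0      # characters seen inside blockquotes
        self.total_chars = 0       # all visible characters seen

    def handle_starttag(self, tag, attrs):
        if tag == "blockquote":
            self.in_quote += 1

    def handle_endtag(self, tag):
        if tag == "blockquote" and self.in_quote:
            self.in_quote -= 1

    def handle_data(self, data):
        n = len(data.strip())
        self.total_chars += n
        if self.in_quote:
            self.quoted_chars += n

def quote_ratio(html):
    """Return the fraction of visible text that is inside <blockquote> tags."""
    p = QuoteRatioParser()
    p.feed(html)
    return p.quoted_chars / p.total_chars if p.total_chars else 0.0

page = "<p>My own words here.</p><blockquote>A borrowed passage.</blockquote>"
print(f"{quote_ratio(page):.0%} of this page is quoted")
```

Note that this only measures markup on your own page; Copyscape compares your text against other sites, which is a different check.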

Duplicate content from your own site - it is a sad fact that many content management systems (CMS) are great at letting beginners spend their time writing good original content rather than learning web design and HTML, but lag well behind when it comes to being search-engine friendly.

WordPress is one example of a CMS, and it will generally put duplicates of your content all over the joint - for instance, you'll find this article on the front page of my blog, under the SEO discussions category, and in the archive for March on this site, and they'll all have different URLs. Find out about avoiding duplicate content in CMS like WordPress here.

Another cause of duplicate content can be canonicalization issues - that is, where the www and non-www versions of your site are indexed as separate websites when in fact they are the same.
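One common fix, sketched below on the assumption that your site runs on an Apache server with mod_rewrite enabled (swap in your own domain): a 301 redirect sending every non-www request to the www version, so Google only ever sees one copy.

```apache
# Hypothetical .htaccess fragment: 301-redirect non-www to www so Google
# indexes a single canonical version of the site.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^yourdomain\.com$ [NC]
RewriteRule ^(.*)$ http://www.yourdomain.com/$1 [R=301,L]
```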

Not enough pagerank - is your site more than a few months old? Do you have many other sites linking to you?

If the answer to either of these questions is no, it's likely that you are in the 'sandbox', a kind of purgatory between being indexed and being deindexed.

Some people claim the 'sandbox' is an actual step every site must go through (i.e. three months of not being indexed) while Google gains trust in your site, but that's just not the case - it's more about how many people link to you than any deliberate 'temporary ban' on indexing new sites.

Don't believe me? I have one site (www.jaisaben.com) that is almost entirely supplemental. That's because it is very much a 'niche' site and I haven't bothered working on it much - it has been in the supplementals for months and months. Eventually, one day, when it gets enough people linking to it, it will suddenly pop into the main index.

This site (www.yourdomain.com) is almost entirely indexed, and was within weeks of my starting it. Why? Because it has content that other sites like linking to - as a result, Google considers it an important site and makes pages I write available in its main index within days.

Is Having Supplementals a Bad Thing?

It can be. Are you presenting 'niche' content? If so, your pages will still be returned as answers to a Google search whether they are supplemental or not.

If you are presenting mainstream content, supplementals can be a very bad thing: they make it very unlikely that your pages will be returned by a Google search at all (other than one using the site: modifier).

Some people say that once your pages are in the supplemental index, they'll be there for at least three months (until 'supplemental bot' comes for a visit), or perhaps forever. This may have been true in the past, but not anymore. Whether the supplemental index is the end of the road for your site is completely up to you.

My advice? Everyone should aim to have at least 80% of their ‘content’ pages in the main index. It is not that difficult to do.

Supplementals 101 - Bot Behaviour

First, a bit of 'bot behavioural psychology' :). I've been observing bot behaviour on this site, and others, for many years. During that time I've noticed they tend to follow a set pattern:

Bot behaviour and the ‘Infant Site’

  • When a site is first submitted, the bots will come and have a fairly deep look at the site, and usually within a few weeks you’ll find your index page listed.
  • From that point on, bots will continue to visit regularly to check for interesting new content, but they seem unusually reluctant to add new content to the Google index.
  • At this early stage, it’s very difficult to get anything other than your main page indexed.
  • Googlebot will keep on visiting your site pretty regularly, and at some stage or another you’ll notice some of your other pages appearing in the index, but they will be mainly supplemental.
  • This frustrating cycle will continue forever unless you get the bot really interested by achieving a ‘threshold’ of new inlinks.
  • Once a site has a ‘threshold number of inlinks’ the bot will start to treat your site as ‘an adolescent site’.

Bot behaviour and the ‘Adolescent Site’

  • A site reaches adolescence when it has achieved a threshold number of other sites linking to it - this number doesn’t necessarily have to be large - even 1 link from an ‘authority site’ (page rank 3 or higher) seems to be enough to get a site to this stage.
  • During this stage, ‘deep crawls’ of the site become more frequent.
  • New pages appear in the Google index rapidly after they have been crawled, and usually get a ‘honeymoon’ period - Google figures it will give your new pages the benefit of the doubt, and lists your new page in the main index until it has done a thorough crawl of other sites, and seen whether other pages link to it or not.
  • If Google can’t find other sites linking to your new page, it will often drop back to supplemental status within a few days to a week.
  • During adolescence, the majority of your pages will be in supplementals, and you’ll find that those pages that are indexed are pages that have been directly linked to by other sites.

Bot behaviour and the ‘Mature Site’

  • At some stage Googlebot starts to get really interested in your site, crawls become much more regular, and virtually all new original content is indexed. I've heard people say that this is due to the 'trust factor' - which I suspect is a combination of the number and quality of other sites linking to yours, and the number of clicks to your site from Google searches, indicating relevance. That is the stage this site (yourdomain) has now reached, and I generally find any new article I write is included in the main index within a day, and stays there, regardless of whether other sites link directly to it or not.
  • I call this stage ‘the mature site’, and this is where you should aim to be. Don’t listen to people who say it’s hard - this site is only 2 months old.

Escape the Supplemental Index

So you have found yourself in the Google supplemental index and you want to escape.

Fair enough - unless you are a webmaster / blogger it’s hard to understand just how frustrating it is to find your hard-work ‘binned’ to the supplemental index - but worry no more, it’s easier to get out of the supplemental index than you may think.

I’ll be giving you three key steps you can take to get your web page out of the supplemental index and stay out.

STEP 1 - Duplicate Content causes Supplementals

Pick a few key pages on your site, and run them through ‘copyscape’ (www.copyscape.com). If copyscape says you have duplicate content on your pages, this could be the reason for the supplemental status of your pages.

Edit the pages, make them more unique, put any quotes in a <blockquote> tag, and try again. Then move to Step 2.

STEP 2 - Backlinks, Backlinks and Backlinks

So you have a page in the supplementals, it is brimming with unique content, and you just can't wait to get it out - it's not hard. I have used this technique many, many times, and if done correctly you'll find it helps bring your whole site from the 'infant' status I spoke about in my previous article to 'adolescence'.

  • Find a page on your site that is in the supplementals, that has heaps of unique content, and note down the url of that page.
  • Find a site that has PR3 or better, and allows you to post your url.
    • If you don't know what PageRank is, I define it in my article about nofollow
    • Don't know how to discover PageRank? You can do so by getting Firefox with the Google Toolbar (download it from my toolbar to the right)
  • Post your URL on that page, using descriptive anchor text (e.g. if your page is about widgets, the link should say 'widgets' if possible). Try to make your link a deep link - like www.yourdomain.com/301-redirects instead of just www.yourdomain.com.
  • Can’t find somewhere you can post a link? Some tips:-
    • Your host’s forum / bulletin board (make sure that they aren’t no-following links).
    • A friend with an established website (a link from the first page is always best)
    • Another of your own websites (I’ve done this before and it works)
    • Paid editorial.
    • DO NOT subscribe to link exchange schemes, ‘free’ directory listings or other such ‘offers’. At best, they don’t work, at worst, they can get you penalized.
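To illustrate the 'descriptive anchor text' and 'deep link' advice in the steps above, here is a small example using the placeholder domain from earlier - the page URL and anchor wording are hypothetical:

```html
<!-- Good: descriptive anchor text, deep link to a specific page -->
<a href="http://www.yourdomain.com/301-redirects">301 redirects explained</a>

<!-- Weak: generic anchor text, link only to the home page -->
<a href="http://www.yourdomain.com">click here</a>
```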

This strategy has worked without fail for me.

Use it, and expect your target page to be out of the supplementals within a week or less.

Some people call it giving a page ‘link juice’, or ‘link love’ - whatever you call it, it works.

STEP 3 - Submit a Sitemap to Google

Why submit a sitemap? Well, you’ve gone to the effort of getting Google ‘interested’ in your site, so you want to give it the best chance possible of indexing your site properly.

A sitemap will help it do this.
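For reference, a minimal sitemap follows the sitemaps.org XML protocol. This sketch uses the placeholder domain from earlier; the optional <lastmod> and <changefreq> values here are just examples:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.yourdomain.com/</loc>
    <lastmod>2007-06-06</lastmod>
    <changefreq>weekly</changefreq>
  </url>
  <!-- add one <url> entry for each page you want crawled -->
</urlset>
```

Upload the file to your site's root, then submit its URL through Google Webmaster Tools.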

Tomorrow, in part three of this series, I'll be talking about strategies that will help to KEEP your site indexed.

This advice should help you to progress to a ‘mature site’ that is crawled and indexed regularly, without the need for further intervention to keep new pages from going supplemental.
