Growing Toolset for Managing SEO Indexing, Crawling and Pagerank Flow

With the recent introduction of the canonical link tag, search engines are starting to give us a pretty comprehensive set of tools to manage how a website is crawled and indexed. These tools have developed over time, are somewhat ad hoc, and overlap in confusing ways, but together they now solve some traditionally thorny SEO problems.

I thought it would be good to sit back and take inventory of these tools and how we can use them.

First of all, here are some of the issues we’re trying to solve:

  1. Keeping search engines from indexing pages we don’t want them to index.
  2. Keeping search engines from crawling pages we don’t want them to crawl.
  3. Keeping search engines from giving page rank to certain pages (whether on our site or on another site).
  4. For pages whose URLs vary because of parameters, capitalization, different pathways, etc., getting search engines to index just one version of the URL and to consolidate onto it all the page rank the other URL variants receive.
  5. Removing pages from the index we’d like to get out.
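To make issue 4 concrete, here is a minimal Python sketch that maps URL variants of the same page onto one canonical form, the kind of URL you would put in the canonical link tag described below. It assumes the site’s paths are case-insensitive, that a trailing slash does not change the page, and that a hypothetical set of tracking/session parameters (IGNORED_PARAMS) does not change page content. canonicalize() and canonical_link_tag() are illustrative names, not part of any search engine’s tooling.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical query parameters assumed NOT to change page content
# (tracking and session IDs); the right list is site-specific.
IGNORED_PARAMS = {"sessionid", "ref", "utm_source", "utm_medium", "utm_campaign"}


def canonicalize(url: str) -> str:
    """Map common variants of the same page's URL onto one canonical form."""
    parts = urlsplit(url)
    # Scheme and host names are case-insensitive.
    scheme, host = parts.scheme.lower(), parts.netloc.lower()
    # Assumption: this site's paths are case-insensitive and a trailing
    # slash does not change the page, so both are normalized away.
    path = parts.path.lower()
    path = path.rstrip("/") or "/"
    # Drop ignored parameters; sort the rest so their order does not matter.
    params = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    # Fragments are never sent to the server, so they are dropped too.
    return urlunsplit((scheme, host, path, urlencode(params), ""))


def canonical_link_tag(url: str) -> str:
    """Build a <link rel="canonical"> tag for any variant of a page's URL."""
    return f'<link rel="canonical" href="{escape(canonicalize(url), quote=True)}">'


variants = [
    "http://Example.com/Widgets/?utm_source=newsletter",
    "http://example.com/widgets?sessionid=abc123",
    "http://example.com/widgets",
]
for url in variants:
    print(canonicalize(url))
print(canonical_link_tag(variants[0]))
```

All three variants collapse to http://example.com/widgets. Which parameters are safe to drop (and whether paths really are case-insensitive) depends on the site, so treat those choices as placeholders.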

To manage these issues, we now have some good tools:

  1. Robots.txt:  This is the granddaddy of the tools: it lets you tell some or all web crawlers not to crawl specific pages or sections of your site.  However, robots.txt does not keep search engines from indexing a page (a blocked URL can still show up in results if other pages link to it) or from passing page rank to it, nor does it remove anything from the index.  It just stops the crawler.
  2. Robots meta tag:  This tag controls whether a given page is indexed and whether all the links on it are followed.  You can specify “NOFOLLOW” to stop the links on the page from being crawled or passing page rank to other pages, and/or “NOINDEX” to keep the page from being indexed.  NOINDEX is especially useful for keeping a page out of the index (and may help get it removed, though not reliably).  Note that a search engine only sees this tag if robots.txt allows it to crawl the page.
  3. rel=NOFOLLOW attribute:  This is an attribute you can add to an individual link to tell search engines not to follow that link and not to pass page rank to the target page.  It is very useful for “sculpting” page rank flow within your site or to external sites, and it is much more targeted than the robots meta NOFOLLOW (which affects all links on a page).  It does not prevent the target page from being crawled through other pathways, and it does not keep that page out of the index.
  4. Canonical link tag:  This new tag lets you specify the canonical URL for a given page.  It is very useful for telling search engines to ignore other forms of the URL for the same page, not to index those other forms, and to focus all page rank on one version of the page.  It won’t prevent the other forms of the URL from being crawled and won’t remove them if they are already indexed, but it will tend to keep them out of the index.
  5. Search engine URL removal tools:  Each search engine lets you request that specific URLs be removed from its index.  These tools are useful as a last resort and generally work (though they are a pain to use).  They will not prevent a page from being crawled again and re-added, but if you have also taken the other steps to prevent indexing, it shouldn’t come back.
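To see that robots.txt only answers the question “may this crawler fetch this URL?”, here is a small sketch using Python’s standard urllib.robotparser module against a hypothetical robots.txt for example.com. Nothing in it says anything about indexing, page rank, or removal, which matches the limits described above. The user agents, paths, and rules are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for example.com: keep every crawler out of the
# internal search results and cart pages, and keep "BadBot" out entirely.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
Disallow: /cart/

User-agent: BadBot
Disallow: /
"""

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())

checks = [
    ("Googlebot", "http://example.com/products/widget"),   # not disallowed
    ("Googlebot", "http://example.com/search/?q=widget"),  # under /search/
    ("BadBot", "http://example.com/products/widget"),      # BadBot blocked site-wide
]
for agent, url in checks:
    # can_fetch() only answers "may this agent crawl this URL?"
    print(f"{agent:10} {url:40} crawl allowed: {robots.can_fetch(agent, url)}")
```

This parser does simple prefix matching on the rules in file order; real search engine crawlers support extras such as wildcards, so treat it as an approximation of how they read the file.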

As this shows, each of these tools performs different but overlapping functions, which can be a bit confusing!  Used together, however, they let you do some very powerful things: index just the pages you want indexed, control crawling on your website, and focus page rank on the right pages.
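As a rough illustration of how these page-level signals combine, here is a Python sketch using the standard html.parser module that reads one page’s HTML and reports its robots meta directives, its canonical URL, and which links carry rel=nofollow. RobotsDirectiveParser and its summary() method are hypothetical names for this example; it is an audit helper, not a model of how any search engine actually processes a page.

```python
from html.parser import HTMLParser


class RobotsDirectiveParser(HTMLParser):
    """Collect the crawl/index signals found in a single HTML page."""

    def __init__(self):
        super().__init__()
        self.meta_robots = set()   # directives from <meta name="robots">, lower-cased
        self.canonical = None      # href of <link rel="canonical">, if any
        self.followed_links = []   # <a href> links without rel="nofollow"
        self.nofollow_links = []   # <a href> links with rel="nofollow"

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names; values may be None.
        attrs = {name: value or "" for name, value in attrs}
        rel = attrs.get("rel", "").lower().split()
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            for directive in attrs.get("content", "").lower().split(","):
                if directive.strip():
                    self.meta_robots.add(directive.strip())
        elif tag == "link" and "canonical" in rel:
            self.canonical = attrs.get("href")
        elif tag == "a" and "href" in attrs:
            if "nofollow" in rel:
                self.nofollow_links.append(attrs["href"])
            else:
                self.followed_links.append(attrs["href"])

    def summary(self):
        # A page-level NOFOLLOW applies to every link on the page.
        page_nofollow = "nofollow" in self.meta_robots
        return {
            "indexable": "noindex" not in self.meta_robots,
            "canonical": self.canonical,
            "links_followed": [] if page_nofollow else list(self.followed_links),
            "links_not_followed": list(self.nofollow_links)
            + (list(self.followed_links) if page_nofollow else []),
        }


# Hypothetical sample page for example.com.
SAMPLE_PAGE = """
<html><head>
<link rel="canonical" href="http://example.com/widgets">
</head><body>
<a href="/widgets/blue">Blue widgets</a>
<a href="http://other.example.org/" rel="nofollow">Partner site</a>
</body></html>
"""

parser = RobotsDirectiveParser()
parser.feed(SAMPLE_PAGE)
print(parser.summary())
```

For the sample page, the summary reports the page as indexable, the canonical URL as http://example.com/widgets, and only the partner link as not followed. A robots meta NOFOLLOW would move every link into the not-followed list, mirroring the page-wide behavior described above.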

To help visualize this, here is a chart showing which tool does what:

[Chart: Robots options for SEO]

In the chart above, a check mark means the tool fully controls that issue, while a * means partial control: the tool helps but cannot fully manage that issue on its own.

By using these tools carefully, you can do a lot to focus search engines on the right pages and help those pages rank higher.

The main tool I’d still like to see is a better way to get pages removed from the index.

John Erickson
www.leadqual.com

