Capitalization and case sensitivity in URLs matter for SEO
One thing I often see, primarily on sites that use Windows IIS servers, is inconsistent capitalization in URLs. For example:
www.mysite.com/products (lower case)
www.mysite.com/Products (proper case)
This is most common on Windows hosts because Windows file handling is case insensitive: the two links above serve the same content regardless of how the actual file or URL was stored or handled in the system.
On UNIX hosts, however, the above two links would not be the same — one link would typically return a 404 Not Found error if the capitalization did not match a file name.
Since on Windows hosts the user gets the same web page back regardless of capitalization, does it matter if a website uses a mix of lower and upper case on a URL when referring to the same page, and does this impact search engine (SEO) rankings?
The answer is yes, it does matter and it can hurt your SEO rankings.
The problem is that to a search engine, www.mysite.com/products and www.mysite.com/Products are different pages. Per web standards, the path portion of a URL is case sensitive (only the scheme and host name are not), and search engines go by these standards. Thus, each version of the URL will be crawled separately, and the search engines must then determine whether they are different pages. This means that duplicate content filters are applied, and one version could be ignored (along with any inbound links that use the wrong case).
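To make the distinction concrete, here is a minimal Python sketch of a standards-style URL comparison: the scheme and host name are compared without regard to case, while the path and query are compared exactly. The function name same_url is my own, and the http:// prefix is added only so the parser can tell the host from the path; this is an illustration, not how any particular search engine works.

```python
from urllib.parse import urlsplit

def same_url(a, b):
    """Compare two URLs: scheme and host ignore case, path and query do not.

    Illustrative helper (hypothetical name), not a full URL normalizer.
    """
    pa, pb = urlsplit(a), urlsplit(b)
    # urlsplit() already lowercases .scheme and .hostname for us.
    key_a = (pa.scheme, pa.hostname, pa.port, pa.path, pa.query)
    key_b = (pb.scheme, pb.hostname, pb.port, pb.path, pb.query)
    return key_a == key_b

# Host case does not matter, so these compare as the same URL.
print(same_url("http://WWW.MySite.com/products", "http://www.mysite.com/products"))
# Path case does matter, so these compare as two different URLs.
print(same_url("http://www.mysite.com/products", "http://www.mysite.com/Products"))
```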
Thus, inconsistent capitalization can hurt your SEO rankings in three ways: it dilutes the page rank flowing to the “real” page, it causes excess crawling, and it can get pages filtered as duplicate content.
Google recently advised using consistent case, in the October 22, 2008 Google Webmaster Chat forum:
Question: “Does inconsistent capitalization of URLs cause duplicate content issues and dilution of page rank? For example www.site.com/abc vs www.site.com/Abc. On Windows hosts, these are the same page, but are different pages on Unix hosts.”
Answer: “…based on the existing standards, URLs are case-sensitive, so yes, these would be seen as separate URLs. Since the content on the URLs is the same, we’ll generally recognize that and only keep one of them. However, we’d recommend that you try to keep all links going to one version of the URL. Keep in mind that this also applies to robots.txt files.” (that answer was provided by John Mu of Google’s Switzerland office)
So, it is a good idea to review your internal website linking, as well as external links to your site, to be sure you are using consistent capitalization in all your links. You could use your web analytics data, sorted by URL, to spot duplication. You could also use a website scanning tool like Xenu to get a list of all links in your site.
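If you export a list of URLs from your analytics package or a link scanner, a few lines of Python can do the sorting and spotting for you. The sketch below groups URLs that differ only by capitalization; the function name find_case_variants and the sample URLs are assumptions for illustration, and the input is simply a list of strings from whatever export you have.

```python
from collections import defaultdict

def find_case_variants(urls):
    """Group URLs that are identical except for capitalization.

    Returns a dict mapping the lowercased URL to the sorted list of
    distinct spellings found, keeping only URLs with 2+ spellings.
    """
    groups = defaultdict(set)
    for url in urls:
        groups[url.lower()].add(url)
    return {
        key: sorted(variants)
        for key, variants in groups.items()
        if len(variants) > 1
    }

# Hypothetical sample data, e.g. pasted from an analytics export.
crawled = [
    "www.mysite.com/products",
    "www.mysite.com/Products",
    "www.mysite.com/contact",
]
for lowered, variants in find_case_variants(crawled).items():
    print(lowered, "has", len(variants), "spellings:", variants)
```

Any URL reported here is a candidate for cleanup: pick one spelling and update the links that use the others.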
You can prevent capitalization issues by having strong standards for how you code URLs, such as all lower case, all upper case (not recommended since it looks ugly), or all proper case (first letter of each word capitalized). The convention you use does not matter for SEO; what is important is that you are consistent.
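Once you have picked a convention, it is easy to check links against it automatically. As a rough sketch, assuming you chose all lower case, the Python below flags any link whose path contains an upper case letter. The function name and the sample links are hypothetical, and the check would need to be adapted for a different convention.

```python
from urllib.parse import urlsplit

def links_breaking_lowercase_rule(links):
    """Return the links whose path is not entirely lower case.

    Only the path is checked; host names are case insensitive anyway.
    """
    offenders = []
    for link in links:
        path = urlsplit(link).path
        if path != path.lower():
            offenders.append(link)
    return offenders

# Hypothetical links, e.g. collected from your page templates.
links = [
    "http://www.mysite.com/products",
    "http://www.mysite.com/Products",
    "/About/Team.html",
    "/contact",
]
for link in links_breaking_lowercase_rule(links):
    print("Not lower case:", link)
```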
Note that case sensitivity also applies to robots.txt files. For example, you could specify “Disallow: /scripts/”, but if a search engine found a link to something like “www.mysite.com/Scripts/xxxx.asp”, it would crawl it, bypassing the intent of your robots.txt rule. Thus, you should also review your robots.txt and, on Windows hosts where you have not used capitalization consistently, add the other case variants to it.
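You can see this behavior for yourself with the robots.txt parser in Python's standard library, which matches paths case sensitively. The sketch below uses a hypothetical two-line robots.txt containing the rule above; the helper name is_crawlable is my own, and real search engine crawlers use their own parsers, so treat this as a demonstration of the principle rather than of any one crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt containing the rule discussed above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /scripts/
"""

def is_crawlable(url, robots_txt=ROBOTS_TXT):
    """Return True if a generic crawler may fetch url under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("*", url)

# Lower case path matches the Disallow rule, so it is blocked.
print(is_crawlable("http://www.mysite.com/scripts/xxxx.asp"))  # False
# Capitalized path does not match the rule, so it may be crawled.
print(is_crawlable("http://www.mysite.com/Scripts/xxxx.asp"))  # True
```

Adding a second line, “Disallow: /Scripts/”, to the same robots.txt closes the gap for that variant.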
John Erickson
LeadQual

