
Pages removed from Google due to JavaScript redirects

Wednesday, August 27th, 2008

Recently I had a customer who had many of their pages missing from the Google index, including the home page and all other top-level pages. Strictly speaking, the home page was still listed, but Google had no cached copy of it, and its title and snippet came from some other source rather than the current home page. It appeared that Google was somehow ignoring their pages, or that the site had incurred some kind of penalty. The pages were indexed normally by Yahoo and MSN, so the problem was specific to Google.

We looked carefully at many possible causes, including robots.txt, redirects on the root URL, XML sitemap files, internal linking, and session IDs. Finally, we isolated the problem: a piece of JavaScript code that conditionally redirected the user to a test page. With the variable set to its normal value, the redirect would never fire, but it appears Google did not trust the code and interpreted it as a cloaking/spam technique, in which users are redirected to a different page and do not see what the search engines see.

The code was something like this:

<script type="text/javascript">
var test = "yes";
if (test != "yes") {
  document.write("<meta http-equiv='refresh' content='0;URL=http://www.mysite.com/testing/' />");
}
</script>

This code was on all of the pages that were not indexed, and absent from those that were. When we removed it, the pages started to return to the index within hours, and over 1,500 pages have already been added back. That made it clear the code was the culprit, even though it was only an innocent way to show a test page with a simple variable change.

Normally you would not expect a search engine, even Google, to follow this code: it is a meta refresh written dynamically by a JavaScript document.write() call, inside a condition that is never true as written. However, Google's spam filters look for “tricky” spam techniques, and they evidently evaluated this code without trusting the conditional check on the “test” variable. I think what happened was that they always followed the meta refresh and indexed the test page in place of each original page. Ouch!
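To make that hypothesis concrete, here is a minimal Node.js sketch of a scanner that reads the page source as plain text and never executes the script, so the `if (test != "yes")` guard has no effect on what it finds. This is only an illustration of the idea, not a description of how Google actually works; the function name `findRefreshTargets` and its regular expression are hypothetical.

```javascript
// Hypothetical illustration: scan raw HTML for meta refresh directives
// without running any JavaScript, so conditions around them are ignored.
function findRefreshTargets(html) {
  // Matches http-equiv="refresh" followed by content="<seconds>;URL=<target>",
  // including when the tag sits inside a document.write() string.
  // Simplification: assumes http-equiv appears before content in the tag.
  const pattern =
    /http-equiv\s*=\s*['"]refresh['"][^>]*content\s*=\s*['"]\s*(\d+)\s*;\s*url\s*=\s*([^'"\s>]+)/gi;
  const targets = [];
  for (const match of html.matchAll(pattern)) {
    targets.push({ delaySeconds: Number(match[1]), url: match[2] });
  }
  return targets;
}

// The snippet from this post, as a text-only scanner would see it.
const page = `<script type="text/javascript">
var test = "yes";
if (test != "yes") {
  document.write("<meta http-equiv='refresh' content='0;URL=http://www.mysite.com/testing/' />");
}
</script>`;

// Finds one target: a 0-second refresh to http://www.mysite.com/testing/,
// even though the browser would never write that tag.
console.log(findRefreshTargets(page));
```

A scanner like this treats every page carrying the snippet as an instant redirect to the test page, which would be consistent with what we saw.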

Google has been very clear that their algorithms look for spam techniques that try to fool them.  For example, see http://www.mattcutts.com/blog/seo-mistakes-sneaky-javascript/

I suspect there are many other ways to get your pages banned or penalized by search engines.  It appears that Google is using some very sophisticated techniques, though not quite sophisticated enough in our case!

So, if your pages are missing or removed from search engine indexes, look carefully at any areas where you have JavaScript-controlled redirects, meta refresh commands or other techniques that might be interpreted as spam attempts.
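As a starting point for that review, here is a rough Node.js sketch that flags the kinds of constructs mentioned above in a page's HTML source. The patterns and the `auditPage` name are assumptions chosen for illustration, not a list of what Google actually penalizes, and a regular-expression check like this will miss redirects assembled in less obvious ways.

```javascript
// Hypothetical heuristics for constructs worth reviewing by hand.
// They are assumptions for illustration, not Google's actual rules.
const CHECKS = [
  // <meta http-equiv="refresh" ...>, whether static or written by script
  { name: "meta refresh", pattern: /http-equiv\s*=\s*['"]?refresh/i },
  // Assignments such as window.location = ... or location.href = ...
  // (but not comparisons like location == ...), plus location.replace()
  // and location.assign() calls
  {
    name: "JavaScript location change",
    pattern: /\blocation(\.href)?\s*=(?!=)|\blocation\.(replace|assign)\s*\(/i,
  },
  // document.write()/writeln() whose string argument starts a <meta> tag
  {
    name: "document.write() emitting a meta tag",
    pattern: /document\.write(ln)?\s*\(\s*['"`][^'"`]*<meta/i,
  },
];

// Returns the names of every check that matches the page source.
function auditPage(html) {
  return CHECKS.filter((check) => check.pattern.test(html)).map((check) => check.name);
}

const original = `<script type="text/javascript">
var test = "yes";
if (test != "yes") {
  document.write("<meta http-equiv='refresh' content='0;URL=http://www.mysite.com/testing/' />");
}
</script>`;

// Flags "meta refresh" and "document.write() emitting a meta tag".
console.log(auditPage(original));
```

An empty result does not prove a page is safe; it only narrows down which pages to inspect by hand.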

John Erickson
LeadQual