Thursday, August 17, 2006
Explaining the unexplainable
Oddly enough, I've been seeing more and more wacky things with Google and with various hosting companies out there.
Here are some of my favorites:
I implemented a canonical URL fix on an Apache server using mod_rewrite. In the RewriteRule flags I declared the redirect to be a 301 ([R=301]). When I checked the headers, the request did redirect, but the status returned was only a 200 OK... Nice...
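For anyone who wants to repeat the header check, here is a minimal Python sketch. It assumes nothing about the server beyond plain HTTP: the function names and example URLs are made up for illustration, and it uses http.client because that module never follows redirects, so you see the very first status the server sends.

```python
import http.client
from urllib.parse import urlsplit


def fetch_status(url, timeout=10):
    """Send one HEAD request and return (status, Location header).

    http.client does not follow redirects, so this is the raw first response.
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def describe_redirect(status, location, expected_target):
    """Compare a response with the 301 the rewrite rule was supposed to send."""
    if status == 301 and location == expected_target:
        return "ok: permanent redirect to canonical URL"
    if status == 301:
        return "301, but to unexpected target: {}".format(location)
    if 300 <= status < 400:
        return "redirect with status {}, not the expected 301".format(status)
    return "no redirect: server answered {}".format(status)
```

Usage would look like `describe_redirect(*fetch_status("http://somedomain.com/"), "http://www.somedomain.com/")`, with both URLs swapped for your own non-canonical and canonical addresses.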
Here is a good one from Google:
A client's home page listing is showing up in Google as https://www.somedomain.com/
In Google's Help Center (http://www.google.com/support/webmasters/bin/answer.py?answer=35302) they recommend you deny the bots from the SSL version by using the robots.txt file.
Since some hosts map the non-SSL and SSL protocols to the same folder on the server, you could literally tell Google to deny both... Not good...
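One way around the shared-folder problem, assuming you can run a bit of code for /robots.txt instead of serving a static file, is to pick the robots.txt body based on the protocol the request came in on. The sketch below is a tiny WSGI app. The names are hypothetical, and it relies on the server setting the standard wsgi.url_scheme key correctly, which is worth verifying if SSL is terminated by a proxy in front of it.

```python
# Standard robots.txt bodies: an empty Disallow allows everything,
# "Disallow: /" blocks everything.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
DENY_ALL = "User-agent: *\nDisallow: /\n"


def robots_body(scheme):
    """Block crawlers on https only; leave the plain http site crawlable."""
    return DENY_ALL if scheme == "https" else ALLOW_ALL


def app(environ, start_response):
    """Minimal WSGI app that answers /robots.txt per protocol."""
    if environ.get("PATH_INFO") != "/robots.txt":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found\n"]
    scheme = environ.get("wsgi.url_scheme", "http")
    body = robots_body(scheme).encode("ascii")
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

The point of the design is that both protocols can keep sharing one folder, while the bots get a different answer depending on which one they asked over.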
Here is my favorite:
I built a site a while back and had 2 forms of navigation. The first, for the user, was a drop-down form for ease of use; it would send the user to the correct page, and the URL was dynamic. The second form of navigation was text links with static URLs, via mod_rewrite, to the exact same pages as the form.
This would allow search engines to see everything a user would see. No links went to the dynamic versions. All linking was done to the static mod_rewrite versions.
After doing a site:www.domain.com search, I discovered Google had cached both the dynamic and the static versions.
Since when did Google put together the variables in a form and crawl the URLs??? Go figure...
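I have no idea how Google actually arrived at the dynamic URLs, but it is not hard for any crawler to do. Here is a rough Python sketch, purely as an assumption about how it could work: it reads a GET form with a single drop-down and builds one URL per option. The class name, function name, form markup, and page.php are all invented for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class SelectFormParser(HTMLParser):
    """Collect the action, method, select name, and option values of a form."""

    def __init__(self):
        super().__init__()
        self.action = None      # stays None if no <form> tag is seen
        self.method = "get"
        self.select_name = None
        self.values = []
        self._in_select = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.action = attrs.get("action") or ""
            self.method = (attrs.get("method") or "get").lower()
        elif tag == "select":
            self.select_name = attrs.get("name")
            self._in_select = True
        elif tag == "option" and self._in_select and attrs.get("value"):
            # Options with an empty value (e.g. "Pick one") are skipped.
            self.values.append(attrs["value"])

    def handle_endtag(self, tag):
        if tag == "select":
            self._in_select = False


def candidate_urls(page_url, html):
    """Return the URLs a GET submission would produce for each option."""
    parser = SelectFormParser()
    parser.feed(html)
    if parser.action is None or parser.method != "get" or not parser.select_name:
        return []
    base = urljoin(page_url, parser.action)
    return [base + "?" + urlencode({parser.select_name: value})
            for value in parser.values]
```

Feed it a page with `<form action="/page.php" method="get">` and a `<select name="id">`, and it hands back the same dynamic URLs a visitor would land on, no links required.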




