Blue Hat Technique #10-Teaching The Crawlers To Run
One thing you can only learn by running quite a few websites at once is how differently the bots treat different sites. One of the biggest differences is how often they pull your pages and how often they update your site in the index. One day while browsing through my stats, I noticed that certain sites get updated in the indexes daily while others get updated monthly. Some sites that only have about 1,000 links get hit by Googlebot 700 times/day, while others that have over 20,000 links only get hit about 30 times/day. This inspired me to begin an experiment.
The Experiment
Being one of the few that paid attention in junior high science class, I did this test the right way and put on a white lab coat (just kidding, but wouldn’t that be cool? Where do you buy those things?). My constants were simple. Each site was a brand new domain with similar keywords, similar competition, and similar searches/day. Each site had extremely similar content and the same template. I also pointed exactly 10 links from the same sites to each site. My variables were also simple. Each site was automatically updated with new pages and new content at random times; the only difference was how many times per day each one was updated.
Site 1-Updated 1 time/day
Site 2-Updated 3 times/day
Site 3-Updated 5 times/day
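The update schedule above (a fixed number of updates per day, each at a random time) could be generated along these lines. This is only a minimal sketch, not the script used in the experiment; the function name `random_update_times` and its `seed` parameter are assumptions made for illustration.

```python
import random
from datetime import time

SECONDS_PER_DAY = 24 * 60 * 60

def random_update_times(updates_per_day, seed=None):
    """Pick `updates_per_day` distinct random times of day, sorted ascending."""
    rng = random.Random(seed)
    # sample() draws without replacement, so no two updates share a second
    seconds = sorted(rng.sample(range(SECONDS_PER_DAY), updates_per_day))
    return [time(s // 3600, (s % 3600) // 60, s % 60) for s in seconds]

# Hypothetical schedules for the three test sites
for site, n in (("Site 1", 1), ("Site 2", 3), ("Site 3", 5)):
    print(site, [t.strftime("%H:%M:%S") for t in random_update_times(n)])
```

Drawing a fresh schedule each day keeps the number of updates constant per site while the timing stays unpredictable, which matches the setup described above.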
Hypothesis
The crawlers behave differently depending on how often the site is updated. The indexes will update more or less frequently depending on how often the site is updated.
Time Frames
I let the sites sit for one month. I closely monitored each site and its progress each day.
Spider Hits After First Month
Site 1- MSN: 214 Google: 184 Inktomi: 226
Site 2- MSN: 478 Google: 523 Inktomi: 391
Site 3- MSN: 1170 Google: 957 Inktomi: 514
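Hit counts like these usually come out of the server access logs. As a rough sketch of how such a tally might be produced (the post does not say which stats tool was used), the snippet below counts log lines by crawler. The user-agent tokens `googlebot`, `msnbot`, and `slurp` are assumptions to check against your own logs, and the sample log lines are made up for illustration.

```python
from collections import Counter

# Assumed user-agent substrings; verify them against your own logs
SPIDERS = {"Google": "googlebot", "MSN": "msnbot", "Inktomi": "slurp"}

def count_spider_hits(log_lines):
    """Count log lines whose text contains a known spider's user-agent token."""
    hits = Counter()
    for line in log_lines:
        lowered = line.lower()
        for engine, token in SPIDERS.items():
            if token in lowered:
                hits[engine] += 1
                break  # count each request at most once
    return hits

# Made-up sample lines in combined log format
sample = [
    '192.0.2.1 - - [01/Jun/2005:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.2 - - [01/Jun/2005:10:05:00 +0000] "GET /page1.html HTTP/1.1" 200 734 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.3 - - [01/Jun/2005:10:07:00 +0000] "GET /page2.html HTTP/1.1" 200 698 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
    '192.0.2.4 - - [01/Jun/2005:10:09:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"',
]
print(dict(count_spider_hits(sample)))
```

Matching on user-agent text is easy to spoof, so treat counts produced this way as approximate.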
Time Frames
Then I monitored the sites for 6 months.
Cache Update Averages After 6 Months
Site 1- MSN: 1.52 times/month Google: 1.4 times/month
Site 2- MSN: 18.24 times/month Google: 4.1 times/month
Site 3- MSN: 21.70 times/month Google: 13.4 times/month
*Yahoo excluded because it’s tougher to tell cache times and date stamps vs. cached pages/title changes.
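To make the trend easier to read, the reported averages can be expressed as multiples of Site 1's. The figures below are copied from the results above; the dictionary layout and the function name `relative_to_baseline` are just an illustration.

```python
# Cache updates per month as reported above, keyed by site then engine
cache_updates = {
    "Site 1": {"MSN": 1.52, "Google": 1.4},
    "Site 2": {"MSN": 18.24, "Google": 4.1},
    "Site 3": {"MSN": 21.70, "Google": 13.4},
}

def relative_to_baseline(data, baseline="Site 1"):
    """Express each site's figures as a multiple of the baseline site's."""
    base = data[baseline]
    return {
        site: {engine: round(value / base[engine], 1)
               for engine, value in engines.items()}
        for site, engines in data.items()
    }

for site, ratios in relative_to_baseline(cache_updates).items():
    print(site, ratios)
```

On these numbers, Site 2's MSN cache refreshed about 12 times as often as Site 1's, while its Google cache refreshed only about 3 times as often.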
I also tracked the percentage of each site’s actual pages that were indexed across Google, MSN, and Yahoo.
Site 1-57%
Site 2-81%
Site 3-83%
Conclusion
It is understood that spiders will hit your site for four primary reasons. First, validating a link from another site. Second, checking for changes to your site. Third, reindexing your site. Fourth, pulling robots.txt. With the first and fourth factors neutralized, we can assume the update and spider stats are due to the second and third reasons.
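Of the reasons listed, robots.txt pulls are the easiest to separate out in raw logs, because the request path gives them away. Here is a rough sketch of one way to discount them when tallying spider hits. It assumes common/combined log format lines, and the function name `split_robots_fetches` is hypothetical; it is not how the experiment itself neutralized this factor.

```python
import re

# Matches the quoted request portion of a common/combined log format line
REQUEST_RE = re.compile(r'"(?:GET|HEAD|POST) (\S+) HTTP/[\d.]+"')

def split_robots_fetches(log_lines):
    """Return (robots_txt_hits, other_hits) for lines with a parsable request."""
    robots, other = 0, 0
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue  # skip lines that do not look like HTTP requests
        path = match.group(1).split("?", 1)[0]  # ignore any query string
        if path == "/robots.txt":
            robots += 1
        else:
            other += 1
    return robots, other

# Made-up sample lines for illustration
sample_lines = [
    '192.0.2.1 - - [01/Jun/2005:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 68',
    '192.0.2.1 - - [01/Jun/2005:10:00:02 +0000] "GET /page1.html HTTP/1.1" 200 734',
    '192.0.2.1 - - [01/Jun/2005:10:00:05 +0000] "GET /page2.html HTTP/1.1" 200 698',
]
print(split_robots_fetches(sample_lines))
```

Subtracting the first number from a crawler's total leaves the hits that went to actual pages.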
Practical Use
I understand from this experiment that if you keep your updates consistent and at random times, it will force the bots to revisit your site more often. They will all start visiting your site at consistent intervals depending on your number of links. Once they start to build a rhythm of how often your content changes, they will adapt and start visiting more. Once they build that rhythm into their timing, they will update your site in the indexes accordingly.
Therefore a theory can be built. Crawlers are designed to accommodate your site and the practices of the webmaster. Thus, you can train the crawlers to how your site operates, and this will result in differences in performance in the indexes.
Flaws In The Experiment
Upon factoring the final results, I wish I had overdone it with a fourth site and had it update 100 or 1,000 times a day, to see if it performed better or worse than Site 3. The second flaw falls into the category of seasonal changes. I did this experiment between June 2005 and January 2006. The engines could have been acting differently during those times. I know for a fact that MSN was, because it was so new.
Comments (32)
These comments were imported from the original blog. New comments are closed.
Very interesting information! Question: were these hand-built sites or auto-generated content?
Do robots treat blogs differently from “traditional” more static sites? Or are the sites treated the same, only crawled more frequently because they’re updated regularly?
Great site, BTW.
Thanks for the response.
I’ve been reading about the blog/ping cycle lately (just getting started with technical aspects of SEO, no hat yet) and I’m simply not clear on it. Could you do a post about blog/ping?
There are only two benefits that I see:
IF it works, you can get new pages indexed fast by blogging a link and then pinging.
You can POSSIBLY give your sites worthwhile links by blogging links then pinging.
I’ve read that this is “dead” (definition: anything I’ve heard about — Capri pants, the Decembrists, blogging/pinging) as a technique. What’s your take?
Can you tell me how to build a self-updating website?
Thank you for the great info.
deeb basheer
Got your cool ass lab coat for you. Just hit me with the size. The wife works for Clinique and they go with the “laboratory” look.
They sell to their employees at $200+/coat but for you, my friend, $0.
Worth every penny for all the sweet advice from an evil genius. Only been here about an hour and you have already taught me a trick or 2. Are any of the methods discussed on this site your favorite?
Hehe, I have no idea what labcoat size I am. I wear a men’s large shirt, if that helps. Labcoats are badass; I’d totally wear one all the time. I’d be one of those creepy scientists. So if anyone has an evil-looking labcoat to hook me up with, you can mail it to my office address in the BlueHatSEO.com whois info.
Thanks for the compliments, by the way. Feel free to visit anytime.
According to my website reporting of crawler hits below, it has slowed considerably. What do you think is the cause and how can I remedy this? Thanks so much!
Crawler Hits
June 2008: 104
May 2008: 151
April 2008: 0
March 2008: 149
February 2008: 136
January 2008: 128
December 2007: 185
November 2007: 160
October 2007: 153
September 2007: 212
August 2007: 277
July 2007: 580
June 2007: 685
May 2007: 11
April 2007: 791
March 2007: 1201
February 2007: 948
January 2007: 911
December 2006: 746
November 2006: 460
October 2006: 472
September 2006: 796
August 2006: 1118
July 2006: 673
June 2006: 820
A really interesting and very useful article. How do these figures add up today, seeing as the experiment was posted more than 3 years ago?
Also, like someone mentioned, how do search engines treat blogs vs “normal” pages? Sure, pinging and such has its effects, but are they positive or negative? As for trackback and pingback protocols, are these links treated as “real” links in the eyes of a search engine?
Keep up the good work Eli!
hi,
Eli, Very Nice Post Wow!