Blue Hat Technique #10-Teaching The Crawlers To Run

2006-04-03

#Blue Hat Techniques

One thing you can only learn by running quite a few websites at once is how differently the bots treat different sites. One of the biggest differences is how often they pull your pages and how often they update your site in the index. One day while browsing through my stats, I noticed that certain sites get updated in the indexes daily while others get updated monthly. Some sites with only about 1,000 links get hit by Googlebot 700 times/day, while others with over 20,000 links get hit only about 30 times/day. This inspired me to begin an experiment.

The Experiment

Being one of the few who paid attention in junior high science class, I did this test the right way and put on a white lab coat (just kidding, but wouldn’t that be cool? Where do you buy those things?). My constants were simple. Each site was a brand new domain with similar keywords, similar competition, and similar searches/day. Each site had extremely similar content and the same template. I also pointed exactly 10 links from the same sites to each site. My variables were also simple. Each site was automatically updated with new pages and new content at random times; the only difference was how many times per day it was updated.

Site 1-Updated 1 time/day

Site 2-Updated 3 times/day

Site 3-Updated 5 times/day
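To make the setup concrete, here is a minimal sketch of how "updated N times/day at random times" could be scheduled. This is not the script used in the experiment; the function name `random_update_times` and the `SITES` mapping are hypothetical, and only the update counts (1, 3, and 5 per day) come from the post.

```python
import random
from datetime import datetime, timedelta

# Update counts per day, taken from the three test sites above.
SITES = {"Site 1": 1, "Site 2": 3, "Site 3": 5}

def random_update_times(day, updates_per_day, rng=None):
    """Pick `updates_per_day` distinct random moments within `day`.

    `day` is a datetime at midnight; the result is sorted so a
    scheduler can walk through it in order.
    """
    rng = rng or random.Random()
    # Sample distinct second offsets within the 24-hour day.
    offsets = sorted(rng.sample(range(24 * 60 * 60), updates_per_day))
    return [day + timedelta(seconds=s) for s in offsets]

# Print one day's schedule for each site (times differ on every run).
for name, count in SITES.items():
    schedule = random_update_times(datetime(2005, 6, 1), count)
    print(name, [t.strftime("%H:%M") for t in schedule])
```

Generating a fresh schedule each day keeps the update times unpredictable while holding the daily count fixed, which is the only variable that differs between the sites.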

Hypothesis

The crawlers behave differently depending on how often the site is updated. The indexes will update more or less frequently depending on how often the site is updated.

Time Frames

I let the sites sit for one month. I closely monitored each site and its progress each day.

Spider Hits After First Month

| Engine | Site 1 | Site 2 | Site 3 |
| --- | --- | --- | --- |
| MSN | 214 | 478 | 1170 |
| Google | 184 | 523 | 957 |
| Inktomi | 226 | 391 | 514 |
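For anyone wanting to collect the same kind of spider-hit counts, a rough sketch of tallying crawler requests from a web server access log follows. The post does not say how the hits were counted, so this is only one plausible approach. The user-agent substrings in `BOT_TOKENS` and the sample log lines are assumptions for illustration; check your own logs for the exact strings each crawler sends.

```python
from collections import Counter

# Assumed user-agent substrings for each engine's crawler (lowercase).
BOT_TOKENS = {"Google": "googlebot", "MSN": "msnbot", "Inktomi": "slurp"}

def count_spider_hits(log_lines):
    """Count log lines per engine by matching user-agent substrings."""
    hits = Counter()
    for line in log_lines:
        lowered = line.lower()
        for engine, token in BOT_TOKENS.items():
            if token in lowered:
                hits[engine] += 1
                break  # one engine per request
    return hits

# Made-up sample lines in a common combined log layout.
sample_log = [
    '192.0.2.1 - - [01/Jun/2005:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.2 - - [01/Jun/2005:10:05:00 +0000] "GET /a.html HTTP/1.1" 200 900 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.3 - - [01/Jun/2005:10:07:00 +0000] "GET /b.html HTTP/1.1" 200 700 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
    '192.0.2.1 - - [01/Jun/2005:11:00:00 +0000] "GET /c.html HTTP/1.1" 200 640 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.9 - - [01/Jun/2005:11:30:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
]

for engine, count in count_spider_hits(sample_log).items():
    print(engine, count)
```

Matching on the user agent alone counts anything that claims to be a crawler, so treat the totals as approximate.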

Time Frames

Then I monitored the sites for 6 months.

Cache Update Averages After 6 Months

| Site | MSN (times/month) | Google (times/month) |
| --- | --- | --- |
| Site 1 | 1.52 | 1.4 |
| Site 2 | 18.24 | 4.1 |
| Site 3 | 21.70 | 13.4 |

*Yahoo excluded because it’s tougher to tell cache times and date stamps vs. cached pages/title changes.
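To put the figures in proportion, the short sketch below divides each Site 3 number by the matching Site 1 number. The data is copied from the spider-hit and cache-update results reported in this post; the helper name `site3_vs_site1` is just for illustration.

```python
# Figures copied from the results above, ordered (Site 1, Site 2, Site 3).
spider_hits = {
    "MSN": (214, 478, 1170),
    "Google": (184, 523, 957),
    "Inktomi": (226, 391, 514),
}
cache_updates_per_month = {
    "MSN": (1.52, 18.24, 21.70),
    "Google": (1.4, 4.1, 13.4),
}

def site3_vs_site1(table):
    """Return how many times larger Site 3's value is than Site 1's."""
    return {engine: values[2] / values[0] for engine, values in table.items()}

for label, table in (("Spider hits", spider_hits),
                     ("Cache updates", cache_updates_per_month)):
    for engine, ratio in site3_vs_site1(table).items():
        print(f"{label}, {engine}: Site 3 is {ratio:.1f}x Site 1")
```

Site 3 was updated five times as often as Site 1, so comparing these multipliers against 5 gives a quick sense of whether each engine scaled its attention more or less than proportionally.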

I also tracked the percentage of each site’s actual pages that were indexed across Google, MSN, and Yahoo:

Site 1-57%

Site 2-81%

Site 3-83%

Conclusion

It is understood that spiders will hit your site for four primary reasons. First, validating a link from another site. Second, checking for changes to your site. Third, reindexing your site. Fourth, pulling robots.txt. With the first and fourth factors neutralized, we can assume the differences in the update and spider stats are due to the second and third reasons.

Practical Use

I take from this experiment that if you keep your updates consistent and at random times, it will force the bots to revisit your site more often. They will all start out visiting your site at consistent intervals that depend on your number of links. Once they pick up the rhythm of how often your content changes, they will adapt and start visiting more. Once they build that rhythm into their timing, they will update your site in the indexes accordingly.

Therefore a theory can be built. Crawlers are designed to accommodate your site and the practices of the webmaster. Thus, you can train the crawlers to how your site operates, and this will result in differences in performance in the indexes.

Flaws In The Experiment

Looking at the final results, I wish I had overdone it with a fourth site and had it update 100 or 1,000 times a day, to see if it performed better or worse than Site 3. The second flaw falls into the category of seasonal changes. I did this experiment between June 2005 and January 2006. The engines could have been acting differently during those times. I know for a fact that MSN was, because it was so new.

Comments (32)

These comments were imported from the original blog. New comments are closed.

George Apr 5, 2006 at 9:48 PM

Very interesting information! Question: were these hand-built sites or auto-generated content?

Do robots treat blogs differently from “traditional” more static sites? Or are the sites treated the same, only crawled more frequently because they’re updated regularly?

Great site, BTW.

Eli Apr 6, 2006 at 10:25 AM

Great question, George. They were auto-generated content but put into static pages. They weren’t blog sites, however. I do think robots treat blogs differently than traditional static sites, but that is only because blogs are updated at more random intervals than larger sites. Blogging and pinging does have its effects as well.

George Apr 20, 2006 at 1:52 PM

Thanks for the response.

I’ve been reading about the blog/ping cycle lately (just getting started with technical aspects of SEO, no hat yet) and I’m simply not clear on it. Could you do a post about blog/ping?

There are only two benefits that I see:

IF it works, you can get new pages indexed fast by blogging a link and then pinging.
You can POSSIBLY give your sites worthwhile links by blogging links then pinging.

I’ve read that this is “dead” (definition: anything I’ve heard about — Capri pants, the Decembrists, blogging/pinging) as a technique. What’s your take?

deeb basheer Apr 28, 2006 at 8:25 PM

can you tell me how to build a self updated website ???

thank you for the great info

deeb basheer

Eli Apr 30, 2006 at 5:11 AM

Sure Deeb, you will need some experience in coding either CGI or PHP. Basically you just write all your content and put it into a database. Then write a script to pull one of the sections of content and feed it into the main page. The other way of doing it is to create the pages and then cycle links to them on the main page on a schedule. Either way, you will need to create a cron job (a scheduled server event).

Feb 4, 2007 at 7:20 PM

Got your cool ass lab coat for you. Just hit me with the size. The wife works for Clinique and they go with the “laboratory” look.

They sell to their employees at $200+/coat but for you, my friend, $0.

Worth every penny for all the sweet advice from an evil genius. Only been here about an hour and you have already taught me a trick or 2. Of the methods discussed on this site, which is your favorite?

Eli Feb 4, 2007 at 8:27 PM

Hehe, I have no idea what lab coat size I am. I wear a men’s large shirt, if that helps. Lab coats are badass; I’d totally wear one all the time. I’d be one of those creepy scientists. So if anyone has an evil-looking lab coat to hook me up with, you can mail it to my office address in the BlueHatSEO.com whois info.

Thanks for the compliments, by the way. Feel free to visit anytime.

neil strauss Jan 21, 2008 at 8:55 AM

These days, blogs that release a new post get that post indexed in literally less than an hour!

Prosperity Writer Mar 24, 2008 at 2:08 AM

From your experiment, is it safe to say that putting a blog (mydomain.com/blog, for example) on my non-blog website improves indexing?

Forumistan Apr 7, 2008 at 3:59 PM

Great stats man, keep it on…

beverly farrar Jun 22, 2008 at 11:02 AM

According to my website reporting of crawler hits below, it has slowed considerably. What do you think is the cause and how can I remedy this? Thanks so much!

Crawler Hits

| Month | Hits |
| --- | --- |
| June 2008 | 104 |
| May 2008 | 151 |
| April 2008 | 0 |
| March 2008 | 149 |
| February 2008 | 136 |
| January 2008 | 128 |
| December 2007 | 185 |
| November 2007 | 160 |
| October 2007 | 153 |
| September 2007 | 212 |
| August 2007 | 277 |
| July 2007 | 580 |
| June 2007 | 685 |
| May 2007 | 11 |
| April 2007 | 791 |
| March 2007 | 1201 |
| February 2007 | 948 |
| January 2007 | 911 |
| December 2006 | 746 |
| November 2006 | 460 |
| October 2006 | 472 |
| September 2006 | 796 |
| August 2006 | 1118 |
| July 2006 | 673 |
| June 2006 | 820 |

Supermarket Accidents Sep 24, 2008 at 3:20 PM

Cool experiment. I have noticed it myself too, but I’m not one for running experiments. Too lazy to start, so I end up waiting for others and then reading about their results.

forex faculty Mar 7, 2009 at 9:32 AM

Thanks again, Eli. I have read 4 articles so far and am still craving more.

Jesper Wallin Aug 13, 2009 at 6:02 PM

A really interesting and very useful article. How do these figures add up today, seeing as the experiment was posted more than 3 years ago?

Also, like someone mentioned, how do search engines treat blogs vs. “normal” pages? Sure, pinging and such have their effects, but are they positive or negative? As for trackback and pingback protocols, are these links treated as “real” links in the eyes of a search engine?

Keep up the good work Eli!

Made Easy Forex Sep 7, 2009 at 11:48 AM

From your experiment, is it safe to say that putting a blog (mydomain.com/blog, for example) on my non-blog website improves indexing?

Sameday payday Oct 10, 2009 at 1:47 AM

Through lots of comments on your site, i have known that the site is extremely good for offering latest information.

Luis Sep 20, 2010 at 1:19 PM

This is a good experiment to try. Lately Google hasn’t updated my blog for a while.

India Tour Packeges Oct 10, 2010 at 6:51 AM

hi,

Eli, Very Nice Post Wow!

abercrombie milano May 16, 2011 at 11:42 PM

I think I am just having some problems with subscribing to the RSS feed here.

abercrombie deutschland May 17, 2011 at 3:19 AM

Thanks, I like your blog very much. I come back most days to find new posts like this.

Computer Tips and Tech Talk Jul 11, 2011 at 1:48 AM

Yes, I agree too. Anyway, thanks for sharing!

kadın Jul 29, 2011 at 4:52 AM

I do agree with all of the ideas you have presented in your post. They’re really convincing and will definitely work. Still, the posts are too short for newbies. Could you please extend them a bit from next time? Thanks for the post.

rumah dijual Oct 21, 2011 at 4:29 AM

Great post, thanks Blue Hat. Is the content unique, or just scraped from other sites?

security guard resume Aug 21, 2012 at 11:38 PM

Does anyone have any example of this in action?

chong tham Sep 8, 2012 at 5:46 AM

Yes, I agree too. Anyway, thanks for sharing!

thong cong Sep 8, 2012 at 8:18 AM

can you tell me how to build a self updated website ???

Jasmine @ Callme.lk Sep 28, 2012 at 4:56 AM

I submitted my site with both programs (demos) and got about 10 successful submissions with Promosoft and about 70 with Robosoft. (Btw, the demo from Robosoft is great, same as the full version with a 30-day limit.) Obviously there are many other factors, but perhaps the SEs see these links as low quality (or spam)?

شبكات Dec 3, 2012 at 10:17 PM

Nice Post. This post explains me very well.

شات صوتي Dec 11, 2012 at 12:17 PM

Nice Post. This post explains me very well

Vindicating Michael Mar 19, 2013 at 10:11 AM

I’ve made a little linking research, and at this moment I agree with you