Black Hole SEO: The Real Desert Scraping
Alright, fine. I’m going to cry uncle on this one. In my last Black Hole SEO post I talked about Desert Scraping. Now understand, I usually change up my techniques and remove a spin or two before I make them public so as not to hurt my own use of them. On this one, though, in the process I totally dumbed it down. In retrospect it definitely doesn’t qualify as a Black Hole SEO technique, more like a general article, and yet no one called me on it! C’mon guys, you’re starting to slip. Enough of this common sense shit, let’s do some real black hat. So the deal is I’m going to talk about desert scraping one more time, and this time just be perfectly candid and disclose the actual spin I use on the technique.
The Real Way To Desert Scrape
1. Buy a domain name and set up catch-all subdomains on it using mod_rewrite and the Apache config.
2. Write a simple script that pulls content from a database and spits it out on its own subdomain; no general template required (see the first sketch after this list).
3. Set up a main page on the domain that links to the newest subdomains, along with their titles, to help them get indexed.
4. Sign up for a service that monitors expiring domains, such as DeletedDomains.com (just one suggestion; there are plenty of better ones out there).
5. On a cron job run every day, scan the newest list of domains that were deleted that day. Store the list in a temporary table in the database.
6. On a second cron job run continuously throughout the day, look up each expired domain on Archive.org. Have it do a deep crawl and replace any links with their local equivalents (i.e. www.expireddomain.com/page2.html becomes /page2.html). Do the same with the images used in the template (see the second sketch after this list).
7. Create a simple algorithm to replace every ad format you can find and think of with your own, such as AdSense. It also doesn’t hurt to replace any outgoing links with links to other sites of yours that are in need of some link popularity.
8. Put the scraped site up on a subdomain using the old domain minus the TLD. So if the site was mortgageloans.com, your subdomain would be mortgageloans.mydomain.com.
9. Have the cron job add the new subdomain to the list of completed ones so it can be listed on the main page and indexed.
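To make steps 1 and 2 concrete, here is a minimal sketch of the serving side in Python. The post assumes Apache catch-all subdomains rewriting every request to one script; this stand-in is a bare WSGI app that reads the Host header and pulls the stored page out of a SQLite table. The pages(subdomain, path, html) schema, file names, and port are my own assumptions, not anything from the original setup.

    # serve.py -- catch-all serving sketch (assumed schema: pages(subdomain, path, html))
    import sqlite3
    from wsgiref.simple_server import make_server

    DB_PATH = "scraped.db"        # assumption: one SQLite file holds every scraped site
    BASE_DOMAIN = "mydomain.com"  # the domain you bought in step 1

    def app(environ, start_response):
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        # whatever sits left of ".mydomain.com" is the subdomain the catch-all matched
        suffix = "." + BASE_DOMAIN
        sub = host[:-len(suffix)] if host.endswith(suffix) else ""
        path = environ.get("PATH_INFO", "/")

        conn = sqlite3.connect(DB_PATH)
        row = conn.execute(
            "SELECT html FROM pages WHERE subdomain = ? AND path = ?", (sub, path)
        ).fetchone()
        conn.close()

        if row is None:
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"Not found"]
        start_response("200 OK", [("Content-Type", "text/html")])
        return [row[0].encode("utf-8")]

    if __name__ == "__main__":
        make_server("", 8080, app).serve_forever()

The main page from step 3 would just be another query against the same table, listing the newest subdomains with their titles.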
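And here is a rough sketch of the cron side (steps 5 through 8). It uses the Wayback Machine’s public availability endpoint to find the closest archived snapshot of each dead domain; the link rewriting and AdSense swap are crude regex passes, the “deep crawl” is collapsed to the homepage, and the helper names, file names, and table layout are all assumptions rather than anything from the post.

    # scrape_expired.py -- cron-side sketch of steps 5 through 8 (names are assumptions)
    import json
    import re
    import sqlite3
    import urllib.parse
    import urllib.request

    DB_PATH = "scraped.db"
    MY_ADSENSE_ID = "pub-0000000000000000"  # placeholder publisher id

    def closest_snapshot(domain):
        """Ask the Wayback Machine for the closest archived copy of the old homepage."""
        api = "http://archive.org/wayback/available?url=" + urllib.parse.quote(domain)
        with urllib.request.urlopen(api, timeout=30) as resp:
            data = json.load(resp)
        closest = data.get("archived_snapshots", {}).get("closest", {})
        url = closest.get("url")
        if not url:
            return None
        # the id_ modifier asks for the raw archived page, without Wayback's own rewriting
        return re.sub(r"(/web/\d+)", r"\1id_", url, count=1)

    def localize(html, domain):
        """Step 6: http://www.expireddomain.com/page2.html becomes /page2.html."""
        pattern = re.compile(r"https?://(?:www\.)?" + re.escape(domain), re.IGNORECASE)
        return pattern.sub("", html)

    def swap_ads(html):
        """Crude stand-in for step 7: swap any AdSense publisher id for your own."""
        return re.sub(r"pub-\d{16}", MY_ADSENSE_ID, html)

    def subdomain_for(domain):
        """Step 8: mortgageloans.com becomes mortgageloans."""
        return domain.rsplit(".", 1)[0].replace(".", "-")

    def process(domain):
        snapshot = closest_snapshot(domain)
        if not snapshot:
            return
        with urllib.request.urlopen(snapshot, timeout=60) as resp:
            html = resp.read().decode("utf-8", errors="replace")
        html = swap_ads(localize(html, domain))
        conn = sqlite3.connect(DB_PATH)
        conn.execute("INSERT INTO pages (subdomain, path, html) VALUES (?, ?, ?)",
                     (subdomain_for(domain), "/", html))
        conn.commit()
        conn.close()

    if __name__ == "__main__":
        # Step 5 stand-in: read the day's deleted-domain list from a plain text file,
        # one domain per line, instead of a real monitoring service's feed.
        for line in open("expired_today.txt"):
            if line.strip():
                process(line.strip())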
What Did This Do?
Now you’ve got a site that grows in unique content and niche coverage. Every day new content goes up and new niches are created on that domain. By the time each subdomain gets fully indexed, many of the old pages on the expired domains will be falling out of the index. Ideally you’ll create a near-perfect replacement with very few duplicate content problems. Over time your site will get huge and start drawing BIG ad revenue. So all you have to do is start creating more of these sites. Since the number of domains expiring every day easily runs into six figures, that’s obviously too much content for any single domain, so building these sites in a network is almost required. Be sure to preplan the possible load balancing during your coding (a quick sketch of that follows below). The fewer scraped sites each domain has to put up per day, the better the chances of it all getting properly indexed and ranking.
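A bare-bones way to do that load balancing is just to deal each day’s list out across your network round-robin with a per-domain cap. The hub names and the 50-a-day cap below are placeholders, not figures from the post.

    # assign_domains.py -- sketch of spreading a day's scraped domains across a network
    from itertools import cycle

    HUB_DOMAINS = ["hub1.com", "hub2.com", "hub3.com"]  # placeholder network
    MAX_PER_HUB_PER_DAY = 50                            # placeholder cap

    def assign(expired_domains):
        """Deal domains out round-robin, leaving the overflow for another day."""
        assignments = {hub: [] for hub in HUB_DOMAINS}
        hubs = cycle(HUB_DOMAINS)
        for domain in expired_domains:
            for _ in range(len(HUB_DOMAINS)):
                hub = next(hubs)
                if len(assignments[hub]) < MAX_PER_HUB_PER_DAY:
                    assignments[hub].append(domain)
                    break
            else:
                break  # every hub is full for today
        return assignments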
And THAT is how you Desert Scrape the Eli way.
Wink. I may have just hinted at a unique Black Hole SEO way of finding high-profit and easy-to-conquer niches. How about exploiting natural traffic demand generated by article branding?
Comments (201)
These comments were imported from the original blog. New comments are closed.
Nice post Eli,
That sounds much more like Black Hole, you are right.
I think no one said anything because you told us a few posts back that you wouldn’t tell us all your secrets and that we’d have to think it through ourselves.
On the other hand, perhaps we are all addicted to you and aren’t thinking for ourselves because you do this for us, hehe.
regards,
RRF
Nice little post, but there is one thing that I would like to argue about: you say to use the site: operator when you want to see if a page is still indexed, and if the page is still indexed then you don’t want to use it.
I prefer to use double quotation marks when checking to see if an article is in the search engines or not. For instance, if I do a search for “I wonder if this would actually work” and it brings back results, I know that I could end up getting a duplicate content penalty for using such an article.
My point is that if you use the site: operator the page may very well not show up and no longer be indexed in the search engines, yet the article may still very well be indexed and you could still end up with a duplicate penalty.
Just a little something more to add to this post if anyone didn’t already know that about the search engines.
Very nice way to snag some content. But I have to protest! On behalf of the codey squirts like myself, I must say that I bet there are people who would like to buy a script to do this, and people like me who like to code and sell their creations. But you didn’t give the squirts a heads up so that we could have the product all ready.
I will be creating mine very soon, but it’s not done yet.
Eli,
Awesome indeed. I hope your squirt members will be getting the tools to do this. It will be a great help for us non-programmers.
I probably wouldn’t use any AdSense ads that I scraped using this technique. I’d be too afraid that I’d grab up pages that were against the Google AdSense TOS and were just sitting there like a time-bomb waiting to go off.
I’d have to have something else I could slap in those spots based upon the dimensions of the ad block.
Thanks for the step-by-step.
Doesn’t the longevity of this depend upon how the SEs treat subdomains?
Wouldn’t a network of unrelated content on a single domain get flagged as spam, when coupled with the pace at which something like this could be put together?
Additionally, won’t Google be paying more attention to subdomains at the moment, following the publicity over the eBay subdomain spam?
Not suggesting this won’t work (I’m way too much of a noob to have any clue), just curious about how quickly it will be blacklisted.
I don’t think Google can do anything against subdomains.
After all, if they ban a subdomain because another subdomain on the same domain did something bad, all Blogspot blogs would be banned in no time, and so would all free hosting sites that use the host’s subdomains.
Well, you’re not really grabbing just any expired domain; if you were smart, you’d grab very targeted domains. For instance, if you were doing mortgage loans, you’d go through the list of expired domains, grab all the sites that have mortgage in their title for instance, and add those to your list of subdomains to add.
So, you’re not grabbing 100,000 expiring domains. You might only grab 10-20 or 200-300 a day. Depends on how specific you want to be in your drill down into a niche.
Hell, if I am going to spill all the beans: put a meta robots “noindex, nofollow” tag on the content. Keep checking the deleted/expired domain in the SERPs, wait for the content to vanish from the SERPs, and on that same day have your robots tag disappear. That should help anyone who has issues with the “duplicate content issue.”
Again, if you don’t script a little proggie for all of this (or have a programmer do it for you), you are out of your mind. Something tells me our friendly neighborhood Eli will have a script for sale which (hopefully) does all of this for you.
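If you wanted to script the noindex-until-it-drops idea above, a hedged version could store a per-subdomain hold flag and have the serving script inject the robots tag only while the flag is set. The sites(subdomain, hold) table and both helper names here are inventions for the sketch, not part of the post or the comment.

    # robots_gate.py -- sketch of the noindex-until-the-old-pages-drop idea
    import sqlite3

    DB_PATH = "scraped.db"
    NOINDEX_TAG = '<meta name="robots" content="noindex, nofollow">'

    def gate(html, subdomain):
        """Inject the robots meta tag while the subdomain is still on hold."""
        conn = sqlite3.connect(DB_PATH)
        row = conn.execute(
            "SELECT hold FROM sites WHERE subdomain = ?", (subdomain,)
        ).fetchone()
        conn.close()
        if row and row[0]:  # still waiting for the expired domain to leave the index
            return html.replace("<head>", "<head>\n" + NOINDEX_TAG, 1)
        return html

    def release(subdomain):
        """Flip the flag once the old pages have vanished from the SERPs."""
        conn = sqlite3.connect(DB_PATH)
        conn.execute("UPDATE sites SET hold = 0 WHERE subdomain = ?", (subdomain,))
        conn.commit()
        conn.close()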
Good question. It will be answered in a future post, but to give you an idea: it’s when an article, piece of media, or video becomes popular and quickly produces its own search volume and “traffic demand.” For instance, how many times had Leeroy Jenkins been searched before that video came out? <- ooh, that’s a good example, I’ll use that in the post.
That’s an example of natural traffic demand generated by branding.
Ok, so are you talking about grabbing domains/content from sites that capitalized on “natural traffic demand…” that are now expired? So sites that had good search ‘juice’ for instance for “William Hung” or “LEEroy Jenkins”. Am I on the right track?
Second, I ran across this “site gets 5 billion pages indexed by Google” post. Was that you, Eli? :>
Thanks for expanding on that Eli. Another question. This technique doesn’t seem to take into account the “authoritative” part that the 1st post talked about. In the 1st one, we were scraping authority sites like looksmart and wikipedia, using “The Wayback Machine”. On this one, we’re scraping any ‘related’ expired domain content.
The content will still be unique, but don’t we lose the authoritative part?
That is definitely true. Perhaps you could compensate by looking at the Alexa ranking from a year ago.
Authoritative domains generally don’t expire. So if you find a site that had a high Alexa ranking, it was more than likely pulling in good search traffic, and thus authoritative content.
elhoim,
Why even bother with a dictionary? You are, in theory, creating niche content sites on somewhat specific topics, right? Use an API call to your keyword tool (I know Wordze can do this) to generate a list of keywords within the specific niche you are building for. Run that list past MattC’s 30k expiring domains and you’ve developed your short list. As for content value, running a links query in a SE is a decent place to start, but it should by no means be the only thing you do. Hell, you could check the document for authoritative-y (new word) structure (title tag/header/sub-headers). If the content has JUST expired, take the title tag of the document and see if it ranks for at least the document title.
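A throwaway version of that filtering step might look like the following. The file names are assumptions, and there is no real keyword-tool API call here, just a keyword list exported from wherever you get it and matched against the day’s expiring-domain dump.

    # filter_expiring.py -- sketch of shortlisting expiring domains against niche keywords
    def load_lines(path):
        return [line.strip().lower() for line in open(path) if line.strip()]

    keywords = load_lines("niche_keywords.txt")    # e.g. exported from your keyword tool
    expiring = load_lines("expiring_domains.txt")  # the day's expiring-domain dump

    # keep any domain whose name contains one of the niche keywords (spaces dropped)
    shortlist = [d for d in expiring
                 if any(k.replace(" ", "") in d for k in keywords)]

    for domain in shortlist:
        print(domain)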
Eli you always amaze! I have been working non-stop ever since coming across your gold mine of a blog. Your techniques are always original and thought provoking, yet sometimes they are just common sense spun with a genius twist.
Thanks for Helping…
If my first born son wasn’t already named Eli… I might have considered it after stumbling upon your blog.
Cheers…
Couple questions/observations…
What’s the point of checking the PR, backlinks, etc.? None of those are going to transfer to you obviously, so are you just thinking of it as a method to filter the junk out of your list?
Eli, do you recommend bothering to download the images (those that are available) for this? Or is it generally okay to just ignore them?
The biggest roadblock here is obviously acquiring RELEVANT links. Of course you can just do all different kinds of techniques on Eli’s blog, etc…to get them. But I’d like to think that there’s an easy way to get a couple backlinks to each subdomain WHEN you publish the content. By the way, if you have 2 links to each of 300 subdomains, is that considered 600 links to the domain?
Eli what do you mean by “How about exploiting natural traffic demand generated by article branding?”
What is article branding? Are you saying that because we are scraping articles from a previous website, that content must inherently be of interest to searchers?
This is a great idea! I just finished building my scraper…
A few questions:
1) Do you think I should keep the old layout of the page, or should I just grab the content and ignore the HTML tags (with some exceptions, like p, br, h1, h2, h3, h4, h5, font, ul, li)?
2) Should I keep the original links? Right now I’m changing all of the links to: 90% - link to some random page on the site, 10% - outbound link.
3) To help SEs index my site, I added 5 random links to pages at the top and bottom of each page, and I built an HTML sitemap and an XML sitemap. Is that a good idea?
4) I changed all the filenames from the original filenames to the page title. So if there was a page called article2.html with the title “Black Hole SEO: The Real Desert Scraping”, I’m calling the file “Black_Hole_SEO_The_Real_Desert_Scraping.html”. Is that good?
5) Should I wait until the old site is not indexed before I publish my site?
Thanks, Nadav
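For what it’s worth, the filename scheme in question 4 above boils down to something like this; a hedged sketch, since the commenter’s exact rules aren’t spelled out.

    # title_to_filename.py -- sketch of the title-based filename scheme from question 4
    import re

    def title_to_filename(title):
        """'Black Hole SEO: The Real Desert Scraping' -> 'Black_Hole_SEO_The_Real_Desert_Scraping.html'"""
        # drop anything that isn't a letter, digit, or space, then join the words with underscores
        words = re.sub(r"[^A-Za-z0-9 ]", "", title).split()
        return "_".join(words) + ".html"

    print(title_to_filename("Black Hole SEO: The Real Desert Scraping"))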
I really like this website, and it makes me wish I were a programmer so I could understand this stuff a bit better.
But I have a question, which probably is the next best thing for a guy like me who really wants to take advantage of these tactics using my own black hat sites…
If I would explain these concepts to a programmer… would he understand it like he should?
“…article, piece of media, or video becomes popular and quickly produces its own search volume”
Now this sounds even more interesting. Leave Brittany Alone!
Wow sick!
Nice Idea!
A great post… such a Black Hole…
I’ll try to get this script soon :)
“is this post still relevant?”
This is my question, also…