Ever since search engines emerged on the web, the traffic they send has traditionally been considered the most valuable kind. Thousands of webmasters work hard every day to optimize their sites for better search engine rankings. It is no secret that the amount of work and money needed to achieve the desired results largely depends on the niche of the search queries a site is optimized for, and on their search volume. Moreover, sites in different niches require different promotion techniques. Naturally, the competition between market players drives optimization technology forward.
New mechanisms for converting traffic into money have appeared, and with them the borderline between commercial and non-commercial keywords has become less and less visible. Still, even now there are niches with heavy competition, while in other niches search engine traffic is less valuable.
Additionally, search engines never stop developing. Ranking algorithms are constantly improving, server capacity keeps growing, SERP (search engine result page) moderation has been introduced, and so on. As a result, optimization techniques that brought guaranteed results yesterday may be completely ineffective today.
Still, as more and more people start using the web and the amount of information offered by all kinds of sites keeps growing, it is difficult to say when search engines' processing power will be great enough to deliver relevant results for all keyword niches and topics. Currently, when you study search engine rankings, you can see legitimate, content-packed sites as well as doorways, a form of search engine spam. The proportion between the two differs from niche to niche. We won't touch upon the ethics and economics of search engine spam; that topic deserves a discussion of its own.
This report focuses on analyzing Google's SERPs to spot the variations observed in different niches and with keywords of different search volumes.
2. How Google algorithms evolved
Google, the world-famous search engine, quickly became popular due to the quality of its search results, backed by the revolutionary PageRank technology. Soon Google was getting more visitors than any other search engine and became a powerful source of targeted traffic for websites. Naturally, many webmasters started specializing in optimizing sites for Google.
Black hat SEO (search engine optimization), or spamdexing, became widespread. The spamdexing industry grew even faster when a multitude of affiliate programs and pay-per-click systems emerged.
Spamdexing, or illicit optimization and promotion methods, is among the most critical problems for any search engine. Google is known to have taken drastic measures against it. Many webmasters remember widely discussed updates and algorithms such as Florida, Hilltop, TrustRank, and the like. In general, as new algorithms were introduced, fewer and fewer spam sites could be seen in the SERPs. After some time, however, "black hat" webmasters upgraded their techniques, and doorways started showing up in the rankings again.
To give Google its due, it has to be noted that with all these newly introduced algorithms, Google became a much better search engine, while spamdexing grew less and less profitable. It is in the highly commercialized and competitive niches that filter-evading technologies are deployed the fastest. Consequently, Google pays more attention to these niches.
Having spent some time studying these niches, one notices that Google's SERPs behave differently from niche to niche.
3. Definitions and concepts
Before we start our analysis, certain notions need to be defined to avoid misinterpretation of the results. In fact, this is not an easy thing to do, because there is no agreement within the online community on the criteria that separate legitimate content sites from spamdexing pages.
Let us start with search engine spam, or doorways. Many definitions have been offered; here is one of them:
Doorways are a technology often used in spamdexing. A doorway is basically a site page optimized for one or several keyphrases and designed to rank high in the SERPs. An automatically generated doorway contains random text sprinkled with the necessary keyphrases and is therefore useless to a visitor. A manually made doorway may contain narrow-niche information of some value to internet users.
Yet this definition does not provide exact criteria for telling doorways apart from other sites. Alas, there are no simple, unambiguous definitions today. What is more, as technologies develop, artificially made sites look more and more like regular, legitimate web pages filled with valuable content. Sometimes only an online marketing professional can tell a quality doorway from a content-based site.
Defining quality content-based sites is even more difficult. Sometimes a plain HTML page with pure text has more search value than a site produced by major developers and promoters.
With all this taken into account, we may conclude that anyone who sets out to detect doorways in SERPs will be using subjective guidelines rather than objective characteristics.
However, such evaluations are no good for search engines. Search engines separate spamdexing pages from content-based sites using a wide range of parameters. The exact combination of these parameters and their relative weights are kept secret. Moreover, spam-detection techniques never stop evolving, and new evaluation methods are introduced all the time.
Taking all of the above into account, we find it appropriate to distance ourselves from the popular definitions of "white hat" and "black hat" sites. For our analysis it will be much more convenient to work from the Google SERPs themselves and introduce new definitions with certain approximations.
We will refer to sites which stay in Google's SERPs for a long time relative to the duration of the experiment (18 days or more) as "white hat" sites.
On the other hand, sites which lived in the SERPs for less than a week will be referred to as spamdexing sites, or doorways.
These definitions should be treated as probabilistic. It is evident that you are far less likely to find doorways in the first group than in the second one.
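To make the rule concrete, here is a minimal sketch of the classification, assuming we already know how many days a URL stayed in the SERP. The function name and the "unclassified" bucket for intermediate lifetimes are our own illustration, not part of the methodology itself:

```python
def classify_site(days_in_serp: int) -> str:
    """Classify a URL by its SERP lifetime, following the definitions above."""
    if days_in_serp >= 18:   # stayed in the SERP for 18 days or more
        return "white hat"
    if days_in_serp < 7:     # lived in the SERP for less than a week
        return "doorway"
    return "unclassified"    # lifetimes in between fall into neither group
```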
Additionally, you have to understand that these definitions do not describe legitimate sites and spamdexing with 100% accuracy. Sometimes, when ranking algorithms change, quality sites disappear from the SERPs. For example, there are news sites which publish articles on the topics and niches we address here; once the page content changes, these pages gradually disappear from the SERPs for those keyphrases. Conversely, you can sometimes find elaborately made doorways among the "white hat" sites which have been in the SERPs for quite a while.
Still, we can safely say that these exceptions do not change the overall picture. Moreover, you have to take into account that you cannot examine each site individually when processing SERPs containing hundreds of thousands of links.
4. Target setting and input data
To study the way Google behaves in different keyword spheres, we selected six niches:
- Gambling (casinos and gambling)
- Pills (pills, stimulants, generic meds, male products, etc.)
- Dating (dating, chats, matching)
- Adult (erotic keywords)
- Cars (everything related to cars)
- Gifts (gifts, souvenirs, and the like)
For every niche, we built a database of one-, two-, and three-word queries from wordtracker.com. Each database contained 30,000 queries, so the total database analyzed contained 180,000 queries.
The first 20 SERP positions for every query were saved and analyzed every day.
The experiment ran from July 12 to August 19.
Experiment objectives:
- identifying the main players in every niche;
- detecting doorway pages;
- identifying typical optimization techniques;
- gathering and analyzing statistical data.
5. Software and services used to process the data
Seodigger.com is a tool that shows which keywords and phrases a site ranks high for in Google.
Concept: The service saves Google's first 20 results for 44 million popular keywords. After that, a database of correspondences is built:
- URL -> keywords for which this URL is in Google SERP
- Site (including all inner pages) -> keywords for which a site's pages are in Google's SERP.
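As an illustration, here is a minimal sketch of how such a correspondence database could be built by inverting stored SERP snapshots. The data layout and all names are our assumption, not Seodigger.com's actual implementation:

```python
from collections import defaultdict
from urllib.parse import urlparse

def build_correspondences(serps):
    """Invert keyword -> top-20 URL snapshots into the two maps described above.

    serps: {keyword: [url, ...]} with Google's first 20 results per keyword.
    """
    url_to_keywords = defaultdict(set)   # URL -> keywords it ranks for
    site_to_keywords = defaultdict(set)  # site (hostname) -> keywords its pages rank for
    for keyword, urls in serps.items():
        for url in urls:
            url_to_keywords[url].add(keyword)
            # Fold inner pages into their site by keying on the hostname.
            site_to_keywords[urlparse(url).hostname].add(keyword)
    return url_to_keywords, site_to_keywords
```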
Serparchive.org is a tool that saves the first 100 SERP results for given keywords in a number of search engines on a daily basis. It helps monitor how site rankings change over time.
SEOquake.com is an extension for the Firefox browser. It quickly shows a site's parameters in the SERPs of leading search engines and on any other page (document).
To make our analysis as accurate as possible, the following edge effects have to be considered:
- A "white hat" site appears in the SERP when the watching period ends. In this case its SERP life can be less than 2 weeks. It is impossible to consider such sites within our experiment. Though, the percentage of such sites is not high and it won't influence the general picture.
- As Seodigger.com works with the first 20 SERP positions, the statistics can be "spoiled" by "white hat" sites ranked near the bottom of the top 20. If such a site's position fluctuates during the experiment, it may be classified as a doorway even though it is actually a content site. To exclude this edge effect, two databases were built. The first, general database contains sites ranked 1-15 in the SERPs over the entire experiment. The second, additional database comprises sites ranked at positions 15 to 20. When suspected spamdexing is detected in the first database, it is checked against the second database to find out whether it could be a legitimate site. If a site has been in the second database for a while, it is not considered a doorway. All the data given below was gathered and processed with this correction applied.
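A minimal sketch of this two-database check, assuming per-URL day counts for each position range. Reusing the 18-day "white hat" threshold as the cutoff for "has been in the second database for a while" is our assumption; the report does not state the exact value:

```python
def is_doorway(days_at_1_15: int, days_at_15_20: int) -> bool:
    """Apply the two-database edge-effect check described above.

    A short-lived URL from the general (positions 1-15) database is only
    counted as a doorway if the additional (positions 15-20) database
    shows no long history for it either.
    """
    short_lived = days_at_1_15 < 7         # doorway lifetime per our definition
    has_lower_history = days_at_15_20 >= 18  # assumed "for a while" cutoff
    return short_lived and not has_lower_history
```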
7. Input data obtained during the experiment
The experiment lasted for 26 days. During this period we used Serparchive.org to save the SERPs for every keyphrase on a daily basis. Then Seodigger.com calculated the positions of each page for these queries.
All the materials and analyses presented below are simply statistical processing of the data obtained.
8. Site stats for every group
Using the previously defined notions of "white hat" and spamdexing sites, we will evaluate the number of sites of each type for every group of keyphrases we selected.
To do this, we need to count the page links that stayed in the SERPs for 1, 2, 3, and so on up to 36 days. To present this information more clearly, let's split the entire experiment duration into six equally long periods. For our research, periods 1 and 6 are the most interesting: the first will contain spamdexing sites, according to our definition, while the last will comprise "white hat" sites.
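A minimal sketch of this bucketing, assuming a per-URL count of days in the SERP (the function and variable names are illustrative):

```python
from collections import Counter

def lifetime_histogram(lifetimes, total_days=36, n_periods=6):
    """Bucket SERP lifetimes (in days) into equally long periods.

    lifetimes: {url: days the URL appeared in the SERP}, from 1 to total_days.
    With total_days=36 and n_periods=6, period 1 covers lifetimes of 1-6 days
    (doorways by our definition) and period 6 covers 31-36 days ("white hat" sites).
    """
    days_per_period = total_days // n_periods  # 6 days per period
    counts = Counter()
    for days in lifetimes.values():
        period = min((days - 1) // days_per_period + 1, n_periods)
        counts[period] += 1
    return counts
```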