Article Id: WHEBN0000043651

Title: Spamdexing  
Author: World Heritage Encyclopedia
Language: English
Subject: Spamming, Cloaking, Spam in blogs, Search engine optimization, Parasite hosting
Collection: Black Hat Search Engine Optimization, Spamming, World Wide Web
Publisher: World Heritage Encyclopedia


In [5]

Common spamdexing techniques can be classified into two broad classes: content spam[4] (or term spam) and link spam.[3]


  • History 1
  • Content spam 2
    • Keyword stuffing 2.1
    • Hidden or invisible text 2.2
    • Meta-tag stuffing 2.3
    • Doorway pages 2.4
    • Scraper sites 2.5
    • Article spinning 2.6
    • Machine translation 2.7
    • Pages with no information related to page title 2.8
  • Link spam 3
    • Link-building software 3.1
    • Link farms 3.2
    • Hidden links 3.3
    • Sybil attack 3.4
    • Spam blogs 3.5
    • Page hijacking 3.6
    • Buying expired domains 3.7
    • Cookie stuffing 3.8
    • Using world-writable pages 3.9
      • Spam in blogs 3.9.1
      • Comment spam 3.9.2
      • Wiki spam 3.9.3
      • Referrer log spamming 3.9.4
  • Other types of spamdexing 4
    • Mirror websites 4.1
    • URL redirection 4.2
    • Cloaking 4.3
  • See also 5
  • References 6
  • External links 7
    • To report spamdexed pages 7.1
    • Search engine help pages for webmasters 7.2
    • Other tools and information for webmasters 7.3


History

The earliest known reference[2] to the term spamdexing is by Eric Convey in his article "Porn sneaks way back on Web," The Boston Herald, May 22, 1996, where he said:

The problem arises when site operators load their Web pages with hundreds of extraneous terms so search engines will list them among legitimate addresses. The process is called "spamdexing," a combination of spamming — the Internet term for sending users unsolicited information — and "indexing." [2]

Spamdexing is the practice of search engine spamming. It is a form of search engine optimization (SEO), the art of making a website attractive to the major search engines for optimal indexing, that relies on spam. Spamdexing creates websites that will be illegitimately indexed with a high position in the search engines, and it is sometimes used to try to manipulate a search engine's understanding of a category. The goal of a web designer is to create a web page that will rank favorably in the search engines, and designers build their pages according to the standards that they believe will help. Some of them resort to spamdexing, often unbeknownst to their clients.

While spamdexing has interfered with finding information on the internet, measures have been taken to curb it with some success. Spamdexing was a major problem in the 1990s, when search engines were of limited use because their results were compromised by it. That changed once Google came on the scene: Google developed a page-ranking system that fought spamdexing quite well, discounting spam sites and rewarding genuinely relevant websites with high rankings.

Content spam

These techniques involve altering the logical view that a search engine has over the page's contents. They all aim at variants of the vector space model for information retrieval on text collections.

Keyword stuffing

Keyword stuffing involves the calculated placement of keywords within a page to raise the keyword count, variety, and density of the page. This is useful to make a page appear to be relevant for a web crawler in a way that makes it more likely to be found. Example: A promoter of a Ponzi scheme wants to attract web surfers to a site where he advertises his scam.[6] He places hidden text appropriate for a fan page of a popular music group on his page, hoping that the page will be listed as a fan site and receive many visits from music lovers. Older versions of indexing programs simply counted how often a keyword appeared, and used that to determine relevance levels. Most modern search engines have the ability to analyze a page for keyword stuffing and determine whether the frequency is consistent with other sites created specifically to attract search engine traffic. Also, large webpages are truncated, so that massive dictionary lists cannot be indexed on a single webpage.
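
The keyword-frequency idea described above can be illustrated with a minimal sketch. The function name `keyword_density` and the two sample texts are hypothetical and chosen only for illustration; real indexing programs are far more elaborate, and this is not a description of any particular search engine.

```python
import re
from collections import Counter


def keyword_density(text: str, keyword: str) -> float:
    """Return the fraction of words in `text` equal to `keyword` (case-insensitive)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return Counter(words)[keyword.lower()] / len(words)


# Hypothetical sample texts: ordinary prose versus a stuffed page.
natural = "Our shop sells guitars and offers repair services for most instruments."
stuffed = "guitars guitars cheap guitars buy guitars best guitars guitars online"

print(keyword_density(natural, "guitars"))
print(keyword_density(stuffed, "guitars"))
```

A naive indexer that ranks by this number alone would prefer the stuffed text, which is why a density far above that of ordinary prose is a plausible signal of stuffing.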

Hidden or invisible text

Unrelated hidden text is disguised by making it the same color as the background, using a tiny font size, or hiding it within HTML code such as "no frame" sections, alt attributes, zero-sized DIVs, and "no script" sections. People screening websites for a search-engine company might temporarily or permanently block an entire website for having invisible text on some of its pages. However, hidden text is not always spamdexing: it can also be used to enhance accessibility.
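
As a rough illustration of how some of these hiding methods might be spotted, the sketch below scans inline `style` attributes for a few patterns. The pattern list and the names `HiddenStyleFinder` and `find_hidden_elements` are assumptions made for this example. It does not cover external stylesheets or text colored to match the background, and because hidden text can be legitimate, a match is only a candidate for human review.

```python
import re
from html.parser import HTMLParser

# Hypothetical pattern list; real pages can hide text in many other ways.
HIDING_STYLE = re.compile(
    r"display\s*:\s*none|visibility\s*:\s*hidden|font-size\s*:\s*0(?![.\d])",
    re.IGNORECASE,
)


class HiddenStyleFinder(HTMLParser):
    """Collect tags whose inline style matches one of the hiding patterns."""

    def __init__(self):
        super().__init__()
        self.flagged = []  # list of (tag, style) pairs

    def handle_starttag(self, tag, attrs):
        style = dict(attrs).get("style") or ""
        if HIDING_STYLE.search(style):
            self.flagged.append((tag, style))


def find_hidden_elements(html: str):
    """Return (tag, style) pairs for elements hidden through inline styles."""
    finder = HiddenStyleFinder()
    finder.feed(html)
    return finder.flagged
```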

Meta-tag stuffing

This involves repeating keywords in the meta tags and using meta keywords that are unrelated to the site's content. This tactic has been ineffective since 2005.

Doorway pages

"Gateway" or doorway pages are low-quality web pages that have very little content and are instead stuffed with very similar keywords and phrases. They are designed to rank highly within the search results but serve no purpose to visitors looking for information. A doorway page will generally have "click here to enter" on the page. In 2006, Google removed BMW's German site from its index for using doorway pages.[7]

Scraper sites

Scraper sites are created using various programs designed to "scrape" search-engine results pages or other sources of content and create "content" for a website. The specific presentation of content on these sites is unique, but is merely an amalgamation of content taken from other sources, often without permission. Such websites are generally full of advertising (such as pay-per-click ads), or they redirect the user to other sites. It is even feasible for scraper sites to outrank original websites for their own information and organization names.

Article spinning

Article spinning involves rewriting existing articles, as opposed to merely scraping content from other sites, to avoid penalties imposed by search engines for duplicate content. This process is undertaken by hired writers or automated using a thesaurus database or a neural network.
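
One common way to measure duplicate content, shown here only as an illustrative sketch and not as the method any search engine is known to use, is to compare sets of overlapping word sequences ("shingles") with the Jaccard coefficient. The function names and the shingle size `k` are assumptions for this example.

```python
def shingles(text: str, k: int = 3) -> set:
    """Return the set of all runs of k consecutive words in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard_similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard coefficient of the two texts' shingle sets (1.0 means identical sets)."""
    set_a, set_b = shingles(a, k), shingles(b, k)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical texts: an original sentence and a thesaurus-style rewrite.
original = "the quick brown fox jumps over the lazy dog"
spun = "the fast brown fox leaps over the idle dog"

print(jaccard_similarity(original, original))
print(jaccard_similarity(original, spun))
```

Replacing even a few words breaks every shingle that contains them, which suggests why spun text can evade a purely surface-level duplicate check.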

Machine translation

Similar to article spinning, some sites use machine translation to render their content in several languages with no human editing, resulting in unintelligible text.

Pages with no information related to page title

Publishing web pages whose content is unrelated to the title is a misleading practice known as deception. Although it is a target for penalties from the leading search engines, deception is common on some types of sites, including dictionary and encyclopedia sites. A Google search for "We could not find the full phrase you were looking for" shows 13 million results. Even though such a page states that it has no information about the full phrase, the phrase is still the main information in the page title, coming before anything else.

Link spam

Link spam is defined as links between pages that are present for reasons other than merit.[8] Link spam takes advantage of link-based ranking algorithms, which give a website a higher ranking the more other highly ranked websites link to it. These techniques also aim at influencing other link-based ranking techniques such as the HITS algorithm.
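
To see why extra inbound links matter to a link-based ranking, consider the simplified PageRank-style power iteration below. It is a textbook sketch under stated assumptions (a damping factor of 0.85, a fixed number of iterations, and hypothetical page names), not the algorithm of any production search engine.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping page -> list of linked pages."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its damped rank evenly over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical graphs: the same small web without and with three spam pages
# that exist only to link to "target".
honest = {"a": ["b"], "b": ["a"], "target": ["a"]}
farmed = dict(honest, f1=["target"], f2=["target"], f3=["target"])

print(pagerank(honest)["target"])
print(pagerank(farmed)["target"])
```

In this toy graph the target's score rises once the three extra pages link to it, even though those pages have no inbound links of their own.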

Link-building software

A common form of link spam is the use of link-building software to automate the search engine optimization process.

Link farms

Link farms are tightly-knit communities of pages referencing each other, also known facetiously as mutual admiration societies.[9] The use of link farms has declined greatly since Google launched the Panda update in February 2011, which significantly improved its algorithm's ability to detect link farms meant to game ranking.

Hidden links

Hidden links are hyperlinks placed where visitors will not see them, in order to increase link popularity. Highlighted link text can help a webpage rank higher for matching that phrase.

Sybil attack

A Sybil attack is the forging of multiple identities for malicious intent, named after the famous multiple personality disorder patient "Sybil". A spammer may create multiple web sites at different domain names that all link to each other, such as fake blogs (known as spam blogs).

Spam blogs

Spam blogs are blogs created solely for commercial promotion and the passage of link authority to target sites. Often these "splogs" are designed in a misleading manner that gives the impression of a legitimate website, but on close inspection their content turns out to be produced with spinning software or to be very poorly written and barely readable. They are similar in nature to link farms.

Page hijacking

Page hijacking is achieved by creating a rogue copy of a popular website which shows contents similar to the original to a web crawler but redirects web surfers to unrelated or malicious websites.

Buying expired domains

Some link spammers monitor DNS records for domains that will expire soon, then buy them when they expire and replace the pages with links to their own pages. See Domaining. However, Google resets the link data on expired domains. To keep all previous Google ranking data for a domain, a buyer has to acquire it before it is "dropped". Some of these techniques may be applied to create a Google bomb, that is, to cooperate with other users to boost the ranking of a particular page for a particular query.

Cookie stuffing

Cookie stuffing involves placing an affiliate tracking cookie on a website visitor's computer without their knowledge, which will then generate revenue for the person doing the cookie stuffing. This not only generates fraudulent affiliate sales, but also has the potential to overwrite other affiliates' cookies, essentially stealing their legitimately earned commissions.

Using world-writable pages

Web sites that can be edited by users can be used by spamdexers to insert links to spam sites if the appropriate anti-spam measures are not taken.

Automated spambots can rapidly make the user-editable portion of a site unusable. Programmers have developed a variety of automated spam prevention techniques to block or at least slow down spambots.

Spam in blogs

Spam in blogs is the placing or solicitation of links randomly on other sites, with a desired keyword placed in the hyperlinked text of the inbound link. Guest books, forums, blogs, and any site that accepts visitors' comments are particular targets and are often victims of drive-by spamming, where automated software creates nonsense posts with links that are usually irrelevant and unwanted. Many blogging platforms, such as WordPress and Blogger, make their comment sections nofollow by default due to concerns over spam.

Comment spam

Comment spam is a form of link spam that has arisen in web pages that allow dynamic user editing, such as wikis, blogs, and guestbooks. It can be problematic because agents can be written that automatically select a user-edited web page at random, such as a WorldHeritage article, and add spamming links.[10]

Wiki spam

Wiki spam is a form of link spam on wiki pages. The spammer uses the open editability of wiki systems to place links from the wiki site to the spam site. The subject of the spam site is often unrelated to the wiki page where the link is added. In early 2005, WorldHeritage implemented a default "nofollow" value for the "rel" HTML attribute. Links with this attribute are ignored by Google's PageRank algorithm. Forum and wiki administrators can use this attribute to discourage wiki spam.
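
A site that renders user-submitted links could apply the attribute along the following lines. This is a minimal sketch; the helper name `render_user_link` is hypothetical and does not correspond to any particular wiki or forum engine.

```python
from html import escape


def render_user_link(url: str, text: str) -> str:
    """Render a user-submitted link with rel="nofollow" and HTML-escaped values."""
    return '<a href="{}" rel="nofollow">{}</a>'.format(
        escape(url, quote=True), escape(text)
    )


print(render_user_link("http://example.com/?a=1&b=2", "my <site>"))
```

Escaping the URL and link text alongside the `rel` attribute keeps a submitted value from breaking out of the markup.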

Referrer log spamming

Referrer spam takes place when a spam perpetrator or facilitator accesses a web page (the referee), by following a link from another web page (the referrer), so that the referee is given the address of the referrer by the person's Internet browser. Some websites have a referrer log which shows which pages link to that site. By having a robot randomly access many sites enough times, with a message or specific address given as the referrer, that message or Internet address then appears in the referrer log of those sites that have referrer logs. Since some Web search engines base the importance of sites on the number of different sites linking to them, referrer-log spam may increase the search engine rankings of the spammer's sites. Also, site administrators who notice the referrer log entries in their logs may follow the link back to the spammer's referrer page.

Other types of spamdexing

Mirror websites

Mirroring is the hosting of multiple websites with conceptually similar content under different URLs. Some search engines give a higher rank to results where the searched-for keyword appears in the URL.

URL redirection

URL redirection takes the user to another page without his or her intervention, e.g., using META refresh tags, Flash, JavaScript, Java, or server-side redirects. However, a 301 redirect, or permanent redirect, is not considered malicious behaviour.
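
Of the mechanisms listed, the META refresh tag is the easiest to inspect in a page's source. The sketch below extracts the delay and target URL from such tags. The names `MetaRefreshFinder` and `find_meta_refresh` are hypothetical, and the parsing covers only the common `content="0; url=..."` form.

```python
import re
from html.parser import HTMLParser

# Matches the common form: "<seconds>; url=<target>"
REFRESH_CONTENT = re.compile(r"\s*(\d+)\s*[;,]\s*url\s*=\s*['\"]?([^'\"]+)", re.IGNORECASE)


class MetaRefreshFinder(HTMLParser):
    """Collect (delay_seconds, target_url) pairs from META refresh tags."""

    def __init__(self):
        super().__init__()
        self.redirects = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("http-equiv") or "").lower() != "refresh":
            return
        match = REFRESH_CONTENT.match(attr.get("content") or "")
        if match:
            self.redirects.append((int(match.group(1)), match.group(2).strip()))


def find_meta_refresh(html: str):
    """Return the META refresh redirects declared in an HTML document."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    return finder.redirects
```

A zero-second delay to an unrelated domain is the kind of result a reviewer might look at more closely; the redirect alone does not establish spamdexing.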


Cloaking

Cloaking refers to any of several means to serve a page to the search-engine spider that is different from that seen by human users. It can be an attempt to mislead search engines regarding the content on a particular web site. Cloaking, however, can also be used to ethically increase accessibility of a site to users with disabilities or provide human users with content that search engines aren't able to process or parse. It is also used to deliver content based on a user's location; Google itself uses IP delivery, a form of cloaking, to deliver results. Another form of cloaking is code swapping, i.e., optimizing a page for top ranking and then swapping another page in its place once a top ranking is achieved.
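
A simple way to picture a cloaking check is to compare the page served to a crawler with the page served to an ordinary browser. The sketch below assumes both versions have already been fetched and uses a text-similarity ratio. The function name `looks_cloaked` and the threshold of 0.5 are arbitrary assumptions, and since cloaking has legitimate uses such as location-based delivery, a low similarity is only a prompt for manual review.

```python
from difflib import SequenceMatcher


def looks_cloaked(crawler_html: str, browser_html: str, threshold: float = 0.5) -> bool:
    """Return True when the two versions of a page differ more than the threshold allows."""
    similarity = SequenceMatcher(None, crawler_html, browser_html).ratio()
    return similarity < threshold


# Hypothetical responses for the same URL under two different user agents.
served_to_crawler = "<h1>Guitar reviews</h1><p>In-depth reviews of acoustic guitars.</p>"
served_to_browser = "<script>window.location='http://example.com/casino'</script>"

print(looks_cloaked(served_to_crawler, served_to_browser))
```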

See also


References

  1. ^ SearchEngineLand, Danny Sullivan's video explanation of Search Engine Spam, October 2008. Retrieved 2008-11-13.
  2. ^ a b c "Word Spy - spamdexing" (definition), March 2003, webpage:WordSpy-spamdexing.
  3. ^ a b  
  4. ^ a b  
  5. ^ Smarty, Ann (2008-12-17). "What Is BlackHat SEO? 5 Definitions".  
  6. ^ "secrets-to-keeping-your-new-email-address-spam-free". Retrieved 2 January 2014. 
  7. ^ Segal, David (2011-02-13). "The Dirty Little Secrets of Search".  
  8. ^  
  9. ^ Search Engines:Technology, Society, and Business - Marti Hearst, Aug 29, 2005
  10. ^  

External links

To report spamdexed pages

  • Found on Google search engine results
  • Found on Yahoo! search engine results
  • Found on Spamdexed website

Search engine help pages for webmasters

  • Google's Webmaster Guidelines page
  • Yahoo!'s Search Engine Indexing page

Other tools and information for webmasters

  • AIRWeb series of workshops on Adversarial Information Retrieval on the Web