Link rot

Article Id: WHEBN0000365511
Reproduction Date:

Title: Link rot  
Author: World Heritage Encyclopedia
Language: English
Subject: World Wide Web, Uniform Resource Locator, CSE HTML Validator, Dangling pointer, Association of Publishing Agencies
Collection: Data Quality, Uniform Resource Locator
Publisher: World Heritage Encyclopedia

Link rot (or linkrot), also known as link death, link breaking or reference rot, refers to the process by which hyperlinks on individual websites or the Internet in general point to web pages, servers or other resources that have become permanently unavailable. The phrase also describes the effects of failing to update out-of-date web pages that clutter search engine results. A link that does not work any more is called a broken link, dead link, or dangling link. Formally, this is a form of dangling reference: The target of the reference no longer exists.


  • 1 Causes
  • 2 Prevalence
  • 3 Discovering
  • 4 Combating
    • 4.1 Authoring
    • 4.2 Server side
    • 4.3 User side
    • 4.4 Web archiving
  • 5 See also
  • 6 Further reading
    • 6.1 Link rot on the Web
    • 6.2 In academic literature
    • 6.3 In digital libraries
  • 7 References
  • 8 External links


Causes

One of the most common reasons for a broken link is that the web page to which it points no longer exists. This frequently results in a 404 error, which indicates that the web server responded but the specific page could not be found. Another type of dead link occurs when the server that hosts the target page stops working or relocates to a new domain name. The browser may return a DNS error or display a site unrelated to the content originally sought. The latter can occur when a domain name lapses and is reregistered by another party. Other reasons for broken links include:

  • Websites can be restructured or redesigned, or their underlying technology can be changed, altering or invalidating large numbers of inbound or internal links.
  • Many news sites keep articles freely accessible for only a short time period, and then move them behind a paywall. This causes a significant loss of supporting links in sites discussing news events and using media sites as references.
  • Links may expire.
  • Search results from social media such as Facebook and Tumblr are prone to link rot because of frequent changes in user privacy settings, the deletion of accounts, search results that point to a dynamic page whose current results differ from the cached result, or the deletion of links or photos.
  • Links can contain ephemeral, user-specific information such as session or login data. Because these are not universally valid, the result can be a broken link.
  • A link might be broken because of some form of blocking such as content filters or firewalls.
  • Dead links can also occur on the authoring side, when website content is assembled from Internet sources and deployed without properly verifying the link targets.


Prevalence

The 404 "Not Found" response is familiar to even the occasional web user. A number of studies have examined the prevalence of link rot on the web, in academic literature, and in digital libraries.[1] In a 2003 experiment, Fetterly et al. found that about one link out of every 200 disappeared from the Internet each week. McCown et al. (2005) found that half of the URLs cited in D-Lib Magazine articles were no longer accessible 10 years after publication, and other studies have shown link rot in academic literature to be even worse (Spinellis, 2003; Lawrence et al., 2001). Nelson and Allen (2002) examined link rot in digital libraries and found that about 3% of the objects were no longer accessible after one year. In 2014, bookmarking site Pinboard reported a “pretty steady rate” of 5% link rot per year.[2]

A 2014 Harvard Law School study by Jonathan Zittrain, Kendra Albert and Lawrence Lessig determined that approximately 50% of the URLs in U.S. Supreme Court opinions no longer link to the original information.[3] They also found that in a selection of legal journals published between 1999 and 2011, more than 70% of the links no longer functioned as intended. A 2013 study in BMC Bioinformatics analyzed nearly 15,000 links in abstracts from Thomson Reuters’ Web of Science citation index and found that the median lifespan of web pages was 9.3 years, and just 62% were archived.[4]
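
The per-week and per-year loss rates quoted above can be compared by converting them to a common scale. The following is a minimal sketch that assumes a constant loss rate compounding independently in each period (an assumption made here for illustration, not a claim of the cited studies); the function names are hypothetical.

```python
import math


def surviving_fraction(loss_per_period: float, periods: float) -> float:
    """Fraction of links still alive after `periods`, assuming a constant
    loss rate that compounds each period (illustrative assumption)."""
    return (1.0 - loss_per_period) ** periods


def half_life(loss_per_period: float) -> float:
    """Number of periods until half of the links are gone under the same
    constant-rate assumption."""
    return math.log(0.5) / math.log(1.0 - loss_per_period)


# Rates quoted in the text: 1 link in 200 per week, and 5% per year.
weekly_loss = 1 / 200
print(f"surviving after 52 weeks: {surviving_fraction(weekly_loss, 52):.3f}")
print(f"half-life at 1/200 per week: {half_life(weekly_loss) / 52:.1f} years")
print(f"half-life at 5% per year: {half_life(0.05):.1f} years")
```

Under this simplified model, a loss of one link in 200 per week leaves roughly 77% of links after a year and gives a half-life of about 2.7 years, while 5% per year gives a half-life of about 13.5 years. The studies measured different collections at different times, so the figures are not expected to agree.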


Discovering

Broken links can be discovered manually or automatically. Automated methods, including plug-ins for WordPress, Drupal and other content management systems, can be used to detect the presence of broken URLs. An alternative is to use a dedicated broken link checker such as Xenu's Link Sleuth. However, a URL that returns an HTTP 200 (OK) response may be accessible while the contents of the page have changed and are no longer relevant, so some manual checking of links is still needed. Some web servers also return a soft 404: a page served with a 200 (OK) status whose content indicates that the URL is no longer accessible. Bar-Yossef et al. (2004) [5] developed a heuristic for automatically discovering soft 404s.
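
An automated check along these lines can be sketched as follows. This is a simplified illustration, not the heuristic published by Bar-Yossef et al.: it assumes that a server which answers a random, almost certainly nonexistent path with 200 (OK) and the same body as the target page is serving a soft 404. The names `http_fetch` and `classify_link` and the returned labels are hypothetical.

```python
import secrets
import urllib.error
import urllib.request
from typing import Callable, Tuple
from urllib.parse import urlsplit, urlunsplit

# A fetcher maps a URL to (HTTP status, body); status 0 means "no HTTP response".
Fetch = Callable[[str], Tuple[int, str]]


def http_fetch(url: str, timeout: float = 10.0) -> Tuple[int, str]:
    """Fetch a URL with the standard library (redirects are followed)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:  # 4xx/5xx responses
        return err.code, ""
    except (urllib.error.URLError, OSError):  # DNS failure, refused, timeout
        return 0, ""


def classify_link(url: str, fetch: Fetch = http_fetch) -> str:
    """Return 'ok', 'broken', 'unreachable', 'soft-404' or 'http-<code>'."""
    status, body = fetch(url)
    if status == 0:
        return "unreachable"
    if status in (404, 410):
        return "broken"
    if status != 200:
        return f"http-{status}"
    # Probe a random path on the same host. If it "succeeds" with the same
    # body, the server likely answers 200 for missing pages (assumption).
    parts = urlsplit(url)
    probe = urlunsplit(
        (parts.scheme, parts.netloc, "/" + secrets.token_hex(16), "", "")
    )
    probe_status, probe_body = fetch(probe)
    if probe_status == 200 and probe_body == body:
        return "soft-404"
    return "ok"
```

Passing the fetcher as a parameter keeps the classification logic testable without network access. Exact body comparison is deliberately crude; a page that returns "ok" here may still have changed in content, which is why manual review remains useful.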


Combating

There are numerous solutions for tackling broken links: some work to prevent them in the first place, while others try to resolve them once they have occurred. Numerous tools have also been developed to help combat link rot.


Authoring

  • Carefully select and implement hyperlinks, and verify them regularly after publication. Best practices include linking to primary rather than secondary sources and prioritizing stable sites. McCown et al. (2005) suggest avoiding URL citations that point to resources on researchers' personal pages.
  • Always look for the most compact and direct URL available, and ensure that it is clean, with no unnecessary information after the core of the URL.[6] This process is often referred to as URL normalization or URL canonicalization.
  • Use digital object identifiers (DOIs) and Persistent Uniform Resource Locators (PURLs) whenever they are available.
  • Avoid linking to PDF documents if possible. Because PDFs are documents rather than web pages, their content can change without notice, and their names are more likely to contain characters such as spaces that must be translated into safe codes for URLs. Large PDFs may also download slowly and cause a timeout error.[6]
  • Avoid linking to pages deep in a website, a practice known as deep linking.
  • Use web archiving services (for example, WebCite) to permanently archive and retrieve cited Internet references (Eysenbach and Trudel, 2005).
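
The URL clean-up step mentioned in the list above can be illustrated with a short sketch. It assumes that fragments, default ports, and certain query parameters (session identifiers and `utm_` tracking parameters) are safe to drop; the parameter names in `DROP_PARAMS` and `DROP_PREFIXES` are illustrative assumptions, and which parameters are really unnecessary depends on the target site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}
# Hypothetical examples of ephemeral, user-specific parameters.
DROP_PARAMS = {"sessionid", "sid", "phpsessid"}
DROP_PREFIXES = ("utm_",)


def normalize_url(url: str) -> str:
    """Return a cleaner form of `url`: lowercase scheme and host, no default
    port, no fragment, no session or tracking parameters.

    Simplification: userinfo and IPv6 literal hosts are not handled.
    """
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    if port is None or DEFAULT_PORTS.get(scheme) == port:
        netloc = host
    else:
        netloc = f"{host}:{port}"
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in DROP_PARAMS
        and not key.lower().startswith(DROP_PREFIXES)
    ]
    path = parts.path or "/"
    return urlunsplit((scheme, netloc, path, urlencode(kept), ""))


print(normalize_url("HTTP://Example.COM:80/a/b?utm_source=x&id=7&PHPSESSID=abc#top"))
```

Removing parameters can change which page is served, so a cleaned URL is worth checking against the original before it is cited.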

Server side

  • When URLs change, use redirection mechanisms such as "301: Moved Permanently" to automatically refer browsers and crawlers to the new location.
  • Content management systems may offer built-in solutions to the management of links, such as updating them when content is changed or moved on a site.
  • WordPress guards against link rot by replacing non-canonical URLs with their canonical versions.[7]
  • IBM's Peridot attempts to automatically fix broken links.
  • Permalinking stops broken links by guaranteeing that the content will not move for the foreseeable future. Another form of permalinking is linking to a permalink that then redirects to the actual content, so that links pointing to the resource stay intact even if the real content is moved.
  • Design URLs—for example, Semantic URLs—such that they won't need to change when a different person takes over maintenance of a document or when different software is used on the server.[8]
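
The redirection and permalink approaches in the list above both rely on a lookup from an old or stable path to the current location. The following is a minimal sketch using Python's standard `http.server` module; the `MOVED` table, its paths, and the helper names are hypothetical, and a production site would normally configure redirects in its web server or content management system instead.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional

# Hypothetical table of moved pages: old (or permalink) path -> current path.
MOVED = {
    "/old/report.html": "/reports/2014/report",
    "/p/42": "/articles/link-rot",
}


def new_location(path: str) -> Optional[str]:
    """Return the current location for a moved path, or None if unknown."""
    return MOVED.get(path)


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        target = new_location(self.path)
        if target is not None:
            # 301 tells browsers and crawlers the move is permanent.
            self.send_response(301)
            self.send_header("Location", target)
            self.end_headers()
            return
        self.send_response(404)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.end_headers()
        self.wfile.write(b"Not Found\n")


def make_server(port: int = 8000) -> HTTPServer:
    """Build (but do not start) a server bound to localhost."""
    return HTTPServer(("127.0.0.1", port), RedirectHandler)


# To try it locally: make_server().serve_forever()
```

Keeping the table keyed by the old path means inbound links never need to change; only the table is updated when content moves. Note that `self.path` includes any query string, which this sketch does not strip.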

User side

  • The Linkgraph widget gets the URL of the correct page based upon the old broken URL by using historical location information.
  • The Google 404 Widget attempts to "guess" the correct URL, and also provides the user with a search box to find the correct page.
  • When a user receives a 404 response, the Google Toolbar attempts to assist the user in finding the missing page.[9]
  •[10] gathers and ranks alternate URLs for a broken link using Google Cache, the Internet Archive, and user submissions.[11] Typing left of a broken link in the browser's address bar and pressing enter loads a ranked list of alternate urls, or (depending on user preference) immediately forwards to the best one.[12]

Web archiving

To combat link rot, web archivists are actively engaged in collecting the Web or particular portions of the Web and ensuring the collection is preserved in an archive, such as an archive site, for future researchers, historians, and the public. The goal of the Internet Archive is to maintain an archive of the entire Web, taking periodic snapshots of pages that can then be accessed for free via the Wayback Machine. In January 2013 the organization announced that it had reached the milestone of 240 billion archived URLs.[13] National libraries, national archives and other organizations are also involved in archiving culturally important Web content.

Individuals may use a number of tools that allow them to archive web resources that may go missing in the future:

  • The Wayback Machine, at the Internet Archive,[14] is a free website that archives old web pages. It does not archive websites whose owners have stated they do not want their website archived.
  • WebCite, a tool specifically for scholarly authors, journal editors and publishers to permanently archive "on-demand" and retrieve cited Internet references (Eysenbach and Trudel, 2005).
  • Perma, which is supported by the Harvard Law School together with a broad coalition of university libraries, takes a snapshot of a URL's content and returns a permanent link.[3]
  • The Hiberlink project, a collaboration between the University of Edinburgh, the Los Alamos National Laboratory and others, is working to measure “reference rot” in online academic articles, and also to what extent Web content has been archived.[15] A related project, Memento, has established a technical standard for accessing online content as it existed in the past.[16]
  • Some social bookmarking websites allow users to make online clones of any web page on the internet, creating a copy at an independent URL which remains online even if the original page goes down.

However, such preservation systems may themselves suffer intermittent service interruptions, so that the preserved URLs are occasionally unavailable.[17]
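
Archive services such as the Wayback Machine and the Memento standard both address a page "as it existed at a given time". The sketch below only builds the request pieces and performs no network access. It assumes the Wayback Machine's timestamped URL layout (`https://web.archive.org/web/<YYYYMMDDhhmmss>/<original URL>`) and the `Accept-Datetime` request header used by Memento; the function names are hypothetical, and the datetime passed in is assumed to be timezone-aware.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def wayback_url(url: str, when: datetime) -> str:
    """Build a Wayback Machine URL for the snapshot nearest to `when`
    (assumes `when` is timezone-aware)."""
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"https://web.archive.org/web/{stamp}/{url}"


def accept_datetime(when: datetime) -> str:
    """Format `when` as an HTTP date in GMT, the form used for the value of
    a Memento Accept-Datetime header."""
    return format_datetime(when.astimezone(timezone.utc), usegmt=True)


cited_on = datetime(2013, 1, 15, 12, 0, 0, tzinfo=timezone.utc)
print(wayback_url("http://example.com/page", cited_on))
print({"Accept-Datetime": accept_datetime(cited_on)})
```

Recording the date on which a page was cited makes it possible to request the archived copy closest to what the author actually saw, provided a snapshot exists.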

See also

Further reading

Link rot on the Web

In academic literature

  • Habibzadeh, P. (2013). "Decay of References to Web sites in Articles Published in General Medical Journals: Mainstream vs Small Journals". Applied Clinical Informatics 4 (4). Schattauer GmbH.

In digital libraries



External links

  • Future-Proofing Your URIs
  • Jakob Nielsen, "Fighting Linkrot", Jakob Nielsen's Alertbox, June 14, 1998.
This article was sourced from Creative Commons Attribution-ShareAlike License; additional terms may apply.