

Article Id: WHEBN0002831978

Title: Sitemaps  
Author: World Heritage Encyclopedia
Language: English
Subject: Robots exclusion standard, Yahoo! Site Explorer, WikiProject Spam/LinkSearch/, Site map, Search Engine Optimisation
Collection: Google Services, Web Design, Xml-Based Standards
Publisher: World Heritage Encyclopedia


The Sitemaps protocol allows a webmaster to inform search engines about URLs on a website that are available for crawling. A Sitemap is an XML file that lists the URLs for a site. It allows webmasters to include additional information about each URL: when it was last updated, how often it changes, and how important it is in relation to other URLs in the site. This allows search engines to crawl the site more intelligently. Sitemaps are a URL inclusion protocol and complement robots.txt, a URL exclusion protocol.

Sitemaps are particularly beneficial on websites where:

  • some areas of the website are not available through the browsable interface
  • webmasters use rich Ajax, Silverlight, or Flash content that is not normally processed by search engines
  • the site is very large and web crawlers may overlook some of the new or recently updated content
  • a large number of pages are isolated or not well linked together
  • the website has few external links

Search Engine Indexing

Sitemaps supplement and do not replace the existing crawl-based mechanisms that search engines already use to discover URLs. Using this protocol does not guarantee that web pages will be included in search indexes, nor does it influence the way that pages are ranked in search results. Specific examples are provided below.

  • Google - Webmaster Support on Sitemaps: "Google doesn't guarantee that we'll crawl or index all of your URLs. However, we use the data in your Sitemap to learn about your site's structure, which will allow us to improve our crawler schedule and do a better job crawling your site in the future. In most cases, webmasters will benefit from Sitemap submission, and in no case will you be penalized for it."[1]
  • Bing - Bing supports the standard protocol, and its handling is very similar to that described below.
  • Yahoo - After the search deal between Yahoo! Inc. and Microsoft took effect, Yahoo! Site Explorer was merged into Bing Webmaster Tools.


  • 1 History
  • 2 File format
    • 2.1 Element definitions
  • 3 Other formats
    • 3.1 Text file
    • 3.2 Syndication feed
  • 4 Search engine submission
  • 5 Sitemap limits
  • 6 Multilingual and multinational Sitemaps
  • 7 See also
  • 8 References
  • 9 External links


History

Google first introduced Sitemaps 0.84 in June 2005 so web developers could publish lists of links from across their sites. Google, MSN and Yahoo announced joint support for the Sitemaps protocol in November 2006. The schema version was changed to "Sitemap 0.90", but no other changes were made.

In April 2007, Ask.com and IBM announced support for Sitemaps. Google, Yahoo and Microsoft also announced auto-discovery of sitemaps through robots.txt. In May 2007, the state governments of Arizona, California, Utah and Virginia announced they would use Sitemaps on their web sites.

The Sitemaps protocol is based on ideas[2] from "Crawler-friendly Web Servers,"[3] with improvements including auto-discovery through robots.txt and the ability to specify the priority and change frequency of pages.

File format

The Sitemap Protocol format consists of XML tags. The file itself must be UTF-8 encoded. Sitemaps can also be just a plain text list of URLs. They can also be compressed in .gz format.

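A sample Sitemap that contains just one URL and uses all optional tags consists of a <urlset> root element holding a single <url> entry, whose children are <loc>, <lastmod>, <changefreq> and <priority>.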
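A one-URL Sitemap of this kind could be generated roughly as follows. This is a minimal sketch using Python's standard library; the function name `build_sitemap` and the example URL, date and priority are illustrative assumptions, not part of the protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemaps 0.90 schema.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return a Sitemap document (UTF-8 bytes) for a list of entry dicts."""
    ET.register_namespace("", SITEMAP_NS)  # serialize as the default namespace
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for entry in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        # <loc> is required; the other three tags are optional.
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, f"{{{SITEMAP_NS}}}{tag}").text = str(entry[tag])
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


# Hypothetical entry that uses every optional tag.
sample = build_sitemap([{
    "loc": "http://www.example.com/",
    "lastmod": "2006-11-18",
    "changefreq": "daily",
    "priority": 0.8,
}])
print(sample.decode("utf-8"))
```

ElementTree escapes special characters in text nodes, so ampersands in a URL passed as `loc` are written out as entity references automatically.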


The Sitemap XML protocol is also extended to provide a way of listing multiple Sitemaps in a 'Sitemap index' file. Because a single Sitemap is limited to 50 MB (uncompressed) or 50,000 URLs,[4] large sites need an index to cover all of their pages.

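A Sitemap index referencing one separate sitemap consists of a <sitemapindex> root element holding a single <sitemap> entry, whose children are <loc> and, optionally, <lastmod>.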
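A Sitemap index with one entry could be produced along these lines. This is a sketch only: `build_sitemap_index` and the example location and timestamp are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemaps):
    """Return a Sitemap index (UTF-8 bytes) from (location, lastmod) pairs.

    lastmod may be None, in which case the optional tag is omitted.
    """
    ET.register_namespace("", SITEMAP_NS)
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for location, lastmod in sitemaps:
        entry = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = location
        if lastmod is not None:
            ET.SubElement(entry, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)


# Hypothetical index pointing at a single gzip-compressed Sitemap.
index_xml = build_sitemap_index([
    ("http://www.example.com/sitemap1.xml.gz", "2014-10-01T18:23:17+00:00"),
])
print(index_xml.decode("utf-8"))
```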


Element definitions

The definitions for the elements are shown below:[5]

Element | Required? | Description
<urlset> | Yes | The document-level element for the Sitemap. The rest of the document after the '<?xml version>' element must be contained in this.
<url> | Yes | Parent element for each entry.
<sitemapindex> | Yes | The document-level element for the Sitemap index. The rest of the document after the '<?xml version>' element must be contained in this.
<sitemap> | Yes | Parent element for each entry in the index.
<loc> | Yes | Provides the full URL of the page or sitemap, including the protocol (e.g. http, https) and a trailing slash, if required by the site's hosting server. This value must be shorter than 2,048 characters. Note that ampersands in the URL need to be escaped as &amp;.
<lastmod> | No | The date that the file was last modified, in ISO 8601 format. This can be the full date and time or, if desired, simply the date in the format YYYY-MM-DD.
<changefreq> | No | How frequently the page may change: always, hourly, daily, weekly, monthly, yearly, or never. "Always" denotes documents that change each time they are accessed. "Never" denotes archived URLs (i.e. files that will not be changed again). This is used only as a guide for crawlers, and is not used to determine how frequently pages are indexed. Does not apply to <sitemap> elements.
<priority> | No | The priority of that URL relative to other URLs on the site. This allows webmasters to suggest to crawlers which pages are considered more important. The valid range is from 0.0 to 1.0, with 1.0 being the most important. The default value is 0.5. Rating all pages on a site with a high priority does not affect search listings, as it is only used to suggest to the crawlers how important pages in the site are relative to one another. Does not apply to <sitemap> elements.

Support for the elements that are not required can vary from one search engine to another.[5]
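The constraints in the element definitions can be checked mechanically before a Sitemap is published. The following is a rough sketch, not a full validator: `validate_entry` is a hypothetical helper, and `datetime.fromisoformat` is used as an approximation that accepts common ISO 8601 forms such as YYYY-MM-DD but not every valid variant.

```python
from datetime import datetime

CHANGEFREQ_VALUES = {
    "always", "hourly", "daily", "weekly", "monthly", "yearly", "never",
}


def validate_entry(loc, lastmod=None, changefreq=None, priority=None):
    """Return a list of problems found in one URL entry (empty if none)."""
    problems = []
    if not loc.startswith(("http://", "https://")):
        problems.append("loc must include the protocol")
    if len(loc) >= 2048:
        problems.append("loc must be shorter than 2,048 characters")
    if lastmod is not None:
        try:
            datetime.fromisoformat(lastmod)  # approximate ISO 8601 check
        except ValueError:
            problems.append("lastmod is not a recognised ISO 8601 date")
    if changefreq is not None and changefreq not in CHANGEFREQ_VALUES:
        problems.append("changefreq is not one of the allowed values")
    if priority is not None and not 0.0 <= priority <= 1.0:
        problems.append("priority must be between 0.0 and 1.0")
    return problems


print(validate_entry("http://www.example.com/", "2006-11-18", "daily", 0.8))
print(validate_entry("www.example.com", changefreq="sometimes", priority=1.5))
```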

Other formats

Text file

The Sitemaps protocol allows the Sitemap to be a simple list of URLs in a text file, one URL per line. The file specifications of XML Sitemaps apply to text Sitemaps as well: the file must be UTF-8 encoded and cannot be larger than 50 MB (uncompressed) or contain more than 50,000 URLs,[4] but it can be compressed as a gzip file.[5]
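A text Sitemap is simple enough to assemble by hand. The sketch below builds a gzip-compressed, UTF-8 encoded list with one URL per line; the function name and the `max_bytes` parameter are assumptions, and the byte limit is left for the caller to supply.

```python
import gzip


def text_sitemap_bytes(urls, max_urls=50_000, max_bytes=None):
    """Return a gzip-compressed text Sitemap with one URL per line."""
    urls = list(urls)
    if len(urls) > max_urls:
        raise ValueError(f"too many URLs: {len(urls)} > {max_urls}")
    body = "".join(url + "\n" for url in urls).encode("utf-8")
    # The size limit applies to the uncompressed file.
    if max_bytes is not None and len(body) > max_bytes:
        raise ValueError(f"uncompressed size {len(body)} exceeds {max_bytes}")
    return gzip.compress(body)


# Hypothetical URLs, for illustration only.
payload = text_sitemap_bytes([
    "http://www.example.com/",
    "http://www.example.com/about",
])
print(gzip.decompress(payload).decode("utf-8"))
```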

Syndication feed

A syndication feed, such as an RSS 2.0 or Atom feed, can also be submitted as a Sitemap.[5]

It can be beneficial to have a syndication feed as a delta update (containing only the newest content) to supplement a complete sitemap.

Search engine submission

If Sitemaps are submitted directly to a search engine (pinged), it will return status information and any processing errors. The details of submission vary between search engines. The location of the sitemap can also be announced in the robots.txt file by adding a line of the form "Sitemap: <sitemap_location>".

The <sitemap_location> should be the complete URL to the sitemap, such as http://www.example.org/sitemap.xml. This directive is independent of the user-agent line, so it doesn't matter where it is placed in the file. If the website has several sitemaps, multiple "Sitemap:" records may be included in robots.txt, or the URL can simply point to the main sitemap index file.
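Auto-discovery from the crawler's side amounts to scanning robots.txt for "Sitemap:" records. A minimal sketch follows, assuming the field name is matched case-insensitively and that "#" starts a comment; the helper name and the sample file contents are illustrative.

```python
def sitemap_urls_from_robots(robots_txt):
    """Return the URLs named by "Sitemap:" records in robots.txt text."""
    urls = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        # Split on the first colon only, so the URL's own colons survive.
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


# Hypothetical robots.txt with two Sitemap records.
robots = """User-agent: *
Disallow: /private/
Sitemap: http://www.example.org/sitemap.xml
sitemap: http://www.example.org/news.xml  # second record
"""
print(sitemap_urls_from_robots(robots))
```

Because the records are collected regardless of where they appear, their position relative to the user-agent groups makes no difference.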

The following table lists the sitemap submission URLs for several major search engines:

Search engine | Submission URL | Help page | Market
Baidu | | Baidu Webmaster Dashboard | China, Hong Kong, Singapore
Bing (and Yahoo!) | | Bing Webmaster Tools | Global
Google | | Submitting a Sitemap | Global
Yandex | | Sitemaps files | Russia, Ukraine, Belarus, Kazakhstan, Turkey

Sitemap URLs submitted using the sitemap submission URLs need to be URL-encoded.[5]
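When a sitemap URL is passed as a query parameter of another URL, reserved characters such as ":" and "/" are percent-encoded. A short sketch using the standard library is shown here; the ping endpoint is a made-up placeholder, not a real search engine address.

```python
from urllib.parse import quote


def encode_sitemap_url(sitemap_url):
    """Percent-encode every reserved character, including ':' and '/'."""
    return quote(sitemap_url, safe="")


# Hypothetical submission endpoint, for illustration only.
PING_ENDPOINT = "http://searchengine.example/ping?sitemap="

encoded = encode_sitemap_url("http://www.example.com/sitemap.xml")
print(encoded)
print(PING_ENDPOINT + encoded)
```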

Sitemap limits

Sitemap files have a limit of 50,000 URLs and 50 megabytes (uncompressed) per sitemap. Sitemaps can be compressed using gzip.[5]
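A site with more URLs than one file may hold has to spread them over several Sitemaps. A minimal sketch of the split follows, with a hypothetical helper name and generated example URLs; each resulting chunk would become its own Sitemap file.

```python
def split_into_sitemaps(urls, max_urls=50_000):
    """Split a list of URLs into chunks that each fit in one Sitemap."""
    return [urls[i:i + max_urls] for i in range(0, len(urls), max_urls)]


# Hypothetical site with 120,001 pages.
all_urls = [f"http://www.example.com/page/{n}" for n in range(120_001)]
chunks = split_into_sitemaps(all_urls)
print([len(chunk) for chunk in chunks])
```

This only enforces the URL count; a byte-size check on each serialized file would still be needed.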

As with all XML files, any data values (including URLs) must use entity escape codes for the characters ampersand (&), single quote ('), double quote ("), less than (<), and greater than (>).
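The five characters listed above can be escaped with the standard library when Sitemap XML is written by hand. This sketch uses `xml.sax.saxutils.escape`, which handles the ampersand and the angle brackets by default and takes the two quote characters as extra entities; the helper name and the example URL are illustrative.

```python
from xml.sax.saxutils import escape

# escape() covers &, < and > itself; quotes are passed as extra entities.
QUOTE_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def escape_for_sitemap(value):
    """Entity-escape a data value for inclusion in Sitemap XML."""
    return escape(value, QUOTE_ENTITIES)


print(escape_for_sitemap("http://www.example.com/view?a=1&b='x'"))
```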

Multilingual and multinational Sitemaps

In December 2011, Google announced rel="alternate" hreflang annotations for sites that want to target users in many languages and, optionally, countries. A few months later Google announced on its official blog[6] that it was adding support for specifying the rel="alternate" and hreflang annotations in Sitemaps. Compared with HTML link elements, the Sitemaps option offered advantages that included a smaller page size and easier deployment for some websites.

A multilingual Sitemap works as follows.

If, for example, a site targets English-language users through one URL and Greek-language users through another, until then the only option was to add the hreflang annotation either in the HTTP header or as HTML link elements on both URLs.

Now one can alternatively put equivalent xhtml:link markup in the Sitemap, listing every language version under each URL entry.
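The Sitemap form of the annotation could be generated as sketched below. The function name and the two example URLs are assumptions for illustration; "el" is the ISO 639-1 code for Greek. Each URL entry carries an xhtml:link element for every language version, itself included.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_multilingual_sitemap(alternates):
    """Build a Sitemap from a dict mapping hreflang code to page URL."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("xhtml", XHTML_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url in alternates.values():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        # Every entry lists all language versions, including its own.
        for lang, href in alternates.items():
            ET.SubElement(
                url,
                f"{{{XHTML_NS}}}link",
                {"rel": "alternate", "hreflang": lang, "href": href},
            )
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


# Hypothetical English and Greek versions of the same page.
multilingual = build_multilingual_sitemap({
    "en": "http://www.example.com/en",
    "el": "http://www.example.com/el",
})
print(multilingual.decode("utf-8"))
```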

See also


References

  1. ^ "About Google Sitemaps". Up-to-date as of June 2013.
  2. ^ M.L. Nelson, J.A. Smith, del Campo, H. Van de Sompel, X. Liu (2006). "Efficient, Automated Web Resource Harvesting" (PDF). WIDM'06. 
  3. ^ O. Brandman, J. Cho,  
  4. ^ a b
  5. ^ a b c d e f "Sitemaps XML format". 2008-02-27. Retrieved 2012-05-05. 
  6. ^ "Multilingual and multinational site annotations in Sitemaps". Google Webmaster Central Blog. Pierre Far. May 24, 2012. 

External links

  • Official website
  • "Major Search Engines Unite to Support a Common Mechanism for Website Submission". Google. Nov 16, 2006. 
  • Google news groups
    • Sitemaps (archived)
    • Webmaster help - Sitemap

This article was sourced from Creative Commons Attribution-ShareAlike License; additional terms may apply.