Sitemaps are a way for SEO practitioners and website managers to tell Google and other search engines how their sites are structured and which URLs are important. This guide covers the two types of sitemaps, XML and HTML, and how each one works. It also covers important pitfalls to avoid with sitemaps.
If you don’t want to create your own sitemap then contact us about our professional SEO services.
There are two types of sitemaps: XML and HTML.
XML sitemaps are built using eXtensible Markup Language (XML). To learn how to use this well-formatted and structured language, there are wonderful tutorials on this site (as well as tutorials on HTML, CSS and much more).
Think of an XML sitemap as a blueprint of your website for search engines. It lets web managers organize the URLs on the website and indicate their relative priority to the search engines. Search crawlers also often use sitemaps to discover fresh content.
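To make the blueprint idea concrete, here is a minimal sketch of building a small XML sitemap with Python's standard library. The element names (`urlset`, `url`, `loc`, `lastmod`, `changefreq`, `priority`) and the namespace come from the sitemaps.org protocol; the `build_sitemap` helper and the example.com URLs are hypothetical and only for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML (bytes) for an iterable of entry dicts.

    Each entry needs a 'loc' key; 'lastmod', 'changefreq' and 'priority'
    are optional and are written only when present.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = entry["loc"]
        for tag in ("lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    # ElementTree escapes characters such as '&' in URLs automatically.
    declaration = b'<?xml version="1.0" encoding="UTF-8"?>\n'
    return declaration + ET.tostring(urlset, encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical pages, used only to show the output shape.
    pages = [
        {"loc": "https://www.example.com/", "priority": 1.0},
        {"loc": "https://www.example.com/blog/", "lastmod": "2024-01-15"},
    ]
    print(build_sitemap(pages).decode("utf-8"))
```

The optional tags are hints only; search engines decide for themselves how much weight to give them.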
The particular code and rules that compose a sitemap can be found on this site.
Many tools exist to help create XML sitemaps for your site. Two of the most popular web-based tools are:
XML sitemaps can also be created manually, but this is tedious for large sites and even for small sites that publish fairly often. It is therefore best to use a content management system or platform that automatically generates your sitemap and pings the search engines. If your system does not support this, a cron job can be set up to regenerate the sitemap at a specified time. If your website is on the WordPress platform, the popular SEO plugin Yoast has a tool to create and automatically update your sitemap.
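For the cron approach, one possible sketch is a small script that rewrites the sitemap file only when its content has changed, and does so atomically so that a crawler never fetches a half-written file. The `write_if_changed` helper, the file paths, and the cron schedule in the comment are all assumptions made for illustration, not part of any particular platform.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical crontab entry that would run a script like this nightly at 03:00:
#   0 3 * * * /usr/bin/python3 /path/to/regenerate_sitemap.py


def write_if_changed(path, content: bytes) -> bool:
    """Write content to path atomically; return True if the file was updated."""
    target = Path(path)
    if target.exists() and target.read_bytes() == content:
        return False  # Nothing new to publish, leave the file untouched.

    # Write to a temporary file in the same directory, then swap it in.
    fd, tmp_name = tempfile.mkstemp(dir=str(target.parent), suffix=".tmp")
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(content)
    # mkstemp creates files readable only by the owner; make the sitemap
    # world-readable so the web server can serve it.
    os.chmod(tmp_name, 0o644)
    os.replace(tmp_name, target)
    return True
```

In a real job, the `content` argument would be the freshly generated sitemap XML from whatever produces your URL list.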
After the XML sitemap has been created, it should be submitted to Google Search Console and Bing Webmaster Tools. The process for this is simple. Here is how to do it with Google:
You can then submit an XML sitemap using the button at the top right of the screenshot below. A short time after the sitemap has been submitted, graphs will appear showing how many of your submitted URLs are indexed:
The sitemap location should also be added to the robots.txt file using the following line:
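As an illustration, robots.txt declares a sitemap with a `Sitemap:` directive followed by the absolute URL of the file. The sketch below embeds a hypothetical robots.txt (the example.com URL is a placeholder) and uses Python's standard `urllib.robotparser` module to confirm that the directive is picked up. It assumes Python 3.8 or later, where `RobotFileParser.site_maps()` is available.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; replace the URL with your own sitemap.
ROBOTS_TXT = """\
User-agent: *
Disallow:

Sitemap: https://www.example.com/sitemap.xml
"""


def sitemaps_declared(robots_txt: str) -> list:
    """Return the sitemap URLs declared in the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap directive is present.
    return parser.site_maps() or []


if __name__ == "__main__":
    print(sitemaps_declared(ROBOTS_TXT))
```

A robots.txt file can contain several `Sitemap:` lines, which is useful when a site has more than one sitemap.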
Note that sitemaps are not required to be named “sitemap.xml”; a site will sometimes have multiple sitemaps, as discussed below.
Large sites must take into account that the search engines, and Google in particular, have a maximum size for sitemaps. According to this Google support article, the maximum size is:
Because of this, large sites will often contain a sitemap index file that contains links to all of the sitemaps. Others will have separate sitemaps for separate sections of the site. For example, a large e-commerce site might have the following sitemaps:
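The index-file approach can be sketched as follows: split the URL list into chunks that fit within the per-file limit, write one sitemap per chunk, and list those files in a sitemap index. The 50,000-URL limit used here is the per-file figure from the sitemaps.org protocol and is an assumption on our part, so confirm the current limits in the Google support article. The helper names and the example.com file names are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Assumed limit: maximum URLs per sitemap file in the sitemaps.org protocol.
MAX_URLS_PER_SITEMAP = 50_000


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a sequence of URLs into lists of at most `size` items."""
    urls = list(urls)
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls):
    """Return sitemap index XML (bytes) linking to each sitemap file."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in sitemap_urls:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
    declaration = b'<?xml version="1.0" encoding="UTF-8"?>\n'
    return declaration + ET.tostring(index, encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical product URLs, used only to exercise the chunking.
    product_urls = [f"https://www.example.com/products/{n}" for n in range(120_001)]
    chunks = chunk_urls(product_urls)
    files = [
        f"https://www.example.com/sitemap-products-{i + 1}.xml"
        for i in range(len(chunks))
    ]
    print(len(chunks), "sitemap files needed")
    print(build_sitemap_index(files).decode("utf-8"))
```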
HTML sitemaps are public-facing, usually linked from the footer of the website, and give users an alternative way to find the content on your website. HTML sitemaps are also another way to ensure that the search crawlers crawl and index as many of your URLs as possible.
According to Duane Forrester of Bing, speaking in September 2011, Bing can lose trust in sitemaps that contain more than 1% “dirt.” Duane said:
“Your Sitemaps need to be clean. We have a 1% allowance for dirt in a Sitemap. Examples of dirt are if we click on a URL and we see a redirect, a 404 or a 500 code. If we see more than a 1% level of dirt, we begin losing trust in the Sitemap”.
A “dirty” URL in a sitemap can be any of the following:
The URLs listed in the sitemap should be the final URL only.
To check for “dirt” in your sitemap, you can use the tool Map Broker: upload your XML file and it will return a score for your sitemap. Alternatively, you can use the popular SEO crawler Screaming Frog. Either will provide you with a wealth of information about the health of your sitemap.
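If you would rather script a rough check yourself, the 1% allowance from the quote above can be sketched in a few lines. This is a simplified illustration, not how Bing or either tool actually scores a sitemap: it treats any status other than 200 as dirt, and the `fetch_status`, `dirt_ratio` and `exceeds_allowance` helpers are hypothetical names. `fetch_status` sends a HEAD request without following redirects, so a 301 or 302 is reported as such; some servers do not answer HEAD requests properly, in which case a GET would be needed.

```python
import http.client
from urllib.parse import urlsplit

DIRT_ALLOWANCE = 0.01  # The 1% figure from the Bing quote.


def fetch_status(url, timeout=10):
    """Return the HTTP status for url without following redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        return conn.getresponse().status
    finally:
        conn.close()


def dirt_ratio(status_by_url):
    """Return the fraction of URLs whose status is not 200."""
    if not status_by_url:
        return 0.0
    dirty = [url for url, status in status_by_url.items() if status != 200]
    return len(dirty) / len(status_by_url)


def exceeds_allowance(status_by_url, allowance=DIRT_ALLOWANCE):
    """Return True when more than `allowance` of the URLs are dirty."""
    return dirt_ratio(status_by_url) > allowance
```

To use it, build the dictionary with something like `{url: fetch_status(url) for url in sitemap_urls}` and pass it to `exceeds_allowance`.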
Sitemaps remain extremely important to SEO, helping Google understand your site and find new content. Used correctly, they can be a powerful tool to aid ranking; used incorrectly, they can damage both your ranking and how Google crawls your website.