In this article I will explore questions regarding HTML and XML sitemaps. Although a sitemap is basically a list of all the web pages of your site, there are a number of issues that can crop up and need to be dealt with.
Submitting an XML sitemap helps make sure the search engines know about all the pages on your site, including URLs that may not be found by a normal crawling process.
[Note: If you have a particularly large website, you will need to use a sitemap index. A single sitemap file can list at most 50,000 URLs, so if your site has more than 50,000 URLs, you will need to split them across multiple sitemaps and use an index to connect them together. You can learn how to create indices at sitemaps.org]
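As a rough illustration of the note above, here is a minimal Python sketch that splits a long list of URLs into groups of 50,000 and builds an index that points at one sitemap file per group. The example.com addresses, the sitemap-N.xml file names, and the function names are assumptions made up for this example. A real setup would also have to write out each child sitemap file.

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit from the sitemaps.org protocol


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls):
    """Return sitemap index XML (as a string) listing each child sitemap."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for sitemap_url in sitemap_urls:
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = sitemap_url
    return ET.tostring(root, encoding="unicode")


# Hypothetical site with 120,000 pages (made-up URLs for illustration only).
urls = [f"https://example.com/page-{n}" for n in range(120_000)]
chunks = chunk_urls(urls)

# One child sitemap per chunk; the file naming scheme is an assumption.
child_sitemaps = [
    f"https://example.com/sitemap-{i + 1}.xml" for i in range(len(chunks))
]
index_xml = build_sitemap_index(child_sitemaps)
```

With 120,000 URLs this yields three child sitemaps, and the index file is the one you would submit.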
Why Submit A Sitemap To Google?
Ever wonder why you should submit your sitemap to Google and the other search engines, or if there is really any value in doing it?
The benefits of submitting a sitemap to Google Search Console are numerous. At the very least, a proper sitemap will have a list of every page on your site that you would like a search engine to know about. With a sitemap, a website can be crawled efficiently since the search engine will know exactly how many pages your website has.
Sitemaps also let you assign a crawl priority to web pages. Although a crawler may not pay attention to the priority you assign a page, the intent is that the pages carrying your most important content will be crawled and indexed sooner than those with a lower priority value.
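To make the priority idea concrete, here is a small Python sketch that writes a sitemap with a priority value for each page. The sitemaps.org protocol allows values from 0.0 to 1.0. The page URLs and the function name are assumptions for illustration, and as noted above a crawler is free to ignore the value.

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from a list of (url, priority) pairs.

    Priority must be between 0.0 and 1.0, as the sitemaps.org protocol requires.
    """
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, priority in pages:
        if not 0.0 <= priority <= 1.0:
            raise ValueError(f"priority out of range for {url}: {priority}")
        entry = ET.SubElement(root, "url")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "priority").text = f"{priority:.1f}"
    return ET.tostring(root, encoding="unicode")


# Made-up pages: the home page is marked as more important than the archive.
sitemap_xml = build_sitemap([
    ("https://example.com/", 1.0),
    ("https://example.com/archive", 0.3),
])
```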
Why Use Automatically Generated Sitemaps?
There are CMS plugins that are able to generate a sitemap automatically. This means that every time a new piece of information, page, or article is published on your site, the sitemap for that site is updated and submitted to a search engine.
Some popular WordPress Sitemap Plugins:
Google XML Sitemaps
Yoast (includes a sitemap section)
Some pieces of information should be delivered when still fresh, like news articles. You won’t have to wait and guess when the spiders are likely to pay your website a visit. If a sitemap is submitted automatically, you know right away that your new page is in the queue to be visited.
This can be very important for making sure that a search engine knows which site holds the original content. Since content is often syndicated across many web platforms, a Google crawler may find your syndicated content before it finds the content on your site.
Making Sure Google Knows Who Was First
It’s also not uncommon for publishers to have their site’s content “curated” by other websites without a formal syndication agreement. This means that Google may find your article’s content on a popular site (that gets crawled much more often than your own) before it finds the content on your own site.
Telling Google, for example, that your page holds the original content and was published before it was shared to your Facebook or LinkedIn page is important for your site’s rankings.
XML sitemaps help content creators establish their claim as the content originator, as the duplicate version will most likely not be found in a search.
Common Questions About Sitemap Submissions
Will the submission of my sitemap mean that my site will be indexed right away?
No. Submitting a sitemap tells a search engine that pages on your site exist, but it doesn’t mean that a crawler will be visiting right away.
Will submitting a sitemap help my site rank higher?
No. Each page on your site will be ranked on its own merits and will not be affected by its inclusion in a sitemap. A sitemap helps search engines know about the pages on your site, but does not affect their rankings once they are known.
Why is there a difference between my submitted pages and indexed pages?
Will Google show me what pages in my sitemap are indexed?
Google does not show you which pages are indexed. It just shows you how many are submitted compared to how many are indexed through the Search Console.
To get around this, you can do a Google search to try to find which pages are indexed. Use the search string ‘site:yoursite.com’ to see what pages are listed, although this might be tedious for sites with many pages (and even this won’t give you perfect results, as Google has various levels of indexing).
If you are really interested in finding out which pages are indexed, this article from UrlProfiler will be what you are looking for. Their website crawler is also great for other SEO tasks.
Dealing With Sitemap Problems
I submitted my index to Google but it says ‘Pending’.
For a sitemap with up to a few hundred pages, it can take about a day before you will see the pending notice disappear. The crawler will visit your site eventually.
The most important thing is to make sure there are no errors, which you can test for before you actually submit your sitemap using the tool Google supplies.
On larger sites, with thousands of pages, the ‘pending’ notice may be there for days as Google compares what its crawler has found in your sitemap to what is already indexed.
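As a rough complement to the pre-submission test mentioned above, a quick local check can catch obvious structural problems before you upload anything. The Python sketch below is not Google’s tool and does not reproduce its checks. It only verifies that the file is well-formed XML, that the root element is a urlset in the sitemaps.org namespace, that there are no more than 50,000 entries, and that every entry has an absolute URL. The function name is an assumption.

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file URL limit from the sitemaps.org protocol


def check_sitemap(xml_text):
    """Return a list of problems found; an empty list means none were found."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        return [f"unexpected root element: {root.tag}"]

    problems = []
    entries = root.findall(f"{{{SITEMAP_NS}}}url")
    if len(entries) > MAX_URLS:
        problems.append(f"too many URLs: {len(entries)} (limit {MAX_URLS})")

    for position, entry in enumerate(entries, start=1):
        loc = entry.findtext(f"{{{SITEMAP_NS}}}loc", default="").strip()
        # Every entry needs an absolute http(s) URL in its <loc> element.
        if not loc.startswith(("http://", "https://")):
            problems.append(f"entry {position}: missing or relative <loc>")
    return problems
```

An empty result only means this sketch found nothing wrong. The official test is still the one that counts.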
My site has thousands of pages, how long will indexing take?
It can take a few days, even a week. But does your site really have thousands of indexable pages? What I am getting at is that your sitemap may be listing pages that have no content or are of little value. This can happen when using a CMS like WordPress or Joomla.
You want your sitemap to only list pages that you want someone to be able to find. Otherwise a Google crawler might be wasting time looking around your site and finding mostly empty pages.
Ultimately, Google will ignore useless pages and it won’t hurt your site overall. But it will take much longer for the crawler to get through your sitemap.
I switched my site from HTTP to HTTPS, do I need a new sitemap?
Yes. Even though every link may be exactly the same except for the ‘S’ in HTTPS, you will need to submit a new sitemap. The changeover to HTTPS effectively makes every URL different, and this should be indicated to Google.
Whenever a site begins to use HTTPS instead of HTTP, the search engine will in effect have to start over. Don’t be alarmed if there is a drop in traffic as the index resets.
Please see this checklist about an HTTP to HTTPS migration to make sure you have dealt with all the SEO issues that crop up when moving a site onto an SSL certificate.
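If your sitemap is built from a list of URLs, the HTTPS versions can be produced with a small rewrite step rather than by hand. The Python sketch below shows one way to do it, assuming the only change is the scheme and that the URLs carry no explicit port such as :80. The function name and sample URLs are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def to_https(url):
    """Return the URL with an http scheme swapped for https.

    URLs that already use https (or any other scheme) are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


# Made-up URLs from an old HTTP sitemap.
old_urls = [
    "http://example.com/",
    "http://example.com/about?ref=footer",
]
new_urls = [to_https(url) for url in old_urls]
```

The rewritten list can then be fed to whatever generates your sitemap file.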
Should I keep my old sitemap if I have made URL structure changes?
Google said there were errors in my sitemap. How do I fix these?
There are a number of problems that may have occurred. Before submitting a new sitemap, use the test function to make sure the sitemap structure is good. If that is fine, then an error may have occurred from one of these issues:
- The site was down or offline while the crawler was passing.
- The site was slow and the crawler timed out.
- Security settings prevented the bot from reaching specific pages. HTTP authentication is a common issue.
- Pages listed in the sitemap no longer exist (every page in a sitemap should exist).
- Mobile sites restrict access to mobile-only clients.
You may want to examine your site’s access and error logs for Googlebot requests. You can easily identify these by looking at the User-Agent string in your logs.
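As a starting point for that log check, here is a minimal Python sketch that scans access log lines and reports requests whose User-Agent mentions Googlebot and that ended in an error status. It assumes the common "combined" log format used by Apache and Nginx, and your server’s format may differ. The sample lines and the function name are made up. Keep in mind that a User-Agent string can be spoofed, so treat this as a first pass only.

```python
import re

# Pattern for the "combined" access log format (an assumption about your server).
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_errors(lines):
    """Return (path, status) pairs for Googlebot requests with status >= 400."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that do not follow the expected format
        if "Googlebot" not in match["agent"]:
            continue
        status = int(match["status"])
        if status >= 400:
            hits.append((match["path"], status))
    return hits


# Made-up sample lines for illustration.
sample_lines = [
    '66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET /old-page HTTP/1.1" 404 512 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [10/Oct/2023:13:56:01 +0000] "GET /missing HTTP/1.1" 404 512 "-" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
    '66.249.66.1 - - [10/Oct/2023:13:57:12 +0000] "GET / HTTP/1.1" 200 1024 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]
errors = googlebot_errors(sample_lines)
```

Paths that show up here with 404 or 5xx statuses are good candidates to fix or remove from the sitemap.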
Overall, the submission of a sitemap to a search engine should not be that difficult a task, especially if you have a plugin or addon that is doing it for you.
However, if you are having any issues with your sitemap or indexing, I would be happy to help. Just leave a comment below or send me an email and I will do my best to sort through the problem.