Last week I got an email from a well-known garment brand saying that they were suffering from a major issue: their new links were not appearing in the SERPs. Obviously I had to look deeper into the matter to find the cause. I replied that it would take some time to investigate the issue and study the site in depth. When Google loses interest in indexing your site, nobody will be able to find your latest content.
It took a week to identify the problem, but I finally reached a conclusion. Now I am going to share how I investigate indexation issues on a site.
The first step I normally take is typing site:yoursitename.com (excluding www) into the google.com search bar. Check whether the number of results corresponds to the number of pages your site currently has. Is the difference between the results shown by the search engine and the number of existing pages big or small? Keep in mind that the count Google shows is not always exact, so treat this as a rough way to gauge how actively your site is being indexed. If the result count nearly matches the current number of pages, it's a good sign that the Google indexing robot loves you.
The next step I take is to check the Google Webmaster Tools dashboard. If Google sees issues with your site, this page will show error messages. The following screenshot shows what the error messages look like (randomly picked from Google Images, because I have no issues with my site):
The most common error is the 404 HTTP status code, which means the link is broken and leads nowhere (page cannot be found). This is a bad experience that annoys not only the user but also the search engine crawler. Any status code other than 200 or 301 means there is something you need to fix at your end.
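To spot those codes before Google does, you can check your URLs yourself. Here is a minimal Python sketch using only the standard library; the function names are my own, and note that urlopen follows redirects, so a 301 you check this way may come back as the final 200:

```python
# Sketch: check the HTTP status of a URL and flag anything that is
# not 200 (OK) or 301 (permanent redirect), per the rule of thumb above.
from urllib.request import Request, urlopen
from urllib.error import HTTPError

def fetch_status(url):
    """Return the HTTP status code for url.

    Error codes such as 404 or 500 arrive as HTTPError, so we
    catch that and return the code it carries.
    """
    try:
        req = Request(url, method="HEAD")  # HEAD avoids downloading the body
        return urlopen(req, timeout=10).status
    except HTTPError as err:
        return err.code

def needs_fixing(status):
    """Anything other than 200 or 301 deserves a closer look."""
    return status not in (200, 301)
```

Run fetch_status over the URLs from your sitemap and anything where needs_fixing returns True is a candidate for a fix or a redirect.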
How to Fix Crawling Errors
1. robots.txt – this file is located in the server root folder. Why is this file so important? Because it allows or disallows search engine crawlers from crawling and indexing the site. Hackers sometimes use it to stop a site from being indexed – why? Because they don't want your site competing with theirs in the SERPs. Disable file rewrite rules for robots.txt and make sure it is read-only. I recommend you find and check this line in your robots.txt: User-agent: * Disallow: / – it means search engines are not allowed to crawl and index your site. Make sure the Disallow value is left empty so crawling is allowed.
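For reference, here is what the harmful directive and the open one look like side by side (lines starting with # are comments in robots.txt):

```
# Harmful form: blocks ALL crawlers from the ENTIRE site.
# User-agent: *
# Disallow: /

# Open form: an empty Disallow value allows full crawling.
User-agent: *
Disallow:
```

The only difference is the single slash after Disallow, which is why this problem is so easy to miss.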
2. .htaccess – this file is located in the WWW or public_html folder. Check for any bad configuration, such as infinite redirect loops.
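As an illustration (assuming an Apache server with mod_rewrite, and using a hypothetical domain), here is a rule that loops forever next to a safe variant that redirects non-www traffic to www:

```
# Infinite loop: every request to /foo is redirected back to /foo.
# RewriteEngine On
# RewriteRule ^(.*)$ /$1 [R=301,L]

# Safe redirect: the condition stops the rule from firing again
# once the host is already www.yoursitename.com.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^yoursitename\.com$ [NC]
RewriteRule ^(.*)$ http://www.yoursitename.com/$1 [R=301,L]
```

Crawlers give up after a few redirect hops, so a loop like the first one effectively hides the whole site.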
3. Meta tags – check and verify that the page(s) not being indexed don't contain the following meta tag in their source code: <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
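If you find that tag and the page should be indexed, simply removing it is enough, since indexing is the default. Making the intent explicit looks like this:

```html
<!-- Explicitly allows indexing and link following
     (also the default when no robots meta tag is present): -->
<META NAME="ROBOTS" CONTENT="INDEX, FOLLOW">
```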
4. Sitemaps – there may be an issue with the sitemap not updating. You think your sitemap is being updated, and you resubmit the same old sitemap again and again. Check and make sure the sitemap actually updates when you update your site's content. To check, visit the sitemap by typing www.yoursitename.com/post-sitemap.xml and see whether the latest links are present.
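Eyeballing a long sitemap is tedious, so here is a small Python sketch that parses sitemap XML and lists each URL with its <lastmod> date, making stale entries obvious. The sample XML and URLs are hypothetical; in practice you would feed it the contents fetched from your own sitemap URL:

```python
# Sketch: extract (loc, lastmod) pairs from a sitemap so stale
# entries stand out. Uses only the standard library.
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_entries(xml_text):
    """Return a list of (loc, lastmod) tuples from sitemap XML."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS)
        lastmod = url.findtext("sm:lastmod", default="", namespaces=NS)
        entries.append((loc, lastmod))
    return entries

# Hypothetical sample; replace with the XML fetched from your sitemap.
sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.yoursitename.com/latest-post/</loc>
       <lastmod>2014-01-15</lastmod></url>
</urlset>"""

for loc, lastmod in sitemap_entries(sample):
    print(loc, lastmod)
```

If the newest dates are weeks old while you publish daily, the sitemap generator is the thing to fix before you resubmit anything.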
5. Poor Google PageRank – the better the PageRank, the better the crawl rate. According to Matt Cutts, head of Google's web spam team, the crawl rate is roughly proportional to PageRank.
6. Connectivity problems – poor hosting can lead to server crashes or a server that is frequently down for maintenance. Make sure you use a quality hosting service that provides maximum uptime.
7. Being unlucky – suppose you newly purchased a domain and strictly follow Google's guidelines. You write quality content, do great ethical SEO, and attract excellent inbound links, but Google still refuses to index your site. It may simply be that the previous owner of the domain did so much spamming that Google flagged the site.
Another reason your site may not be indexed is a violation of the Google Webmaster Guidelines.
Now you know the steps to investigate any indexation issues your site might face, so go ahead and perform the analysis – hopefully you will spot the reason why the search engines were ignoring your site.