On the back of Matt Cutts releasing a new webmaster video listing the top 5 SEO mistakes webmasters make, I thought I would elaborate on the first point Matt talks about in his video.
The first mistake Matt talks about is probably the biggest and most common SEO mistake out there: making a website uncrawlable to search engines.
We at Boyd Digital have come across a lot of websites in our time that for one reason or another could not be crawled by search engine bots.
The most common mistakes are relatively easy for an experienced search engine optimiser to spot. However, we understand that they can easily slip through the net for webmasters and developers who specialise in management and development, not SEO.
That’s why we are always glad to help webmasters identify the problems when their traffic volumes plummet.
Below are the most recent scenarios we have come across in the wonderful world of SEO.
Blocked by robots.txt File
This is probably the most common reason search engines are blocked from crawling websites.
It usually happens when a new website goes live. While developers are building a new website, they intentionally block all bots from crawling the test website (the new website before it goes live).
When the new website gets launched without an SEO consultant or SEO expert directly involved in the migration and launch, more often than not the robots.txt file stays the same, blocking search bots from crawling the website.
What the robots.txt file looks like when it is set up to block all bots:
User-agent: *
Disallow: /
What the file looks like when it allows all bots to crawl all sections of a website:
User-agent: *
Disallow:
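As a quick pre-launch check, the difference between a blocking and a non-blocking robots.txt file can be tested in a few lines of Python. This is a minimal sketch, not a full audit: it relies on the standard library's urllib.robotparser module, the example.com URL and the Googlebot user agent are placeholders, and the helper name is_crawlable is our own invention for illustration.

```python
from urllib.robotparser import RobotFileParser

# Typical robots.txt contents that block every bot from the whole site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Typical robots.txt contents that let every bot crawl everything.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://www.example.com/products/"  # placeholder URL
    print("block-all file lets Googlebot in:", is_crawlable(BLOCK_ALL, page))
    print("allow-all file lets Googlebot in:", is_crawlable(ALLOW_ALL, page))
```

Running a check like this against the live robots.txt file immediately after launch is a cheap way to catch a test-site block that was carried over by accident.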
Blocked by Canonical Tags
Search engines can also be steered away from the deep pages of a website by incorrect use of canonical tags.
Canonical tags are mainly used to eradicate duplicate content issues on websites.
If Google finds pages with identical content, it may index and display only one version in search results.
Its algorithms select the page they think best answers the user's query. This does not always result in the most appropriate page being selected, hence the use of canonical tags.
Webmasters can specify a canonical page to search engines by adding a <link> element with the attribute rel="canonical".
This informs Google that there may be duplicate versions of the page in question, and asks it to pass over the duplicate pages containing the tag and focus on indexing and ranking the page the canonical tag points to.
For example, many websites we come across have four versions of the homepage.
Google sees each of these versions as an individual page. Inserting the following canonical tag in the head section of the root domain page:
<link rel="canonical" href="http://www.example.com"/>
informs Google to ignore the duplicate versions and focus on indexing and ranking the root domain page.
The same rules apply to the deep pages of websites.
Canonical tags are great for eradicating duplicate content issues. However, improper use of canonical tags can seriously harm a website's visibility in Google.
In one instance, we identified the incorrect use of canonical tags. A developer had mistakenly inserted a canonical tag pointing back to the root domain URL in the footer area of the website.
Placing this code in the footer area meant it was present on every page of the website, which instructed Google to ignore all deep pages. As a result, all deep pages vanished from Google SERPs. Hundreds of generic keyword rankings were lost, as were thousands of visitors per day.
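A mistake like this can be spotted by reading the canonical tag on a handful of deep pages and checking where it points. The sketch below is one way to do that with Python's standard html.parser module. It is an illustration under assumptions, not the tooling we used: the class and function names are hypothetical, the URLs are placeholders, and it only compares URL paths, so a homepage served at a path such as /index.html would need extra handling.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> found in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if "canonical" in attr_map.get("rel", "").lower().split():
            self.canonicals.append(attr_map.get("href", ""))


def canonical_points_to_root(page_url, html):
    """Return True if a deep page carries a canonical tag aimed at the site root."""
    if urlsplit(page_url).path.rstrip("/") == "":
        return False  # the homepage itself may legitimately point to the root
    finder = CanonicalFinder()
    finder.feed(html)
    for href in finder.canonicals:
        if not href:
            continue  # ignore tags with a missing or empty href
        if urlsplit(href).path.rstrip("/") == "":
            return True
    return False


if __name__ == "__main__":
    # Placeholder page with the canonical tag wrongly placed in the footer.
    footer_html = (
        '<html><head><title>Deep page</title></head><body>'
        '<footer><link rel="canonical" href="http://www.example.com"/></footer>'
        '</body></html>'
    )
    print(canonical_points_to_root("http://www.example.com/category/product", footer_html))
```

If the function returns True for deep pages across the site, the canonical tag is almost certainly coming from a shared template such as the footer.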
Unfortunately, the client waited three months before outsourcing the investigation work to Boyd Digital. Within 15 minutes of us taking on the work, the issue was identified and rectified.
Blocked by 302 Redirects
This issue is a first for me and one of the most unusual instances I have come across of search engines being blocked from crawling a website.
The site in question operates in the UK alcohol industry and uses the ineffective age verification process that is required not by law, but by a voluntary industry body called the European Forum for Responsible Drinking.
How the age verification page works
- Enter any page on a participating website
- An age verification splash page/pop-up box will appear
- Type in your date of birth
- If you are of drinking age, Bob's your uncle
- You may now enter the alcohol website
In this instance, instead of adding a pop-up function like most alcohol websites, a 302 temporary redirect was placed on each page of the website. The redirect took all visitors to the age verification page. Once users had entered their date of birth, they were allowed access to the website.
This works well from a user perspective but is catastrophic from an SEO perspective. Each time a search bot tries to enter any given page on the website, it is redirected to the age verification page.
Search bots are very clever, but they cannot fill in a date of birth the way a human visitor can. In the eyes of search bots, the only page that existed on the website was the age verification page.
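The same behaviour can be reproduced by requesting pages the way a bot does, with no cookies and without following redirects, and then looking at the status code and Location header that come back. The sketch below is a rough illustration using only Python's standard library. The function names, the /age-verification path and the example.com URLs are all assumptions made for the example, not details of the client's site.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def fetch_status(url):
    """Request url once, without following redirects or sending cookies.

    Returns (status code, Location header or None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path, headers={"User-Agent": "crawl-check"})
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def pages_hidden_behind(gate_path, responses):
    """Return the URLs whose response redirects to gate_path.

    responses maps each requested URL to a (status, location) pair,
    for example as returned by fetch_status.
    """
    hidden = []
    for url, (status, location) in responses.items():
        if status not in REDIRECT_CODES or not location:
            continue
        target = urljoin(url, location)  # resolve relative Location values
        if urlsplit(target).path == gate_path:
            hidden.append(url)
    return hidden


if __name__ == "__main__":
    # Hand-written sample responses; a real check would call fetch_status.
    sample = {
        "http://www.example.com/": (302, "/age-verification"),
        "http://www.example.com/whisky": (302, "http://www.example.com/age-verification"),
        "http://www.example.com/age-verification": (200, None),
    }
    print(pages_hidden_behind("/age-verification", sample))
```

If every URL other than the verification page itself shows up in the result, a crawler has no way of reaching the real content.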
When we came across the redirect issue and informed the client, it turned out they had been completely unaware of it. The website had been like that for years, leaving untold numbers of potential paying customers unable to find it.
The web crawling issues we came across can be embarrassing for people involved in developing or managing websites.
We have come across serious players in the industry who have failed to spot the schoolboy errors mentioned above. What's even more worrying is that it took weeks, months and even years for these crawling issues to be identified and rectified.
This never goes down well with clients who invest tens of thousands of pounds in new and improved cash cows, only for traffic and cash flows to fall through the floor while people scratch their heads wondering what has happened.
Our advice to webmasters and agencies involved in managing websites or launching new websites is to hire an experienced SEO company or SEO consultant to help manage the SEO aspects of existing websites or to help manage the migration process from old to new websites.
Don’t get caught with your pants down!