Let me guess – this is your situation: your website experienced a ranking drop and you made some changes to it in an effort to bring your rankings back up. The problem is, they kept on dropping. Lower and lower and lower. It seems like everything you’ve done to get the rebound you’re after has failed. So you keep on trying. And everything you do seems to make your situation worse. I feel your pain. Why? Because I’ve been there a thousand times. I’m with you.
I’m here today to give you some advice. The advice is to stay away from your website for a while. Any change you make to your site in an effort to increase your search engine rankings will take months to show results. If you make a change, wait a week, make another one, and then repeat that process over and over again, you’re digging your own grave. You need to make a rational change and then walk away.

I can remember one website I owned a while back that dropped in the rankings. I thought I knew what the cause was, so I altered some settings. Well, that only lasted for a few weeks because I became impatient. So I altered them back. And I kept on messing with things, and I think Google got sick of me because they stopped crawling my site for a while. Well, they didn’t stop completely, but they did slow way down. They obviously didn’t like what I was doing. Looking back, I wish I had just corrected what I perceived as the error and then left things alone. You know, go outside for a while (months) to get some fresh air. Find a hobby. Do something other than sit at the computer waiting for a bump in the search engine rankings.
Here’s the thing – when you’re up against a drop like this, you really do need to do some analysis to find out what went wrong. You can’t just go changing things willy-nilly. Search engines like stability and predictability. They don’t enjoy crawling and indexing something that’s changing all day long. So if you update your robots.txt file, upload it and leave it alone. Actually, when it comes to that file, you should set it once and then forget about it pretty much forever. Trust me, I know this from experience. That file is such a pain to deal with because any update to it takes forever to take effect.
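For what it’s worth, the kind of set-it-and-forget-it file I’m talking about is only a handful of lines. Here’s a minimal sketch – the blocked paths are placeholders, not a recommendation, because what belongs in there depends entirely on your own site:

# Set this once and then leave it alone.
User-agent: *
Disallow: /cart/
Disallow: /search/

Sitemap: https://www.example.com/sitemap.xml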
Let me give you a real-life example of what I’m talking about here. Call this a search engine optimization or SEO case study, if you will. I’ll keep it short.
A lot of websites allow users to register for accounts. Once the user is registered, they become a member. Classifieds, forums, ecommerce sites, and blogs sometimes allow people to register. These sites oftentimes display names and links to member accounts, which are usually empty, thin-content pages. I’m sure you’ve heard about this. These pages have been a big problem for a long time, and they’re part of why Google Panda was released – to cut down on junk pages that are sometimes filled with spam and sometimes left empty forever. Every so often, member pages are used the way they’re intended to be used, but that’s a rarity.
Because these member pages are so thin, webmasters rarely allow them to be crawled by search engines. Some do allow them to be crawled, but include a meta noindex tag to keep them from showing in search results. Some don’t want them crawled at all, to preserve what’s referred to as “crawl budget,” so they’ll block the directory in which these member pages are held. And some allow these pages to be crawled, but require authentication to view them, meaning the person (or crawler) needs to log into an account to progress any further. Those authentication-protected pages sometimes return a 403 Forbidden status code, which means, “Don’t go any further. You’re not allowed.”
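Just to illustrate two of those options – the meta-noindex approach and the authentication approach – here’s a rough sketch of what they might look like in a small Python (Flask) app. This isn’t anyone’s actual setup; the route names, the session check, and Flask itself are all stand-ins. The point is the 403 on one hand and the noindex directive on the other:

from flask import Flask, session, abort, make_response

app = Flask(__name__)
app.secret_key = "placeholder-secret"  # hypothetical

# Option 1: hide member pages behind authentication (the 403 approach).
@app.route("/members/<username>")
def member_profile(username):
    if "user_id" not in session:
        # Crawlers can never log in, so they always hit this 403 Forbidden
        # and the thin profile page never makes it into the index.
        abort(403)
    return f"Profile page for {username}"

# Option 2: let the page be crawled, but tell crawlers not to index it.
@app.route("/profiles/<username>")
def public_member_profile(username):
    resp = make_response(f"Public profile page for {username}")
    # Equivalent to <meta name="robots" content="noindex"> in the page HTML.
    resp.headers["X-Robots-Tag"] = "noindex"
    return resp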
Whichever method a webmaster chooses is better than allowing an unhindered crawl. The last thing anyone wants is to have these pages in a search engine’s index, because they rarely help and almost always hurt. But here’s the problem – the “case,” if you will. Let’s say that a webmaster launches a new classifieds website. Or a forum. It doesn’t really matter. The website has links all over its pages that lead to these member accounts. In the beginning, the webmaster had these pages set up so they returned the 403 Forbidden status code. When the links were crawled by the search engines, none of them were indexed because there was nothing there to see. The search engines couldn’t progress past the forbidden part.

Months went by, and the webmaster became impatient and wanted to see his or her website rank better. For some reason, they thought the 403 status codes were bad, so they decided to block the member directory in the robots.txt file. More months went by, and the number of blocked pages grew into the thousands. After all, there were many members. So what had been nothing grew into actual URLs that were noticed – and in some cases indexed – by Google and the other search engines. As you may be aware, having too many pages blocked in this file is no good. Google doesn’t like that. Why? Well, that’s a story for another time. Just ask if you’re interested.
After a few months of fairly static rankings, the website experienced a ranking drop. The reason for this, as the webmaster concluded, was that the number of blocked pages far outnumbered the pages that were actually indexed. Yes, it’s true that pages blocked by the robots.txt file will eventually drop from Google’s index, but that can sometimes take a very long time. In the meantime, Google will treat each member URL it discovers as an individual, live page. The shame about this type of situation is that the webmaster of a site like this will oftentimes venture out onto the internet and into SEO forums to ask all sorts of questions. They’ll wonder why their rankings dropped, and people will tell them that their content is bad. That it’s duplicate or thin. In actuality, the content is fine and it’s a technical SEO issue. The problem was that those member pages were blocked in the robots.txt file when they shouldn’t have been. They should have been left alone to return the 403 Forbidden status code, and the webmaster should simply have been patient while his or her site grew through its beginning stages.
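By the way, if you’re ever unsure whether your robots.txt actually blocks a given URL for a particular crawler, Python’s standard library can tell you. A quick sketch, with placeholder URLs:

from urllib import robotparser

# Placeholder addresses; point these at your own site.
rp = robotparser.RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")
rp.read()

for url in [
    "https://www.example.com/members/some-user",
    "https://www.example.com/posts/some-article",
]:
    allowed = rp.can_fetch("Googlebot", url)
    print(f"{url} -> {'crawlable' if allowed else 'blocked by robots.txt'}")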
Be that as it may, the pages were blocked and the problem was discovered. So the webmaster in question reversed course and unblocked the pages so the search engines could once again crawl them, see the error code, and drop the pages from their indexes.
But wait – what happened when the crawling began? The website’s rankings dropped even more. Why? Well, think about it. Those blocked pages had gained a bit of PageRank while they were being noticed, or “crawled.” Not much PageRank, but enough to reduce the rankings of the entire site when all of those pages were “deleted” at once. And herein lies the conundrum: how do you get rid of a bunch of lousy pages when doing so makes the negative effects you’ve been experiencing even worse? In this case, unblocking the pages so they’re removed from the search engines’ indexes was the right move to make. What’s called for here is patience and intestinal fortitude. Things will undoubtedly get worse before they get better, but get better they will.

The common course of action when webmasters notice the negative effects of their most recent “correction” is to undo that correction, and a cycle begins. Rankings fall even more, and the webmaster does something that harms them even more than that. Then they do and undo things for months until the website becomes worthless. This happens all the time. Remember, fewer pages in Google’s index is better, especially when it comes to system-generated pages. Contact, printer-friendly, image-only, newest-ads-or-posts, member pages – all of these pages are thin and in no way should they be included in the indexes of search engines. The trouble lies in trying to remove them. It hurts, but it’s got to be done.

So if you own or manage a website that’s experienced a ranking drop, don’t exacerbate the problem. Stop and think about what’s going on. Then don’t just start blocking pages and directories in the robots.txt file. If you can, make it so the search engines believe the infringing pages are deleted. They need to be removed from the index completely, even if they aren’t actually deleted. Hide them behind authentication. Get rid of them. Do something, but don’t flip back and forth between blocking them and not blocking them. That’s no good.
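Part of “stop and think” is simply checking what those pages return right now, before you touch anything. Here’s a quick, hypothetical check using nothing but Python’s standard library – the URL is a placeholder:

import urllib.request
import urllib.error

def check(url):
    """Print the status code and any robots directive a URL currently returns."""
    try:
        resp = urllib.request.urlopen(url)
        status = resp.status
        robots_header = resp.headers.get("X-Robots-Tag", "none")
    except urllib.error.HTTPError as err:
        status = err.code  # e.g. 403 or 404
        robots_header = err.headers.get("X-Robots-Tag", "none")
    print(f"{url} -> {status} (X-Robots-Tag: {robots_header})")

check("https://www.example.com/members/some-user")  # placeholder URL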
Do you have any SEO-related questions? If so, share them down below and ask away. I’ll be happy to answer. Thanks!
Kody
I have an alternate view and a theory about falling rankings. You mentioned that webmasters should allow pages to return 403 (authentication) response codes as opposed to blocking those pages in robots.txt. Let’s see what Google says about this:
——
Hide URLs that you don’t want in search results
Wasting server resources on unnecessary pages can reduce crawl activity from pages that are important to you, which may cause a significant delay in discovering great new or updated content on a site.
Exposing many URLs on your site that you don’t want crawled by Search can negatively affect a site’s crawling and indexing. Typically these URLs fall into the following categories:
– Faceted navigation and session identifiers: Faceted navigation is typically duplicate content from the site; session identifiers and other URL parameters that simply sort or filter the page don’t provide new content. Use robots.txt to block faceted navigation pages. If you find that Google is crawling a significant number of essentially duplicate URLs with different parameters on your site, consider blocking parameterized duplicate content.
– Duplicate content: Help Google identify duplicate content to avoid unnecessary crawling.
– Soft 404 pages: Return a 404 code when a page no longer exists.
– Hacked pages: Be sure to check the Security Issues report and fix or remove any hacked pages you find.
– Infinite spaces and proxies: Block these from crawling with robots.txt.
– Low quality and spam content: Good to avoid, obviously.
– Shopping cart pages, infinite scrolling pages, and pages that perform an action (such as “sign up” or “buy now” pages).
Do:
– Use robots.txt if you don’t want Google to crawl a resource or page at all.
– If a common resource is reused on multiple pages (such as a shared image or JavaScript file), reference the resource from the same URL in each page, so that Google can cache and reuse the same resource without needing to request the same resource multiple times.
Avoid:
– Don’t add or remove pages or directories from robots.txt regularly as a way of reallocating crawl budget for your site. Use robots.txt only for pages or resources that you don’t want to appear on Google for the long run.
– Don’t rotate sitemaps or use other temporary hiding mechanisms to reallocate budget.
——
In another one of Google’s documents (one that I can’t locate right now), I read that Googlebot can never authenticate on a page, so it shouldn’t encounter a 403 response – ever. I’ll post that paragraph in this thread if I locate it in the future. So really, the right thing to do with pages that require authentication and that return 403 responses is to block them in the robots.txt file.
My Theory for Website Ranking Loss
I came up with this just a few days ago. If you’ve got a large, or even not-so-large, website that is slowly and steadily losing its search traffic, I may have identified the issue you’re facing. Hopefully, this will make sense.
Let’s say you’ve got a large website – a forum, classifieds site, blog, ecommerce web store – whatever. As long as you’ve got a lot of user-generated content that the search engines may consider thin, what I’m about to tell you may apply. You can even have lots of dead pages that you recently removed – pages that return 404 response codes – or pages that return 401 or 403 responses. You’ll also likely have lots of paginated content, meaning that on your category pages you’ve got numbered links that lead to page 2, 3, 4, etc. This is all fairly standard stuff for large sites.
I think we need to agree on a premise before I begin. The premise I propose is that if Googlebot doesn’t crawl a web page within a certain amount of time, that page will drop from Google’s index automatically. In order for a page to remain in the index, it needs to be linked to and crawled periodically. So if Googlebot crawled one of your website’s pages three years ago and Google added the page to its index back then, but over the past three years you stopped linking to that page and Googlebot hasn’t returned to it, the page will disappear from Google.
The Situation
Let’s say you own a blog that has 500 posts on it. Those posts are listed in one category that links to 10 posts on each page. So you’ve got 50 paginated pages that link to 10 posts each. Now let’s say that your website also has lots of junk content. There are thin member pages with very little content on them, there are tons of old pages you recently deleted that now return 404 responses, and there are many pages that require authentication. These return 403 response codes. Your website also has pages with duplicate content on them and it’s full of 301 redirects. It’s not a good scene.
When a website is in the condition I described above and Googlebot is allowed to crawl all those lousy pages, Googlebot will dramatically reduce its crawl rate. Basically, it says, “This website isn’t what we’re looking for. We’ll slow down our crawling until it gets its act together.” When the crawl rate goes down, one of the first types of pages to stop being crawled is paginated pages. So, if Googlebot is mostly crawling poor, unnecessary content, and if it’s reduced its crawl rate and isn’t crawling paginated pages anymore, what happens to all those posts that are linked to only from the paginated pages? Well, as I explained in my premise above, those posts will drop from Google’s index over time. And many of the paginated pages will as well. Googlebot simply won’t see those pages because it’s busy crawling garbage. That’s why, when you look at the valid-pages graph in Google Search Console, it’ll slowly get lower and lower as the weeks and months pass. At the same time, your “Crawled – currently not indexed” and “Discovered – currently not indexed” graphs will be climbing. Those graphs show pages Google knows about but that aren’t in the index – some that were once crawled and perhaps indexed, and some that never made it in. Basically, Google knows about them but has let them fall into the abyss due to the condition of the rest of your website.
What can you do about this? Block the lousy pages in your robots.txt file. Look at your log files and get blocking (there’s a rough sketch of that kind of log digging below). Block anything you don’t want Google to include in its index or show in the search results. And then wait months and months and months for those pages to drop from Google’s awareness altogether. I’ve done this in the past, and after a long time my websites climbed, for example, from 200 pageviews per day to 10,000. It’s absolutely insane how low one’s traffic can fall, all because of junk content.
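Here’s the kind of log digging I mean – a sketch, not a finished tool. It assumes a typical combined-format Apache/nginx access log at a hypothetical path, and it simply counts Googlebot requests by top-level directory so you can see where the crawl is actually going:

import re
from collections import Counter
from urllib.parse import urlparse

LOG_FILE = "access.log"  # hypothetical path; use your server's real log
request_re = re.compile(r'"(?:GET|POST|HEAD) (\S+) HTTP/[\d.]+"')

googlebot_hits = Counter()

with open(LOG_FILE, encoding="utf-8", errors="replace") as log:
    for line in log:
        # Naive user-agent check; good enough for a first look.
        if "Googlebot" not in line:
            continue
        match = request_re.search(line)
        if not match:
            continue
        path = urlparse(match.group(1)).path
        # Group hits by top-level directory (e.g. /members, /search).
        top_level = "/" + path.strip("/").split("/")[0] if path != "/" else "/"
        googlebot_hits[top_level] += 1

for directory, hits in googlebot_hits.most_common(20):
    print(f"{hits:6d}  {directory}")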
Let me know your thoughts.