
CraigHardy
Index bloat is bad on so many levels. Unless you're a webmaster who dwells on these kinds of things, you may not even know your website has a problem. You most likely launched your site a while back and have been adding to it ever since. A few months or a few years later, it began to rank for some keywords, and you've been coasting since then. Did you know that your site may have the potential to rank much, much better than it currently does? Did you know that you may have what we in the webmaster and SEO world refer to as index bloat?
What is index bloat, anyway? I'll explain it this way. The index I'm referring to here is Google's. It can be Bing's or any other search engine's as well, but for now, let's focus on Google, since that's really the only one that matters. If your website consists of a homepage and four other pages, then you've got a five-page site. Those five pages are all Google should know about and have indexed. Now let's say you've also got some additional pages on the site that you don't want counted. They don't really matter to search engines, but they help the website operate the way it's supposed to. Every page except the homepage can be sorted five different ways. Right there, you've got 20 extra pages (four pages times five sorts). Now let's say you've also got some additional pages that 301 redirect to those four pages. Add eight more there. Now let's say you've got a contact page that spins off a new page every time someone uses it. That's an endless number of pages. What I'm trying to say is that, depending on which content management system you're using, you can have a huge number of pages beyond the ones you know about, all polluting Google's index. You may not be aware of it, but these extra pages are dragging down your entire site's ranking.
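The counting in that example can be checked with a few lines of Python. This is only an illustration of the arithmetic: the page paths, the `?sort=` parameter and its five values, and the `/old1/`, `/old2/` redirect paths are all hypothetical, and it assumes two redirecting URLs per page to arrive at the eight redirects. The endless contact-form pages are left out because they can't be counted.

```python
# Hypothetical five-page site from the example: a homepage and four other pages.
pages = ["/", "/about/", "/services/", "/blog/", "/contact/"]

# Assumption: each non-home page can be sorted five ways, and each sort gets its own URL.
sort_orders = ["newest", "oldest", "title", "popular", "price"]
sorted_variants = [
    f"{page}?sort={order}"
    for page in pages
    if page != "/"
    for order in sort_orders
]

# Assumption: two old URLs 301-redirect to each non-home page (hypothetical paths).
redirects = [f"/old{i}{page}" for page in pages if page != "/" for i in (1, 2)]

known_urls = pages + sorted_variants + redirects
print(len(pages), len(sorted_variants), len(redirects), len(known_urls))  # 5 20 8 33
```

Five pages you care about turn into 33 URLs a search engine can find, before the contact form adds a single one.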
I'd like to get something out of the way right now. If you block pages in the robots.txt file, Google won't crawl them, but that doesn't mean it won't index them. Blocked pages can stay in the index and may only fall out after a few months or even years. Eventually, they'll disappear, unless they're linked to from a prominent position on the site. The thing is, there's no guarantee they'll disappear, so keeping these pages around is a gamble. It's not good practice. The same goes for pages with a noindex meta tag on them, for 301 redirects, for pages with the canonical tag on them, and for thin and junk pages. All of these types of pages need to go. They need to disappear from your site so your rankings can recover, or flourish in the first place.
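Before deleting anything, it helps to know which of your URLs you're currently relying on robots.txt to hide. Here's a minimal sketch using Python's standard `urllib.robotparser`; the robots.txt rules and the `example.com` URLs are hypothetical placeholders. Keep in mind what it reports: robots.txt only tells a crawler not to fetch a URL. It says nothing about whether that URL ends up in the index, which is exactly why relying on it is a gamble.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; substitute your site's real file contents.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tag/
Disallow: /author/
"""

def blocked_urls(robots_txt: str, urls: list[str], agent: str = "Googlebot") -> list[str]:
    """Return the URLs that robots.txt tells `agent` not to crawl."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]

urls = [
    "https://example.com/",
    "https://example.com/tag/seo/",
    "https://example.com/author/admin/",
    "https://example.com/blog/index-bloat/",
]
for url in blocked_urls(ROBOTS_TXT, urls):
    # Blocked from crawling only: the URL can still be indexed if something links to it.
    print("crawl-blocked:", url)
```

Every URL this prints is a candidate for deletion or for a change in site architecture, not something to leave sitting behind a Disallow line.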
I've been working in SEO for over a decade, and I can't tell you how many times I've seen a supposed SEO professional tell a client to place a noindex meta tag on a thin page in order to remove it from Google's index. This is complete garbage advice. The truth is, if you'd like to remove a page from Google's index, you should delete the page or put it behind authentication. Also, you can forget about the canonical tag and 301 redirects for the sake of search engines. They don't need them. I've seen duplicate pages automatically consolidated into one within minutes of their creation, and I've seen pages that were supposed to merge via a 301 redirect never merge, even after 10 years. Yes, use 301 redirects, but only for users. Don't count on Google or the other search engines obeying them at all. Again, if you've got these types of redirects, you'll need to change your website's architecture so search engines can't see them. If you don't, you'll end up with a bad case of duplicate content, and you don't want that.
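To find pages that currently lean on a noindex tag or a canonical pointing somewhere else, a rough audit could look like this. It's a sketch using only Python's standard `html.parser`, and it assumes you've already fetched each page's HTML. `IndexSignals` and `audit` are hypothetical names, and checking both the `robots` and `googlebot` meta names is an assumption about which tags your pages use.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class IndexSignals(HTMLParser):
    """Collects robots meta directives and the canonical link from one HTML page."""

    def __init__(self):
        super().__init__()
        self.robots = []       # directives such as "noindex", "follow"
        self.canonical = None  # href of <link rel="canonical">, if present

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; valueless attributes arrive as None.
        a = {name: (value or "") for name, value in attrs}
        if tag == "meta" and a.get("name", "").lower() in ("robots", "googlebot"):
            self.robots += [d.strip().lower() for d in a.get("content", "").split(",")]
        elif tag == "link" and "canonical" in a.get("rel", "").lower().split():
            self.canonical = a.get("href")

def audit(url, html):
    """List the directives a page relies on instead of being removed (hypothetical helper)."""
    signals = IndexSignals()
    signals.feed(html)
    issues = []
    if "noindex" in signals.robots:
        issues.append("noindex")
    if signals.canonical:
        # Resolve relative canonicals against the page URL before comparing.
        target = urljoin(url, signals.canonical)
        if target.rstrip("/") != url.rstrip("/"):
            issues.append(f"canonical -> {target}")
    return issues

page = '<meta name="robots" content="noindex"><link rel="canonical" href="/blog/">'
print(audit("https://example.com/blog/?sort=oldest", page))
# ['noindex', 'canonical -> https://example.com/blog/']
```

Any URL that comes back with a non-empty list is one of the pages described above: instead of trusting the tag, delete the page or restructure the site so it isn't generated.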
If you're running WordPress, the big sources of bloat are author pages, tag pages, and the paginated pages inside the /page/ directory, the ones linked from the bottom of the homepage as 1, 2, 3, and so on. There are plugins that remove the /page/ directory pages and the author pages. As for tag pages, don't use tags at all. You'll only get yourself in trouble. And as for the individual attachment pages WordPress spins off for every image you place in your posts, there are plugins to deal with those as well.
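If you have a list of URLs (from a crawl or a sitemap), a quick way to see how much of it falls into these WordPress categories is to match them against the default URL patterns. This sketch assumes WordPress is installed at the domain root with the default `/author/` and `/tag/` bases (the tag base can be renamed under Settings > Permalinks), and it only catches attachment pages in the `?attachment_id=` form. Attachment pages under pretty permalinks look like ordinary post URLs and won't be matched. `classify` is a hypothetical helper.

```python
import re
from urllib.parse import urlparse, parse_qs

# Assumed default WordPress URL patterns; adjust if your permalink bases differ.
BLOAT_PATTERNS = {
    "author": re.compile(r"^/author/[^/]+/?"),
    "tag": re.compile(r"^/tag/[^/]+/?"),
    "pagination": re.compile(r"/page/\d+/?$"),
}

def classify(url: str) -> list[str]:
    """Label a URL with the WordPress bloat types it appears to match."""
    parts = urlparse(url)
    labels = [name for name, rx in BLOAT_PATTERNS.items() if rx.search(parts.path)]
    # Attachment pages under the default (non-pretty) form use ?attachment_id=<n>.
    if "attachment_id" in parse_qs(parts.query):
        labels.append("attachment")
    return labels

for url in [
    "https://example.com/author/admin/",
    "https://example.com/tag/seo/page/2/",
    "https://example.com/?attachment_id=42",
    "https://example.com/blog/index-bloat/",
]:
    print(url, classify(url))
```

An empty list means the URL doesn't match any of these patterns; it doesn't prove the page is worth keeping.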
The big point of this post is this: in order to reduce index bloat, you'll need to delete pages. And to do this, you'll need to alter your website's architecture. This isn't an easy task, but it's necessary. To find out whether you've got index bloat, you can use one of the many SEO services out there to analyze your site (Moz, Botify, SEMrush, etc.). You can also scan the site yourself with an application such as Xenu Link Sleuth. This is a very handy, free program that I've used a lot in the past. It crawls your site and makes you aware of things you never thought you'd see. Only after you have all the necessary information will you be able to determine the best course of action.
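To give a feel for what a crawler like Xenu does under the hood, here's a toy breadth-first crawler in Python. It is not Xenu or any of the services named above, just a sketch: it follows `<a href>` links on one host, drops `#fragments`, and ignores robots.txt, nofollow, redirects, and non-HTML content. `fetch` is a hypothetical callable you supply (for a live site it might wrap `urllib.request.urlopen`); keeping it pluggable lets the logic run against a fake in-memory site.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse

class LinkCollector(HTMLParser):
    """Gathers href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

def crawl(start_url, fetch, limit=10_000):
    """Breadth-first crawl of a single host. `fetch(url)` returns HTML text or None."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # missing page or non-HTML response
            continue
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            # Resolve relative links and drop #fragments, which browsers don't send to the server.
            link, _ = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen and len(seen) < limit:
                seen.add(link)
                queue.append(link)
    return sorted(seen)

# A tiny fake site standing in for real HTTP fetches.
site = {
    "https://example.com/": '<a href="/about/">About</a> <a href="/blog/?sort=oldest">Old</a>'
                            ' <a href="https://other.com/">Elsewhere</a>',
    "https://example.com/about/": '<a href="/">Home</a> <a href="/about/#team">Team</a>',
    "https://example.com/blog/?sort=oldest": '<a href="/blog/?sort=newest">New</a>',
    "https://example.com/blog/?sort=newest": "",
}
print(len(crawl("https://example.com/", site.get)))  # 4
```

The useful number is the gap between what the crawl returns and the number of pages you actually meant to publish: here, two sort variants of the blog show up as separate URLs.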
If you've got any questions about any of this, please ask. As I said, I've got lots of experience, so I may be able to help out.