
Robots.txt Blocked Pages vs. 403 Forbidden

  • Thread starter JGaulard
  • #1
I've been thinking about this quandary for years, and I was again just last night. While driving down the road, I had to admit to myself, "I simply do not know." I don't know the answer to the question: which is more SEO friendly, having linked-to pages you don't want indexed by the search engines return a 403 Forbidden status code, or blocking them in the robots.txt file? Both options have their pros and cons. But I implore you, if you have the answer, please do let me know down below.

For the longest time, the only tool I've used to keep pages out of the Google index has been the robots.txt file. I have to tell you, it works very well. It does keep the live pages on the website from being crawled. Many of these pages will still appear if you type the "site:yourwebsite.com" command into Google, but those pages haven't actually been crawled. They've been recognized and that's about it. My question is, do those "recognized" pages count against the site at all? That really is the million-dollar question.

Say you have only one page on an entire website you'd like returned in the search results, and you've got ten pages that this page links to. You block all ten pages in the robots.txt file. Will those ten pages be considered thin or anything weird like that? Or will they simply decay and fall out of existence after a while? I would love to know. From my own experience, if those pages are actively linked to, they'll hang around for a while. If they're trapped in a directory and the links to the pages can't be seen once the path to them is blocked in the robots.txt file, they'll generally fall out of the index, and out of existence altogether, after a few months. It's the actively linked-to pages I'm most concerned with.
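To make the robots.txt side of this concrete, here's a rough sketch using Python's standard urllib.robotparser. The rule and the page paths are just placeholders, not my actual setup; the point is that a disallowed URL is off limits to the crawler, even though the crawler can still learn that the URL exists from links pointing at it.

```python
# Sketch only: placeholder rule and URLs, not a real configuration.
import urllib.robotparser

robots_txt = """\
User-agent: *
Disallow: /private-section/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

# A disallowed page: Googlebot may still "recognize" the URL from links,
# but it isn't allowed to fetch (crawl) the page itself.
print(rp.can_fetch("Googlebot", "https://yourwebsite.com/private-section/page-1/"))  # False

# A page that isn't disallowed can be crawled normally.
print(rp.can_fetch("Googlebot", "https://yourwebsite.com/the-one-page-you-want-indexed/"))  # True
```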

I'm running XenForo software for this forum. There are two sections I'm concerned with: one is the member pages and the other is the attachment pages. For both of these areas, I can either block them in the robots.txt file (/members/ and /attachments/) or block access to them via the site permission system. If I use the robots.txt file, Google will crawl the site, see the links, and then not crawl the pages. It'll still make a record of them and show them if I go looking. If I were to block these pages via the permission system so that, say, visitors who aren't registered members and aren't logged in can't see them, then the pages would still be linked to, but would return a 403 Forbidden status code for whoever accesses them. I don't like the idea of linking to dead pages, but I also don't like the idea of blocking tens of thousands of pages in the robots.txt file (via those two directories I mentioned above).
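And here's a quick way to see which treatment a given page is actually getting. Again, this is only a sketch; the domain and the two example paths are placeholders for whatever pages you'd want to test. A permission-blocked page should come back as 403 Forbidden for a logged-out visitor, while a page that's only blocked in robots.txt will still return 200 if you request it directly, since robots.txt only asks well-behaved crawlers not to fetch it.

```python
# Sketch only: check the HTTP status a logged-out visitor receives.
# The domain and paths below are placeholders.
import urllib.request
import urllib.error

urls = [
    "https://yourwebsite.com/members/example-member.1/",    # placeholder member page
    "https://yourwebsite.com/attachments/example-file.1/",  # placeholder attachment page
]

for url in urls:
    try:
        with urllib.request.urlopen(url) as resp:
            # Blocked only in robots.txt: the page itself still serves 200.
            print(url, resp.status)
    except urllib.error.HTTPError as err:
        # Blocked by the permission system: expect 403 Forbidden here.
        print(url, err.code)
```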

If you were to make an educated guess, which option would you say is more SEO friendly?
 