XenForo Orphan Pages

  • Thread starter 15Katey
  • #1
If you haven't yet read this post about XenForo 301 redirects, I encourage you to do so. It covers the problems the software faces because of its large number of redirected pages. I know these redirects support many of the software's features, so I don't want you to think I'm suggesting we remove them altogether. What I am suggesting is that we hide the redirect links from visitors who aren't logged in to an account. Many of these features aren't really necessary for someone who is simply browsing. A visitor who is truly into the forum and hangs around it a lot will register to become a member, and then they'll experience what the site has to offer in all its glory.

Here's the situation. Let's say the owner of a XenForo community website creates ten primary forums on their site. Then, under those forums, the owner creates ten sub-forums for each primary forum. They allow threads to be posted to both the primary and sub forums. Pretty simple, right? It's like having ten categories and then ten sub-categories under each category. If this was a classifieds website, the users would be allowed to post their ads under the main categories as well as the subs.

The way XenForo is set up, each forum in the forum list shows a link to the latest post made in that forum. So as someone browses all of these forums, they're also browsing links to posts. (Posts are contained inside threads, if you aren't aware.) As it stands, we've got 110 forums (10 primary forums plus 100 sub-forums) and 110 latest-post links, one beside each forum. This is all fine, but there is one oddity. The links to the latest posts aren't direct links. They're 301 redirects. On the surface this seems okay, but in reality it's not. It's pretty bad, and I'll tell you why.

As I stated above, the link shown beside each forum changes whenever a newer post is made. So on any given day, if a whole bunch of posts were made, those original 110 links might no longer be linked to in the same way. The older ones get pushed out and only the latest ones are linked to beside the forums. And once a link gets pushed out, that redirect isn't linked to from anywhere else on the website. In effect, the 301 redirect link becomes an orphan link.

I know what you're thinking. You're probably saying, "Who cares? This isn't a big deal." Well, I'll tell you why I think it is a big deal. It's because of Google. Google crawls all of these 301 redirected links and, for some reason, it honors the redirect and canonicalizes both links to the proper thread link only some of the time. In other cases, Google completely ignores the proper thread link and never indexes it. So once the redirect link gets pushed out and isn't linked to anymore, it's still known to Google, has no pagerank flowing to it, and, I believe, drags the entire site down in rankings. Beyond that, since the pagerank of these pages is so low, they'll never rank for anything. And because the proper link isn't even in the Google index, that page won't rank either. It's a double whammy.

I've got a site right now with about 9,000 threads. For over 1,000 of them, only the redirect URL shows up in Google and the proper URL is nowhere to be seen, even though it's being linked to. That's not good. And I've seen this all over the internet when it comes to XenForo sites.
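To make the "orphan" idea concrete, here is a minimal sketch in Python. It assumes you already have a crawl snapshot of your own site: a map of each page to the URLs it links to, plus the set of URLs a search engine is known to have seen. The URLs and the `find_orphans` helper are made up for illustration and are not part of XenForo.

```python
from typing import Dict, Set


def find_orphans(links: Dict[str, Set[str]], known_urls: Set[str], root: str) -> Set[str]:
    """Return known URLs that no crawled page currently links to."""
    linked_to: Set[str] = set()
    for targets in links.values():
        linked_to |= targets
    # The root (homepage) is the crawl entry point, so it is never an orphan.
    return {url for url in known_urls if url != root and url not in linked_to}


# Hypothetical snapshot: the homepage now shows the newest post link only.
links = {
    "/": {"/forums/economics/", "/threads/new-thread.2/post-20"},
    "/forums/economics/": {"/threads/old-thread.1/", "/threads/new-thread.2/"},
}

# URLs the search engine discovered over time, including yesterday's
# "latest post" redirect that has since been pushed off the homepage.
known = {
    "/",
    "/forums/economics/",
    "/threads/old-thread.1/",
    "/threads/new-thread.2/",
    "/threads/old-thread.1/post-10",
    "/threads/new-thread.2/post-20",
}

orphans = find_orphans(links, known, root="/")
print(sorted(orphans))
```

In this made-up snapshot, the only orphan is the older /post-10 redirect URL: the thread itself is still linked from its forum, but the redirect that used to sit on the homepage is not linked from anywhere.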

If you aren't sure what I'm referring to when I talk about these thread links, go to the homepage of this website. Look at the forum links, such as Economics Forum and Outdoor Forum and then look to the right of these forum links (on a desktop). You'll see the thread links I'm talking about. Now, if you are a member of this site and are logged in and if you roll your mouse over these thread links, you'll see that there's a /post-xxxx at the end of the URL. If you click on one of these links, you'll see that it redirects to another one. If you aren't a member and you roll over these links, you won't see the redirect URL (this may have changed at the time of this writing). This is because I have already fixed the problem.

So what I'm trying to say here is that this software creates a heck of a lot of orphan URLs that may be pulling the rankings of people's websites down. Orphan pages are never a good thing, especially when Google and the other search engines know about them. Their lousy non-existent pagerank gets calculated into the entire site's and the ranking of the site suffers.

If you'd like to read about orphan pages, click through the link below:

https://www.botify.com/blog/orphan-pages

And to read Google's view on how 301 redirects work, according to John Mueller, check this out:

https://www.searchenginejournal.com/google-301-redirects-canonical/363652/
 
EmeraldHike

  • #2
On my XenForo website, I have my forum list show on the homepage. To the right of each forum title is a link to the latest post made in that forum. Every one of these links ends with /post-xxxx in the URL. These aren't direct links to the threads. They're links to the most recent post in the thread, so after the redirect, the /post-xxxx at the end of the URL turns into /#post-xxxx on the thread URL. The hash fragment is fine because Google ignores it and treats that URL as the canonical thread URL. The problem is that all of the /post-xxxx URLs use a 301 redirect to get to the actual canonical thread page. Google mostly doesn't canonicalize that redirect to the thread page. It keeps the URL with the /post-xxxx ending in its index. That means the real canonical thread page is treated as duplicate content. This is a huge problem.
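Here's a small Python sketch of the difference between the two URL shapes. It only uses the standard library's `urldefrag` to strip the fragment. The example.com URLs and the `same_document` helper are placeholders I made up, and this says nothing about how Google itself decides on a canonical.

```python
from urllib.parse import urldefrag


def same_document(url_a: str, url_b: str) -> bool:
    """True when two URLs differ only by their #fragment."""
    return urldefrag(url_a).url == urldefrag(url_b).url


# Placeholder URLs following the /post-xxxx and /#post-xxxx shapes.
thread = "https://example.com/threads/sample-thread.123/"
with_fragment = "https://example.com/threads/sample-thread.123/#post-456"
redirect_url = "https://example.com/threads/sample-thread.123/post-456"

# The fragment version is the same document as the thread page.
print(same_document(thread, with_fragment))

# The /post-456 version is a separate URL that must be requested
# (and redirected) before anyone learns where it ends up.
print(same_document(thread, redirect_url))
```

The point is just that /#post-456 never reaches the server as a different address, while /post-456 is its own URL that a crawler has to fetch and then reconcile with the thread page.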

Another huge problem is that once someone makes a new post in that same forum, the previous "newest" post disappears from the homepage. That previous 301 redirected URL that was just linked to now has no link to it and it becomes an orphaned page with no pagerank flow to it. So to compound the problem of having the real thread page being considered duplicate and not even in the index, the indexed 301 redirected page has no links to it and no longer ranks for anything. And since Google doesn't like these redirected pages at all, it doesn't come back to recrawl them to merge them with the real thread page. It's not a good scene. I may block the 301 redirects in the robots.txt file with:

User-agent: *
Disallow: /threads/*/post
Disallow: /threads/*/latest
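Before relying on rules like these, it may be worth checking which paths they would actually catch. Below is a simplified Python sketch of wildcard matching, where `*` matches any run of characters and a trailing `$` anchors the end of the path. It is my own approximation and not an official parser, and the sample paths are made up. As far as I know, Python's built-in `urllib.robotparser` only does plain prefix matching, which is why the matching is done by hand here.

```python
import re


def rule_to_regex(rule: str) -> re.Pattern:
    """Convert a robots.txt path rule with '*' wildcards into a regex."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal parts and let each '*' match any characters.
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile("^" + pattern + ("$" if anchored else ""))


def is_blocked(path: str, disallow_rules: list) -> bool:
    """True if any Disallow rule matches the start of the path."""
    return any(rule_to_regex(rule).match(path) for rule in disallow_rules)


rules = ["/threads/*/post", "/threads/*/latest"]

# Made-up paths in the shape discussed in this thread.
for path in [
    "/threads/sample-thread.123/post-456",
    "/threads/sample-thread.123/latest",
    "/threads/sample-thread.123/",
    "/threads/sample-thread.123/page-2",
]:
    print(path, is_blocked(path, rules))
```

Under this approximation, the two redirect-style paths are blocked while the thread page and its paginated pages stay crawlable.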

Some people say to leave these 301 redirect URLs unblocked to help with internal meshing, but I think they do more harm than good when crawled by the search engines.
 
15Katey

  • #3
I've been thinking about this situation and wonder if I'm missing out on some critical information. I initially thought that once the URL that Google has chosen as the canonical isn't linked to anymore, the content on that page would be considered "orphaned." After much thought, I'm beginning to think it doesn't matter what the canonical URL is. Let me explain.

If we think of a URL as merely a unique identifier, then it really doesn't matter what is linked to that identifier, if it's consolidated with other URLs (identifiers). So basically, if I have a URL such as:

example-page-redirect.html

that's linked to from the homepage, but it's really a redirect that points to:

example-page-canonical.html

that's linked to from all over the website, what does it matter which of these URLs Google thinks is the canonical? Let's say that Google chooses the redirect URL as the canonical, essentially disobeying both the 301 redirect on the redirect URL and the canonical tag on the real canonical page. I initially thought that this would be a terrible thing and would cut off all pagerank flow to the canonical page. But what may really be occurring is that Google simply chose one of the URLs to show as the canonical, while still crediting it with the pagerank flow from the other URL (the canonical one) that's linked to from all over the website. Does it matter which URL Google shows as the canonical? I don't think so.

I'm no database or search engine expert, but I'll tell you that every time I try to "fix" something that I deem wrong or incorrect, I end up losing rankings. It's almost like I'm doing things backwards. I'm still trying to make sense of what I just attempted to explain above, so if it seems not well thought out, then please forgive me.
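To show what I mean by treating URLs as identifiers, here is a toy Python model. It is only a picture of my guess, not of how Google actually works. The inbound link counts (1 and 40) and the `consolidated_inbound` helper are invented for illustration.

```python
from typing import Dict, Set


def consolidated_inbound(
    inbound_links: Dict[str, int], clusters: Dict[str, Set[str]]
) -> Dict[str, int]:
    """Sum inbound links over every URL grouped under a displayed canonical."""
    return {
        canonical: sum(inbound_links.get(url, 0) for url in members)
        for canonical, members in clusters.items()
    }


# Invented numbers: the redirect is linked once (from the homepage),
# the real thread URL is linked from all over the site.
inbound = {
    "example-page-redirect.html": 1,
    "example-page-canonical.html": 40,
}
members = {"example-page-redirect.html", "example-page-canonical.html"}

# Case A: the real thread URL is shown as the canonical.
choice_a = consolidated_inbound(inbound, {"example-page-canonical.html": members})

# Case B: the redirect URL is shown as the canonical instead.
choice_b = consolidated_inbound(inbound, {"example-page-redirect.html": members})

print(choice_a)
print(choice_b)
```

If the two URLs really are consolidated into one group, the total is the same in both cases and only the label changes. Whether real-world consolidation behaves like this is exactly the part I'm unsure about.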

And please, if you've got any insight into this, please chime in and let me know. Thank you.
 
JGaulard

Administrator
  • #4
I think it may be best to simply block these types of redirected URLs in the robots.txt file. At best, things may work the way you suggest, but they may not. Also, by allowing the search engines to crawl all these somewhat useless redirects, you're wasting valuable crawl budget. I'd personally prefer Google to crawl my direct and valuable links, not redirects.

UPDATE

I'm actually doing some testing right now. I'd like to see if unlinking these 301 redirect links and allowing them to be crawled has any effect on crawl budget. What I'd like to happen is to have these redirects fold into their canonical counterparts and for the crawling of them to slow down because they're not linked to anymore. I'll report back here when I have some findings to share.
 