XenForo: What's New Pages

  • Thread starter EmeraldHike
EmeraldHike

Member
Joined
May 10, 2021
Messages
133
Reaction Score
0
Points
21
  • #1
If you look at the top navigation bar on this website, you'll see a link that says What's New. If you click that link and then look up in the address bar of your browser, you'll see a URL like this:

https://gaulard.com/forum/whats-new/

That's fine. That's the URL that's supposed to be there. Right below that link, you'll see a few other links. One of them says New Posts. The URL for that link is /whats-new/posts/, followed by a number. Now here's the tricky part. If you take note of that number, click around the site a bit, and then come back to that page, you'll see that the number has changed. It changes all the time, and it only ever increases. I believe the reason has something to do with how the software handles user sessions: the What's New page changes depending on who's logged in and what's new to each user. The problem is that, even though there's a noindex tag on the page, Google is treated as a new user every time it crawls that page. So what starts off as /whats-new/posts/10/ quickly turns into /whats-new/posts/2049586/. Do you see the issue? Yes, the page does have the noindex tag, but Google seems to crawl those pages very aggressively. I have a XenForo site that made it all the way to 20,000 before I stopped the madness.

Since these pages obviously shouldn't be crawled by search engines, they need to be blocked in the robots.txt file of the website in question. The reason is that each and every version of the same page is considered "new" because it's got a new number attached to the end of it. They really are the same page, but since they've got unique URLs, search engines treat them as distinct pages with identical content. In other words, they're duplicates. The noindex tag makes no difference here. The pages still use up your website's bandwidth and your website's crawl budget with Google. Beyond that, search engines consider them low-value pages, and they can really hurt your website's rankings.

My advice to you is to block this /whats-new/ directory at all costs in your robots.txt file. This is critical. If you've got any questions, please ask.
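
For anyone who wants to check a rule before relying on it, here's a minimal sketch using Python's standard urllib.robotparser. The robots.txt body, the /forum/ install path, and the is_allowed helper are my own assumptions for illustration, not anything XenForo ships. One thing worth knowing is that Disallow paths are matched as prefixes from the site root, so if the forum lives under /forum/, the rule has to include that prefix.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body. The /forum/ prefix is an assumption based on
# the example URLs in this thread; change it to match your own install path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forum/whats-new/
"""


def is_allowed(url: str, robots_txt: str = ROBOTS_TXT, agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "https://www.mysite.com/forum/whats-new/posts/20770/",
        "https://www.mysite.com/forum/whats-new/posts/?skip=1",
        "https://www.mysite.com/forum/threads/example.1/",
    ):
        print(url, "allowed" if is_allowed(url) else "blocked")
```

If the rule were written as Disallow: /whats-new/ on a site installed under /forum/, the same check would report the What's New URLs as allowed, because the prefix wouldn't match.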
 
KodyWallice

Member
Joined
May 7, 2021
Messages
123
Reaction Score
1
Points
23
  • #2

Tons of Noindexed “What’s New” Pages in XenForo Software

I think I have unearthed another piece of the SEO puzzle in XenForo forum software. I was browsing Google Search Console and I clicked into the Coverage area. I clicked the Excluded tab up top and then I clicked the Excluded by Noindex Tag link in the description below. I noticed that my site had thousands of “What’s New” pages that had been crawled, but since they contained the noindex tag, they weren’t put into Google’s index. Now, I’ll tell you from my over 20 years of experience working on the internet and in SEO that having all these noindex pages is not good. Even though they say noindex on them, they still accumulate and bleed PageRank all over the place. I’ve battled with this type of thing for ages.

It seems as though the ID for the page is the only thing that changes. I’m not sure if they’re based on sessions or what. All the pages are actually duplicates of one another. For instance, this is what two sample URLs would look like.

https://www.mysite.com/forum/whats-new/posts/20770/
https://www.mysite.com/forum/whats-new/posts/20771/

…and so on.
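
If you have a list of crawled URLs (say, pasted out of a coverage report), a rough way to see how many of them are really the same page is to replace the trailing number with a placeholder and count what's left. This is only a sketch: the collapse_id and count_variants names are made up for this example, and it assumes the changing ID is always the last path segment, as in the sample URLs above.

```python
import re
from collections import Counter

# Matches a trailing all-digit path segment such as /20770/ at the end of a URL.
_TRAILING_ID = re.compile(r"/\d+/?$")


def collapse_id(url: str) -> str:
    """Replace a trailing numeric path segment with the placeholder {id}."""
    return _TRAILING_ID.sub("/{id}/", url)


def count_variants(urls):
    """Count how many URLs collapse to each ID-free pattern."""
    return Counter(collapse_id(u) for u in urls)


if __name__ == "__main__":
    sample = [
        "https://www.mysite.com/forum/whats-new/posts/20770/",
        "https://www.mysite.com/forum/whats-new/posts/20771/",
        "https://www.mysite.com/forum/whats-new/",
    ]
    for pattern, n in count_variants(sample).most_common():
        print(n, pattern)
```

Any pattern with a large count is a good candidate for a closer look, since every URL behind it differs only by that number.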

For a while, I’ve wondered where all these pages were coming from and where they were linked from. Then I clicked on the small link in the “Latest Posts” widget that I have on the homepage. Actually, I had this widget showing on almost all pages until a few days ago. Even though the link to the What’s New page has a nofollow attribute on it, Google is still following all these links that randomly change 24 hours a day, creating all these duplicate pages. By the way, I’m up over 17,000 of these pages in Google Search Console now. It’s getting out of hand.

I believe I have found a solution to this issue. Since the link on the homepage links to:

https://www.mysite.com/forum/whats-new/posts/?skip=1

and then redirects to one of those numbered URLs that I displayed above, all I did was block the /whats-new/ directory in the robots.txt file. I wouldn’t have done this if there were thousands of different links to all these various pages, but since the source is just one link, I think this is okay. It’ll take months for Google to see that these are now blocked and to drop the references to them, but I think things will be okay after that. This directory is even blocked on the XenForo site itself.

Have you noticed something like this happening on your site? Please let me know. What did you do about it?
 