E-Commerce Challenges: Indexation Issues

One of the many challenges of working on e-commerce sites is the sheer number of pages they contain. Recommendations need to be at least partly automated and, above all, scalable, because changing each product page by hand is time-consuming and inefficient. Below are some of the common indexation issues I've encountered on e-commerce sites, along with my recommended solutions.

Similar Product Pages

Many product pages contain duplicate content, whether in the form of duplicate titles or duplicate body content shared across very similar products. For example, these two products sold on Amazon are almost identical (over 95% identical).

Duplicate Descriptions

Also, the technical details of the two products are identical.

Duplicate Products

This is not ideal, as search engines could easily view these pages as duplicate content. At the same time, the issue arises because the two products ultimately are the same, except that one of them includes more tabs than the other.


1) Product Reviews:

Amazon could decide to write unique titles and product descriptions for each of their "duplicate" product pages, but they don't. Instead, they work around the issue with user-generated content in the form of product reviews. Because one product has 4 product reviews and the other has 11, the majority of the content on each page is unique. Thus, search engines can tell that these are not actually duplicate pages.

Product Review

Encourage your users to leave product reviews, either via social media or by sending email reminders after they have received the product and had the opportunity to test it out.

However, product reviews are not an instant solution. It takes time to build up the number of product reviews on a site. (In general, people only tend to leave product reviews after they have seen other individuals leave product reviews, so a critical mass needs to be built first.)

In the meantime, consider building a...

2) Score Sheet

The score sheet requires pages on the site to reach a certain score threshold before they will be indexed. All other pages are noindexed (via meta robots). The script behind this internal sheet should be re-run regularly (at least once a month) to make sure that all indexed pages maintain their quality, while also giving low-quality pages the opportunity to become high-quality and be indexed. The criteria for determining which pages are high-quality can vary. Some suggestions:

  • For duplicate pages (such as those pinpointed by Google Webmaster Tools), allow only one of those pages to be indexed. Ideally, it should be the page that received the most traffic over a given time period (such as the past 60 days).
  • If none of the pages received any traffic over the past 60 days, choose the page that has the most high-quality static content (such as product reviews).

The score sheet could also be used to generate the XML sitemaps for the site.
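To make the duplicate-page rule concrete, here is a minimal sketch of how such a script might pick one page per group of duplicates and then build a sitemap from the winners. The Page fields, the duplicate_group key, and the example.com URLs are all hypothetical; in practice the data would come from your own analytics and duplicate reports. Reviews are used here as a rough stand-in for "high-quality static content", and they also break ties between pages with equal traffic, which is my assumption rather than a fixed rule.

```python
from dataclasses import dataclass
from xml.etree import ElementTree as ET


@dataclass
class Page:
    url: str
    duplicate_group: str  # hypothetical key shared by near-identical pages
    visits_60d: int       # visits over the chosen window (60 days here)
    review_count: int     # rough stand-in for unique, static content


def pick_indexable(pages):
    """Return the URLs to keep indexed: one winner per duplicate group."""
    best = {}
    for page in pages:
        current = best.get(page.duplicate_group)
        # Traffic is compared first; review count breaks ties, which also
        # covers the case where no page in the group had any traffic.
        if current is None or (page.visits_60d, page.review_count) > (
            current.visits_60d,
            current.review_count,
        ):
            best[page.duplicate_group] = page
    return {page.url for page in best.values()}


def build_sitemap(urls):
    """Build a minimal XML sitemap listing only the indexable URLs."""
    urlset = ET.Element(
        "urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    )
    for url in sorted(urls):
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


pages = [
    Page("https://example.com/p/1", "binder-tabs", 40, 4),
    Page("https://example.com/p/2", "binder-tabs", 12, 11),
    Page("https://example.com/p/3", "stapler", 0, 2),
    Page("https://example.com/p/4", "stapler", 0, 5),
]
indexable = pick_indexable(pages)
sitemap_xml = build_sitemap(indexable)
```

Every page whose URL is not in the returned set would get the meta robots noindex treatment, and only the winners end up in the sitemap.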

If this type of script is not possible, consider writing a less ideal but simpler script: noindex any pages that haven't received any traffic over a given time period (such as 60 days). Ideally, this will catch any duplicate pages on your site, and even if it noindexes non-duplicate content, the consequences are limited. After all, these are pages that haven't brought any traffic to your site over the past two months.
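The simpler rule fits in a few lines. This is only a sketch: the visits_by_url mapping is a hypothetical export from whatever analytics tool you use, and the min_visits parameter is my own addition so the cutoff can be raised above zero.

```python
def pages_to_noindex(visits_by_url, min_visits=1):
    """Return URLs whose visits over the window fall below min_visits."""
    return sorted(
        url for url, visits in visits_by_url.items() if visits < min_visits
    )


# Hypothetical 60-day visit counts keyed by URL path.
visits_by_url = {
    "/products/blue-widget": 120,
    "/products/blue-widget-v2": 0,
    "/products/red-widget": 3,
}
stale = pages_to_noindex(visits_by_url)
```

Re-running this on a schedule lets a page drop off the list as soon as it starts receiving traffic again.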

This also helps ensure that only the pages that deserve to rank for your targeted keywords will rank. This article from SEOmoz shows that CTR is a ranking factor for Bing (and possibly for Google).

Slingshot SEO

Image courtesy of http://www.slingshotseo.com/, from their SEOmoz blog post.

Out-of-Stock or Expired Products

For an improved user experience, make sure that sold-out products are clearly flagged before users try to purchase them. This helps manage user expectations and saves them time and effort.

You could even consider highlighting products that are running low on stock. Amazon does a tremendous job of this.

Out of Stock

Solutions:

302 Redirect for Out-of-Stock Products

For out-of-stock products, consider 302 redirecting these pages to a landing page that states something along the lines of "We're sorry, but this product is currently out-of-stock. Please take a look at our other similar products." This will 1) let search engines know that the change is temporary and 2) limit the amount of traffic that goes to these pages, since landing on an out-of-stock product page generally does not make for a positive user experience.

301 Redirect Expired Products

Also be sure to 301 redirect any expired products (essentially products that will no longer be carried by the site) to a landing page that states something along the lines of "We're sorry, but this product is no longer available. Here are some similar products that may interest you."
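The two redirect rules can be expressed as one small lookup. This is a sketch, not tied to any particular platform: the Availability enum and the two landing-page paths are hypothetical names, and the returned status and path would be handed to whatever redirect mechanism your web framework or server provides.

```python
from enum import Enum
from http import HTTPStatus


class Availability(Enum):
    IN_STOCK = "in_stock"
    OUT_OF_STOCK = "out_of_stock"  # temporarily unavailable
    EXPIRED = "expired"            # will no longer be carried


def redirect_for(availability):
    """Return (status, landing_path), or None to serve the page normally."""
    if availability is Availability.OUT_OF_STOCK:
        # 302: temporary, the product page is expected to come back.
        return HTTPStatus.FOUND, "/out-of-stock/"
    if availability is Availability.EXPIRED:
        # 301: permanent, the product page is gone for good.
        return HTTPStatus.MOVED_PERMANENTLY, "/no-longer-available/"
    return None
```

Keeping the decision in one function makes it easy to flip a product from 302 back to a normal page once it is restocked.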

Meta Robots Noindex/Nofollow vs. Robots.txt

SEOmoz actually has a tremendous guide on meta robots vs. robots.txt.

When to Use Robots.txt: For any logged-in or administrative pages, use robots.txt to block the entire directory rather than each individual page. This applies whenever you want to block every page within a folder.

Example:

    User-agent: *
    Disallow: /admin/
    Disallow: /account/
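Before deploying rules like these, it can be worth checking that they block what you expect and nothing more. One way, sketched here with Python's standard-library robots.txt parser, is to feed it the rules and test a few URLs. The example.com URLs and the is_crawlable helper are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /admin/
Disallow: /account/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def is_crawlable(url, user_agent="*"):
    """Return True if the rules above allow user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


blocked = is_crawlable("https://example.com/admin/users")
allowed = is_crawlable("https://example.com/products/123")
```

Here blocked comes back False and allowed comes back True, which matches the intent of blocking only the two directories.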

When to Use Meta Robots Noindex/Nofollow: Use these for any pages that you might want to have indexed or crawled further down the line. Meta robots noindex/nofollow directives are easier to reverse than robots.txt rules, which are considered more permanent.

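I would also recommend using meta robots noindex/nofollow for any pages that could garner links. Google will often index pages that have been linked to, even if those specific pages are blocked by robots.txt. Keep in mind that a crawler has to be able to fetch a page in order to see its meta robots tag, so these pages should not also be blocked by robots.txt.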
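For reference, the tag itself is a single line in the page's head. The helper below is a hypothetical sketch of how a template might render it from two flags, so that a page can be switched back to indexable without touching robots.txt.

```python
def meta_robots_tag(index=True, follow=True):
    """Render a meta robots tag for the page's <head>."""
    content = ", ".join(
        [
            "index" if index else "noindex",
            "follow" if follow else "nofollow",
        ]
    )
    return f'<meta name="robots" content="{content}">'


tag = meta_robots_tag(index=False, follow=False)
```

With both flags set to False, the function produces the noindex/nofollow tag discussed above.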


Most of the time, our e-commerce clients already know which pages they want indexed or noindexed. Their main concerns are how to keep track of these indexation issues and how to build scalable solutions to maintain them. The purpose of this post was to show three very common issues that consistently come up on e-commerce sites (similar product pages, expired and out-of-stock products, and knowing when to use robots.txt versus meta robots noindex/nofollow), along with my recommendations for fixing them.

