Enhancing Site Integrity

Over the past month we’ve been working on proactive measures to enhance site integrity so that we are not penalized by the dozens of search engines that index our site daily.

One of them is our new advanced article de-duper, which identifies not only EXACT-MATCH articles but also “LIKE-MATCH” articles.
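To make the exact-match versus like-match distinction concrete, here is a minimal sketch in Python. It is not our actual de-duper: the `normalize` helper, the use of the standard library’s `difflib.SequenceMatcher`, and the 0.9 similarity threshold are all assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and drop punctuation so trivial edits do not hide a match."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


def is_exact_match(a: str, b: str) -> bool:
    """Exact match here means identical after normalization (an assumption)."""
    return normalize(a) == normalize(b)


def is_like_match(a: str, b: str, threshold: float = 0.9) -> bool:
    """True when two texts are very similar but not identical.

    The threshold is a hypothetical tuning value, not a known setting.
    """
    words_a, words_b = normalize(a).split(), normalize(b).split()
    if words_a == words_b:
        return False  # already covered by is_exact_match
    # Compare word sequences; autojunk=False keeps common words in play
    # for long articles.
    matcher = SequenceMatcher(None, words_a, words_b, autojunk=False)
    return matcher.ratio() >= threshold
```

A pair that passes `is_like_match` would still go to a human for review in this sketch; the score only narrows down which pairs are worth reading.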

Honestly, I was shocked when I printed out the 10-page report of the 3 dozen “like-match” articles that we had in the directory. I suppose as a percentage of 33k articles, that is not bad, but it’s still not acceptable to have duplicate-title or duplicate-content articles on our site.

Here’s the methodology that we used to determine which of the exact-match or like-match articles to remove:

First, if an author had two different author names with the same set of articles, I would merge them into one author name (whichever looked more professional or matched how the author laid out their name in the resource box of their articles).

Then, I’d look at the number of page views. Whichever of the duplicate articles had more page views stayed, and the one with fewer page views got dumped.
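The page-view rule can be sketched in a few lines of Python. The `Article` record and the `pick_survivor` function are hypothetical names for illustration, and the tie-breaker (keep the lower article ID when page views are equal) is an assumption, since the rule above does not cover ties.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Article:
    article_id: int  # hypothetical identifier
    title: str
    page_views: int


def pick_survivor(duplicates: List[Article]) -> Tuple[Article, List[Article]]:
    """Return (article to keep, articles to remove) for one group of dupes."""
    if not duplicates:
        raise ValueError("need at least one article")
    # Most page views wins; on a tie, keep the lower ID (assumed tie-breaker).
    keep = max(duplicates, key=lambda a: (a.page_views, -a.article_id))
    remove = [a for a in duplicates if a is not keep]
    return keep, remove
```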

In a few situations, an author clearly did not submit original works and they had all of their articles removed.

The really tough part of this de-dupe job is that we have 5-7 pages of article titles that are identical but the bodies of the articles are not even remotely identical. With those, we sift through by hand to identify potential dupes.

The one ‘dead giveaway’ that an article is a dupe before I even look at it: the word count is identical to, or within 10 words of, the other dupe article that was identified.
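As a rough sketch of that word-count check in Python: the function names are hypothetical, words are counted by splitting on whitespace, and a difference of exactly 10 is treated as “within 10” (both assumptions).

```python
def word_count(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


def word_counts_suspicious(a: str, b: str, tolerance: int = 10) -> bool:
    """True when two article bodies differ by at most `tolerance` words."""
    return abs(word_count(a) - word_count(b)) <= tolerance
```

This only flags a pair for a closer look; two unrelated articles can land on similar word counts by chance.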

Today, we have better systems in place to prevent authors from sending in articles that match ones they have already submitted…and I suppose this event is just cleanup from the days when we did not have duplicate article title checking. :-)
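For anyone wanting to add a similar check to their own site, here is a minimal Python sketch of a per-author duplicate-title check at submission time. It is not our production system: the `SubmissionGate` class, its in-memory storage, and the case- and whitespace-insensitive comparison are all assumptions for illustration.

```python
from collections import defaultdict
from typing import DefaultDict, Set


class SubmissionGate:
    """Rejects a title the same author has already submitted (sketch only)."""

    def __init__(self) -> None:
        # Maps author name -> set of normalized titles already accepted.
        self._titles_by_author: DefaultDict[str, Set[str]] = defaultdict(set)

    @staticmethod
    def _key(title: str) -> str:
        # Ignore case and extra whitespace when comparing titles.
        return " ".join(title.lower().split())

    def try_submit(self, author: str, title: str) -> bool:
        """Record the title and return True, or return False for a repeat."""
        key = self._key(title)
        if key in self._titles_by_author[author]:
            return False
        self._titles_by_author[author].add(key)
        return True
```

A real system would keep the titles in a database and would likely check article bodies as well.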

The whole point of this exercise is to ensure that we don’t have or allow duplicate articles in our directory. You might want to do the same for your website so that you’re never accused of search engine spamming. It’s about site integrity.

3 Comments


1
Kaye Bailey writes:

Impressive.

Chris, I’ve been wondering, is it possible to include the same article in two categories? Sometimes I just fret myself silly trying to decide which category to pigeon-hole my article.

Thanks,
Kaye

Comment provided May 19, 2005 at 10:13 AM

2
Chris Knight writes:

Kaye,

Sorry, no.

If we ever offer this capability, it will be on a fee basis only. We’ve considered it, but there has not been enough reason to make it possible yet. Plus, I’m not certain it would bring much more traffic to your article, considering that most people don’t browse our site but rather land on an article from a search engine hand-off, RSS hand-off or email alert hand-off.

-Chris

Comment provided May 19, 2005 at 10:18 AM

3
Kaye Bailey writes:

Thanks Chris for an understandable and quick response. However, I shall now hold you personally responsible for my fret wrinkles! :)

Seriously, EzineArticles has been fabulous for sending traffic my way, thank you so much!

Kaye

Comment provided May 19, 2005 at 10:55 AM
