Statistically Improbable Words

A new feature that displays the top 5 Statistically Improbable Keywords for each article has been added. These are the words that appear most often within an article, excluding common, low-meaning words (we call these “stop words”).

You can find this feature by going to any article and clicking the EzinePublisher link in the upper right corner of the article tools.

Random example: take the article “Are Liquid Vitamins Enough?” and click the EzinePublisher link in the upper right corner of the article. We then show you this information:

Statistically Improbable Keywords In This Article:
foods – 3%
acrylamide – 2%
high – 2%
diet – 1%
frying – 1%

That means the word “foods” accounts for 3% of the words in this article, “acrylamide” for 2%, and so forth. In other words, our Statistically Improbable Keywords feature measures the keyword density of an article.
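The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not EzinePublisher's actual implementation. The tiny `STOP_WORDS` set, the tokenizing regular expression, the function name `top_keywords`, and the choice to divide by the total number of words in the article are all assumptions.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "are", "as", "for", "in", "is", "it",
              "of", "on", "or", "that", "the", "to", "with"}


def top_keywords(text, n=5):
    """Return the n most frequent non-stop words with their density in percent.

    Density is assumed to be occurrences / total words in the article * 100,
    rounded to a whole percent like the example output above.
    """
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return []
    # Count only words that are not stop words.
    counts = Counter(w for w in words if w not in STOP_WORDS)
    total = len(words)  # the denominator includes stop words (an assumption)
    return [(word, round(100 * count / total))
            for word, count in counts.most_common(n)]


if __name__ == "__main__":
    sample = ("Fried foods and high heat: frying foods at high heat can form "
              "acrylamide, so a diet low in fried foods limits acrylamide.")
    for word, pct in top_keywords(sample):
        print(f"{word} - {pct}%")
```

Whether the denominator should count stop words changes every percentage. Dividing by all words, as here, keeps the numbers comparable to the usual notion of keyword density.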

My question to the world: what good is this information? How else should we integrate this metadata (data about data) for the benefit of our authors and publishers?


Hans writes:

The next thing to do would be to extend this to word combinations, wouldn’t it?
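Hans’s suggestion can be sketched by counting runs of consecutive words (bigrams by default) instead of single words. This is a hypothetical illustration only. The function name `top_phrases`, the stop-word list, and the rule that any phrase containing a stop word is skipped are all assumptions.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "the", "to"}


def top_phrases(text, n=5, size=2):
    """Return the n most frequent phrases of `size` consecutive words.

    Phrases containing any stop word are skipped (an assumed rule).
    Density is phrase occurrences / total phrases * 100, to one decimal.
    """
    words = re.findall(r"[a-z']+", text.lower())
    # Slide a window of `size` words across the text.
    grams = [tuple(words[i:i + size]) for i in range(len(words) - size + 1)]
    if not grams:
        return []
    counts = Counter(g for g in grams if not any(w in STOP_WORDS for w in g))
    total = len(grams)
    return [(" ".join(g), round(100 * c / total, 1))
            for g, c in counts.most_common(n)]


if __name__ == "__main__":
    print(top_phrases("Liquid vitamins and liquid vitamins in a balanced diet"))
```

Setting `size=3` counts three-word combinations the same way.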

Comment provided December 13, 2005 at 5:28 PM


James Wilson writes:

This feature would be revealing when applied to mass media articles. I believe this is what was wrong with search engines before Google came along. Google’s strength is its ability to gauge or extrapolate the relevance of an article from its overall content, not just the number of times a single word appears. If you look at medical journals regarding cancer, for example, the word cancer may not appear more than once or twice in a very long article, but various ‘laterally-related’ words (i.e., what is being referred to here as ‘statistically improbable’ words) would appear frequently. A lesser search engine would make those frequently recurring words the ‘stars’ of the article, so an article originally written about cancer would not rank high.

I believe the most useful feature would be one that suggests words that might be added, as opposed to the ones that shouldn’t be there.

Articles we publish in Vegas Buzz seem to have the added edge of appearing in relevant category-specific news feeds, which somehow strengthens the relevance of the articles in those feeds.

Just a few thoughts.

Jim Wilson
Vegas Buzz

Comment provided December 13, 2005 at 6:13 PM


Tinu writes:

Hey Chris,

This is a really interesting feature.

If there were a way for article writers to know how keyword-dense their articles were during the submission process, and what threshold to stay below to make their articles more useful to readers, it would be a great way to help authors build stronger relationships with the documents they publish, both on remote sites and on their own.
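A submission-time check like the one Tinu describes might look roughly like this. It is a hypothetical sketch: the function name `density_warnings`, the stop-word list, and the default `max_percent=3.0` are illustrative assumptions, and no particular threshold is being recommended.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "it", "of", "the", "to"}


def density_warnings(text, max_percent=3.0):
    """Return (word, percent) for each non-stop word whose density exceeds max_percent.

    max_percent is a hypothetical cut-off chosen only for illustration.
    Density is occurrences / total words * 100, rounded to one decimal.
    """
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return []
    counts = Counter(w for w in words if w not in STOP_WORDS)
    total = len(words)
    flagged = []
    for word, count in counts.most_common():
        percent = 100 * count / total
        if percent > max_percent:
            flagged.append((word, round(percent, 1)))
    return flagged


if __name__ == "__main__":
    draft = "Vitamins, vitamins, vitamins: why liquid vitamins beat pills."
    print(density_warnings(draft, max_percent=20))
```

A submission form could run a check like this before accepting an article and show any flagged words to the author.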

I will think it over and come back with more comments.

I swear… every time I think you couldn’t possibly make this site more useful (by exhausting all the current options!) you come up with ideas like this…

Thanks Chris!


Comment provided December 13, 2005 at 6:37 PM


Debbie writes:

For those of us who are newer to web marketing and still learning the terminology and meaning, it may help to elaborate on what you mean by “statistically improbable” so we know how to use this information. For example, does the 3% for the word “foods” in your example indicate that keyword isn’t in the article enough for it to be a valuable keyword? Or does it mean the article is over-saturated with that keyword at 3%? Also, an indication of what a good keyword density percentage would be for an article would be helpful.

BTW, thanks for continuing to add features and functionality to help us use our articles to more effectively market our online businesses and drive traffic to our web sites!

Comment provided December 13, 2005 at 7:40 PM


Dina writes:

Is it a rumor, or is Google’s new algorithm making keyword density obsolete anyway? Just the other day I was reading that the PLACEMENT of the keywords and links on your page is far more significant as a way to keep the spiders crawling than the number of times the word appears. If so… why worry about this at all?

(Can you tell I’m a little gleeful? Content Au Naturale is my cup of tea).


Comment provided December 13, 2005 at 10:19 PM


Richard writes:

I think this feature is Great!
Two things come to mind: search engine relevance and Google AdSense.
Wonderful tool. Thanks, Chris.


Comment provided December 14, 2005 at 8:46 AM


Sheryl writes:

People searching will not search by the density of words used in an article; they will search for what the article is about. This feature is good to an extent, depending on what you want to use it for.

Comment provided December 14, 2005 at 5:05 PM


Brian Baldwin writes:

This was very interesting when I went back and looked at my articles. I think I need to change the way I write.


Comment provided December 15, 2005 at 1:41 PM


Geoff writes:


Great article. As an experienced search engine marketer, I related very well to what you are saying here.

Another reason why your site is the best.

Continued Success,


Comment provided December 19, 2005 at 10:25 AM


Devasish Gupta writes:

Useful features like this are what make a service credible, informative, effective and profitable.

The “Statistically Improbable Keywords” feature will definitely encourage authors to write original, targeted and high-information-value articles.

This feature enables users to search for the exact articles they need.

The fact that this feature boosts Search Engine Rankings is already known.

Great! Useful!

Keep up the good work,

Thank You,


Comment provided December 21, 2005 at 2:32 AM

