psst .. want some cheap n-grams?

I occasionally post on text-mining issues, and those who work with text may already know about the n-gram database from Google.

What are you going to do with n-grams? Well, they can be useful in keyphrase research for website optimization, but there is a sensible suggestion here, “Scrape Something Else”, to the effect that simple-minded use of other people’s keyphrases might be a poisoned chalice.

N-grams can also be useful for extracting semantic content for higher-level processing, e.g. for detecting media bias.
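
For anyone who has not met them before, an n-gram is just a run of n consecutive tokens. Here is a minimal sketch of pulling them out of a piece of text and counting them. The function names `ngrams` and `ngram_counts` are my own for illustration, and the lowercase-and-split-on-whitespace tokenization is a simplifying assumption; real text needs a proper tokenizer.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text, n=2):
    """Count n-grams in text, using naive lowercase whitespace tokenization."""
    tokens = text.lower().split()
    return Counter(ngrams(tokens, n))


if __name__ == "__main__":
    counts = ngram_counts("the cat sat on the cat", n=2)
    # List the bigrams, most frequent first.
    for gram, count in counts.most_common():
        print(" ".join(gram), count)
```

Counts like these, gathered per source, are the sort of raw material a higher-level comparison could start from.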

And there are lots of other uses. But beware .. you will need your system set up for mining terabytes on the desktop. The corpus comes on 6 DVDs.
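
With that much data, the one thing you cannot do is load a file into memory. A rough sketch of the streaming approach follows. It assumes (and this is my assumption, so check it against the files you actually have) that each line holds an n-gram, a tab, and a count, and that the files are gzip-compressed. The function names and the `contains` filter are hypothetical, for illustration only.

```python
import gzip
import heapq


def iter_ngram_counts(lines):
    """Yield (ngram, count) pairs from lines assumed to look like 'w1 w2<TAB>count'."""
    for line in lines:
        line = line.rstrip("\n")
        ngram, _, count = line.rpartition("\t")
        if not ngram or not count.isdigit():
            continue  # skip blank or malformed lines
        yield ngram, int(count)


def top_ngrams(lines, k=10, contains=None):
    """Return the k most frequent n-grams, optionally only those containing a word."""
    records = iter_ngram_counts(lines)
    if contains is not None:
        records = (r for r in records if contains in r[0].split())
    # heapq.nlargest keeps only k records in memory at any time.
    return heapq.nlargest(k, records, key=lambda r: r[1])


def top_ngrams_in_file(path, k=10, contains=None):
    """Stream one gzip-compressed file line by line (path is hypothetical)."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return top_ngrams(fh, k, contains)
```

Because everything is a generator until the final `heapq.nlargest`, memory use stays flat however big the file is; the price is one full pass over the data per query.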
