Saturday, December 18, 2010

Data mining past publications

Marian Pierre-Louis shared this interesting link from the Boston Globe via Facebook about a new study published in Science and a new tool Google has made publicly available. The gist:

Google is publicly launching the tool, Google Books Ngram Viewer, to allow scholars or the simply curious to ask questions, such as when references to “The Great War,” which peaked between 1915 and 1941, were replaced by “World War I.” The tool allows people to look up words or phrases that range from one to five words, and see their occurrences over time — the frequency that a word is mentioned in a given year divided by the total number of words written that year.
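For the curious, here is a rough sketch of the calculation described in that last sentence: occurrences of a phrase in a given year, divided by the total number of words for that year. This is only my reading of the description, not Google's actual code. The function names and the tiny corpus are invented for illustration, and splitting on whitespace is a simplifying assumption.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def relative_frequency(phrase, texts_by_year):
    """Map each year to (occurrences of phrase) / (total words that year)."""
    target = phrase.lower().split()
    n = len(target)
    if not 1 <= n <= 5:
        # The article says phrases range from one to five words.
        raise ValueError("phrase must be one to five words long")
    key = " ".join(target)
    result = {}
    for year, texts in texts_by_year.items():
        total_words = 0
        hits = 0
        for text in texts:
            # Assumption: lowercase and split on whitespace, no punctuation handling.
            tokens = text.lower().split()
            total_words += len(tokens)
            hits += Counter(ngrams(tokens, n))[key]
        result[year] = hits / total_words if total_words else 0.0
    return result


# Invented toy corpus, purely for illustration; not real publication data.
toy_corpus = {
    1920: ["the great war changed everything", "after the great war"],
    1950: ["world war i began in 1914",
           "the great war is now called world war i"],
}

great_war = relative_frequency("the great war", toy_corpus)
world_war_i = relative_frequency("world war i", toy_corpus)

for year in sorted(toy_corpus):
    print(year, round(great_war[year], 3), round(world_war_i[year], 3))
```

Even in this toy version, the number is only a count divided by a total. It says nothing about how the phrase was being used.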

I'm sure we can learn a lot from this. And like all tools going back to the sharp stick, it can be misused as well. Counting things is never the whole story. As noted in the article, the way in which words are used may mean more than their frequency. And sometimes the revealing fact lies in what is not mentioned, in books that were never published, or in words whose meaning has subtly shifted over time.
