Google Ngram Viewer (GNV) is an application that allows the intellectual enthusiast to track the popularity of particular words or phrases from 1500 to 2000. This video gives a neat example of how to make the best use of its functions.
GNV has been said to revolutionize the ways in which we conduct historical research. It is considered part of a new academic movement called culturomics, which is essentially the practice of analyzing large social datasets. For Google, “analyzing the growth, change, and decline of published words over the centuries” allows the researcher to see how cultural trends developed over time. Google applies this methodology to its own database of books, pamphlets and other written material from across the centuries. Using a database is highly appropriate here, as it saves the historian from having to carry out complex technical queries and allows data to be linked quickly.
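As a rough illustration of what “analyzing the growth, change, and decline of published words” involves, the Python sketch below computes the kind of normalized figure the Ngram Viewer plots: a word’s count in a given year divided by the total number of words counted in that year. The function name `relative_frequency` and all the counts are hypothetical toy values, not real Google Books data.

```python
def relative_frequency(word_counts: dict[int, int],
                       total_counts: dict[int, int]) -> dict[int, float]:
    """Share of all counted words in each year that are the target word.

    Both arguments map a year to a count. Years missing from total_counts,
    or with a zero total, are skipped rather than divided by zero.
    """
    shares = {}
    for year, count in sorted(word_counts.items()):
        total = total_counts.get(year, 0)
        if total > 0:
            shares[year] = count / total
    return shares


# Hypothetical toy numbers for illustration only (NOT real Ngram data):
# the word's raw count doubles, but so does the amount of printed text.
word_counts = {1800: 50, 1900: 100}
total_counts = {1800: 100_000, 1900: 200_000}

print(relative_frequency(word_counts, total_counts))
# → {1800: 0.0005, 1900: 0.0005}
```

Here the raw count doubles while the share stays flat, because the amount of text in the toy corpus also doubles; this is why a normalized curve, rather than a raw count, is the better guide to a word’s changing popularity.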
However, GNV does encounter some technical problems. Many of its books and pamphlets are digitized through Optical Character Recognition (OCR), a process that converts scanned images of typewritten or printed text into computer-readable text. Although OCR is claimed to reach 98 percent accuracy, this figure is measured mainly at the level of individual characters, not whole words. For example, an OCR scan that produces ‘I ihall no. be able to sufil’ instead of ‘I shall not be able to fulfil’ can still sit within a page-level accuracy of over 95 percent, because most of the letters on the page have been recognized correctly, even though the significant words in this sentence are misspelt. Searches through GNV may therefore give misleading results.
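The gap between character-level and word-level accuracy can be made concrete with a short Python sketch. It scores the example sentence in two ways: character accuracy as one minus the edit (Levenshtein) distance divided by the length of the correct text, and word accuracy as the share of words that match exactly. Both definitions are simplifying assumptions chosen for illustration, not Google’s or any OCR vendor’s published metric, and the word comparison assumes the OCR kept word boundaries intact.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn a into b (dynamic programming, row by row)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def character_accuracy(truth: str, ocr: str) -> float:
    """Fraction of the correct text's characters that survive OCR unchanged."""
    return 1 - levenshtein(truth, ocr) / len(truth)


def word_accuracy(truth: str, ocr: str) -> float:
    # Naive position-by-position comparison; assumes word boundaries are intact.
    truth_words, ocr_words = truth.split(), ocr.split()
    matches = sum(t == o for t, o in zip(truth_words, ocr_words))
    return matches / len(truth_words)


truth = "I shall not be able to fulfil"
ocr = "I ihall no. be able to sufil"
print(f"character accuracy: {character_accuracy(truth, ocr):.0%}")  # → 86%
print(f"word accuracy:      {word_accuracy(truth, ocr):.0%}")       # → 57%
```

On this single sentence the four character errors still leave roughly 86 percent of characters correct but only four of seven words. Embedded in a page of otherwise clean text, the same four errors would barely move the character figure, which is how a high headline accuracy can hide many words that a search will never find.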
OCR also raises historical problems. Until the late eighteenth century the letter ‘s’ was often printed as a ‘long s’ (ſ), which looks very similar to an ‘f’, so the OCR system is likely to find it hard to distinguish words containing a long s from words that genuinely contain an ‘f’. This should make us question the computer’s capacity to provide us with relevant information, and remind us that the computer works systematically according to the information it is given; it is hard to make it operate like the human mind, which can accommodate differing historical contexts. Binder also critiques Google’s metadata (descriptive information about each item, such as its author and publication date). She argues that because cataloging and organization are so vital to building a correct dataset and to digitizing history, weaknesses in Google’s metadata undermine its trustworthiness as a scholarly source.
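One practical consequence is that a search for a word containing ‘s’ may miss older printings in which the long s was misread as ‘f’ (for instance ‘best’ appearing as ‘beft’). A researcher could search for such variants alongside the modern spelling. The Python sketch below generates them; `long_s_variants` is a hypothetical helper, and it assumes the simplified printing rule that every ‘s’ except a word-final one was set as a long s, which real typographic practice did not follow perfectly.

```python
from itertools import product


def long_s_variants(word: str) -> set[str]:
    """Return spellings an OCR engine might produce if it read long s as 'f'.

    Simplifying assumption: every 's' except a word-final one was printed as
    a long s, and each of those may or may not have been misread.
    """
    # Indices of 's' characters that could have been printed as a long s.
    positions = [i for i, ch in enumerate(word) if ch == "s" and i < len(word) - 1]
    variants = set()
    # Try every combination of "misread" / "read correctly" for those positions.
    for choices in product((False, True), repeat=len(positions)):
        letters = list(word)
        for pos, misread in zip(positions, choices):
            if misread:
                letters[pos] = "f"
        variants.add("".join(letters))
    variants.discard(word)  # keep only the misread forms
    return variants


print(sorted(long_s_variants("best")))     # → ['beft']
print(sorted(long_s_variants("success")))  # → ['fuccefs', 'fuccess', 'succefs']
```

Generating every combination, rather than replacing all the letters at once, reflects the fact that OCR may misread some long s characters in a word while reading others correctly.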
However, this decision to use Optical Character Recognition is part of a larger problem in how we approach the data we use. Hitchcock has discussed the evolution of close and distant reading, as historians move away from looking closely at each individual primary source and instead try to extract a vast amount of information by examining sources collectively. GNV is an exercise in the latter, which has both advantages and drawbacks for academic research. It is an appropriate methodology because it emphasizes the accessibility and searchability of digital research, in contrast to manually leafing through pages in search of relevant information.
But we must take the information we visualize with a pinch of salt, as we can only investigate the words’ meanings by searching Google Books. For example, the popularity of the word ‘Antichrist’ is easy to see in the 1640s, when the Ngram shows a huge increase. However, without a proper understanding of why the word was popular and what it actually meant, the results give only a superficial impression of its popularity, not its historical significance. Secondly, whatever context we can obtain from a ‘close reading’ of Google Books, we must remember that these books are a limited collection curated by Google. Google’s collection has gone through a selection process, so any patterns or ‘topic modelling’ constructed from GNV reflect only Google’s selection of books and NOT every book that has ever been published.
Despite the limitations of the Google Books collection, GNV is highly flexible. Many scholars have welcomed this flexibility, mainly because it lets users manipulate the data. Karch explicitly recognizes that you can search for verb and noun variations of words, change the time frame, and compare trends across different words. This is vitally important: by letting researchers direct their own queries, GNV encourages creativity and more original studies. Moreover, because the results are visualized as charts, research becomes more appealing and engaging to both scholarly and non-scholarly audiences. Unfortunately, the inability to use Boolean searches (e.g. “AND”) limits how precise and creative you can actually be, although incorporating them into such a large dataset would probably have proved difficult.
Lastly, moving away from technical approaches, GNV has raised some theoretical issues for academic research. Firstly, the Ngram Viewer has encouraged greater interdisciplinary analysis, as the study of semantics, computing and history can build upon one another to create new areas of research.
However, according to Hitchcock, there is a danger that these new programs may foster a kind of historical positivism, in which fusing the social sciences and the humanities leads historians to believe that their research must fit into neat conclusions. We must ensure that culturomic projects like GNV promote the process of ‘interpretation’ rather than a search for pure facts.
Fancy some extra reading? | Bibliography of Links
- What is Cultronomics? (http://www.cresc.ac.uk/news/news-from-cresc/what-is-cultronomics)
- Google opens books to new cultural studies (http://www.sciencemag.org/content/330/6011/1600.summary)
- Improving OCR accuracy (http://www.dlib.org/dlib/march09/holley/03holley.html)
- Google Books: OCR and Metadata (http://web.resourceshelf.com/go/resourceblog/62743)
- Big Data for Dead People (http://historyonics.blogspot.co.uk/2013/12/big-data-for-dead-people-digital.html)
- Topic Modeling (http://mallet.cs.umass.edu/topics.php)
- Karch’s analysis of the Google Ngram Viewer (http://google.about.com/od/n/a/Google-Books-Ngram-Viewer.htm)
- Google Books (http://books.google.com/)
- Google Ngram Viewer (https://books.google.com/ngrams/info)