In one of the more recent contributions to the branch of digital humanities known as computational text analysis, Stephen Ramsay makes the case for what he terms “algorithmic criticism.” In Reading Machines: Toward an Algorithmic Criticism, Ramsay justifies his methodology by claiming that it is simply a more thorough version of the methodology to which literary critics already subscribe, only at “a different scale and with expanded powers of observation” (17).
Essentially, algorithmic criticism is the use of computer software to identify patterns within a text, such as sentence length, word frequency, dark/light imagery, etc. In contrast to the serendipitous way in which traditional hermeneutic criticism locates these patterns, Ramsay argues that computers “can unerringly discover every instance of such features across a massive corpus of literary texts and then present those features in a visual format” (17).
For the most part, Ramsay is balanced in his theoretical application of computational text analysis. He notes that computer software is useless as a means of delivering the “correct” interpretation of a literary work, since such an interpretation is irrelevant to the aims of literary criticism, which, as Ramsay quips, aims not to resolve literary disputes but to ensure “that the matter might become richer, deeper, and ever more complicated” (16).
In his second chapter, Ramsay takes a formula common to computational text analysis (term frequency weighted by inverse document frequency, or tf-idf) and applies it to Virginia Woolf’s modernist novel The Waves as a way of generating data on how the characters’ speech varies. His formula for this procedure is:
tf-idf = tf × (N / df)
In this formula, tf is the number of times a term appears in a given document, N corresponds to the total number of documents, and df corresponds to the number of documents in which that term appears. This formula helps to offset the weight given to words that are likely to be the most common in all of the documents, such as articles and prepositions. For example, if “on” appears 87 times in a single document and appears in three of the six documents overall, then the term frequency (87) is multiplied by two, or six (N) divided by three (df), for a score of 174. A word that appears in all six documents is multiplied by only one, while a word that appears in a single document is multiplied by six.
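The arithmetic above can be checked with a short Python sketch. This is only an illustration of the formula as stated here, not Ramsay's own code: the function name `tf_idf`, the document labels, and the toy token lists are all assumptions made up to reproduce the “on” example.

```python
from collections import Counter

def tf_idf(term, doc_name, docs):
    """Score `term` in one document as tf * (N / df).

    `docs` maps a document label to its list of tokens (hypothetical layout).
    """
    tf = Counter(docs[doc_name])[term]   # occurrences in this document
    n_docs = len(docs)                   # N: total number of documents
    # df: number of documents that contain the term at least once
    df = sum(1 for tokens in docs.values() if term in tokens)
    if df == 0:
        return 0.0                       # term never appears anywhere
    return tf * (n_docs / df)

# Invented toy corpus: "on" appears 87 times in d1 and in 3 of the 6 documents.
docs = {
    "d1": ["on"] * 87,
    "d2": ["on"],
    "d3": ["on"],
    "d4": ["sea"],
    "d5": ["sea"],
    "d6": ["sea"],
}

score = tf_idf("on", "d1", docs)
print(score)  # 87 * (6 / 3) = 174.0
```

Note that many software libraries use a logarithmic variant of the inverse document frequency; the sketch keeps the simple ratio N/df because that is the form given above.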
In Ramsay’s own example, he generates the twenty-four highest-scoring terms for each of the six major characters in Woolf’s novel (see below).
| Bernard | Louis | Neville | Jinny | Rhoda | Susan |
|---|---|---|---|---|---|
| thinks | mr | catullus | tunnel | oblong | setter |
| Letter | western | doomed | prepared | dips | washing |
| curiosity | nile | immitigable | melancholy | bunch | apron |
| moffat | Australian | papers | billowing | fuller | pear |
| final | beast | bookcase | fiery | moonlight | seasons |
| important | grained | bored | game | party | squirrel |
| low | thou | camel | native | them- | window-pane |
| simple | wilt | detect | peers | allowed | kitchen |
| canopy | pitchers | expose | quicker | cliffs | baby |
| getting | steel | hubbub | victory | empress | betty |
| hoot | attempt | incredible | band | fleet | bitten |
| hums | average | lack | banners | garland | boil |
| rabbit | clerks | loads | cabinet | immune | cabbages |
| tick | disorder | mallet | coach | many-backed | carbolic |
| tooth | accent | marvel | crag | minnows | clara |
| arrive | beaten | shoots | dazzle | pond | cow |
| bandaged | bobbing | squirting | deftly | structure | cradle |
| bowled | custard | waits | equipped | wonder | eggs |
| brushed | discord | stair | eyebrows | tiger | ernest |
| buzzing | Eating-shop | abject | felled | swallow | hams |
| complex | england | admirable | frightened | africa | hare |
| concrete | eyres | ajax | gaze | amorous | lettuce |
| deeply | Four-thirty | aloud | jump | attitude | locked |
| detachment | ham | bath | lockets | bow | maids |
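A list of this kind could, in principle, be produced by scoring every word in each character's speech and keeping the highest scorers. The Python sketch below shows one way this might be done with the tf × (N / df) weighting described earlier. It is not Ramsay's procedure or data: the function name `top_terms`, the speaker labels, and the tiny token lists are invented placeholders, not text from The Waves.

```python
from collections import Counter

def top_terms(docs, n):
    """Return the n highest-scoring terms per document, using tf * (N / df)."""
    n_docs = len(docs)
    # df: in how many documents does each term occur at least once?
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    ranked = {}
    for name, tokens in docs.items():
        tf = Counter(tokens)
        scores = {term: count * (n_docs / df[term]) for term, count in tf.items()}
        # Highest score first; ties broken alphabetically for a stable order.
        ranked[name] = sorted(scores, key=lambda t: (-scores[t], t))[:n]
    return ranked

# Invented placeholder "speeches", one token list per hypothetical speaker.
toy = {
    "speaker_a": "the sea the accent the accent".split(),
    "speaker_b": "the sea the fire".split(),
    "speaker_c": "the pond the sea the pond the".split(),
}

result = top_terms(toy, 2)
for speaker, terms in result.items():
    print(speaker, terms)
```

In this toy case the word unique to each speaker rises to the top of that speaker's list, while “the,” which every speaker uses, is pushed down without disappearing entirely. The weighting dampens common words rather than removing them.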
Although I was initially disappointed to see the dynamic complexity of The Waves reduced to a series of data points, Ramsay’s use of the list as a tool for interpreting the novel displayed a more tempered hermeneutic than I had anticipated. He notes that the appearance of the word “accent” in Louis’ column and no others might point to the possibility that Louis is more self-conscious about his speech than the other characters. Likewise, the evocative imagery of Jinny’s list (“fiery,” “dazzle,” “billowing,” “gaze”) could be indicative of romantic/sexual undertones within her narrative.
Ramsay’s parenthetical asides in this section seem to betray the fact that he is highly familiar with the novel itself, and, in many ways, the list merely seems to reinforce interpretations that are more effectively justified at the narrative level. Indeed, Ramsay is quick to cite this kind of textual refocusing as one of the main advantages of algorithmic criticism, writing that a term frequency list allows us to return to a text “with our focus narrowed and reframed” (12).
The important thing to keep in mind for algorithmic text analysis is that the computer output is only relevant insofar as it remains within the boundaries of a possible human reading. Although this might seem heretically anthropocentric in the age of posthumanism, I nonetheless hold firm that a statistical analysis of The Catcher in the Rye is completely irrelevant if it does not correspond to a reading that would be significant to a human. Utilizing software to produce an entirely non-human reading of a text is merely to explain how a text signifies to a non-human.
However, there is an alternative to this human/non-human reading binary. The only logical justification for utilizing computer software to read a text is that it is capable of reading in a manner that is beyond (or at least more efficient than) a standard human reading.
This is why, for me, the most valuable aspect of Ramsay’s algorithmic criticism is the way it could be utilized to illuminate the affective grounds of individuated readings of a text. In Ramsay’s estimation, interpretive approaches often begin with some kind of affective hypothesis anyway (noticing the frequency of an image, an uncanny or eerie tone, etc.), and therefore an algorithmic analysis of the text would merely supply the reader with possible explanations for how this affect was initiated on a material level (15). This level of textual materiality, or “low-level linguistic phenomena” (word patterns, syntax, etc.), signifies to both human and machine (8). The key for algorithmic criticism is striking a balance: producing a reading that a machine is uniquely capable of producing, yet one that still operates within the phenomenological landscape of human reading. Otherwise, the data output is either redundant (a human could have produced the reading) or irrelevant (only a computer could have produced the reading).
