Reading at a Distance: World War I Word Clouds

WWI soldiers in trenches

For the 100-year anniversary of the end of World War I, I decided to write a blog post on poetry written during the war. As I sorted through various war poets such as Rupert Brooke, Siegfried Sassoon, and Edgell Rickword, a compilation on the Poetry Foundation website stuck out to me.

It wasn’t so much the poems themselves that caught my eye as the compilation’s preface, which contrasted poems written at the start of the war with poems written at the end:

You may notice that more poems in 1914 and 1915 extoll the old virtues of honor, duty, heroism, and glory, while many later poems after 1915 approach these lofty abstractions with far greater skepticism and moral subtlety, through realism and bitter irony. Though horrific depictions of battle in poetry date back to Homer’s Iliad, the later poems of WWI mark a substantial shift in how we view war and sacrifice.

According to the Poetry Foundation’s editors, the poems had dramatically different tones and language depending on the year they were written. Despite having the same topic – World War I – differences could be seen not just by looking from author to author, but by looking at compilations of poems written just a few years apart.

This made me think about the nature of poetry itself: poems generally use fewer words than prose, making each word “heavier” with meaning. I wondered whether, by looking at the words of a poem without the poetic structure (lines, grammar, punctuation), it would be possible to see the same patterns of “virtue” and “abstraction” that the Poetry Foundation described in its preface.

That question is what spurred me to create word clouds using the web application WordSift. A word cloud counts the occurrences of each individual word in a given body of text, then displays those words together in what looks like a cloud. Words that appear more often are “weighted” by appearing darker or larger than the other words in the cloud.
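For readers curious about what happens under the hood, the counting and weighting described above can be sketched in a few lines of Python. This is only a rough illustration, not how WordSift itself works: the stop-word list, the function names `word_counts` and `font_sizes`, the font-size range, and the sample sentence are all assumptions invented for this example.

```python
import re
from collections import Counter

# A tiny, illustrative stop-word list (an assumption); real tools use far longer ones.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "that", "it"}

def word_counts(text):
    """Lowercase the text, split it into words, and count each non-stop word."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)

def font_sizes(counts, top_n=5, min_size=12, max_size=48):
    """Give each of the top_n words a font size proportional to its count."""
    top = counts.most_common(top_n)
    if not top:
        return {}
    highest = top[0][1]  # count of the single most frequent word
    return {w: max(min_size, round(max_size * c / highest)) for w, c in top}

# An invented sentence, not a line from any of the poems.
sample = "The heart of the day and the heart of the night; the heart is a day."
counts = word_counts(sample)
sizes = font_sizes(counts)
print(sizes)  # {'heart': 48, 'day': 32, 'night': 16}
```

The most frequent word gets the largest size and every other word is scaled against it, which is why one or two very common words can dominate a cloud.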

My hypothesis was this: if words alone can reflect the tone of a compilation of poems, then that should be reflected in the word cloud, particularly if the poems are all centered around a particular topic. Since the poems were about World War I specifically, I figured that there would be enough repeated words to show a pattern in the word clouds, especially since the compilations were categorized by major WWI events.

In some ways, the visualizations are successful in showing the shift in tone from the “1914” poems to the “1919 and beyond” poems. The clouds, however, also reflect the flaws in this particular type of data visualization.

Below are the clouds generated by pasting all the poems from the 1914 compilation together, and all the poems from the 1919 and beyond compilation together.

The 1914 compilation includes poems such as “Channel Firing” by Thomas Hardy, “The Dead” by Rupert Brooke, and “To Germany” by Charles Sorley.


The 1919 and beyond compilation includes poems such as “Everyone Sang” by Siegfried Sassoon, “Trench Poets” by Edgell Rickword, and “First Time In” by Ivor Gurney.

There is a fairly significant change in weighted words depending on the year. In the 1914 compilation, for example, there is a wide spread of high-usage words. Many of these words, however, are not the “abstractions” that the compilation’s preface speaks of. Rather, they are words that may be used to describe or allude to abstraction.

“Heart”, for example, is probably used in the context of love, something intangible. The word itself is concrete (a heart is the organ in our chests), but it can be used to describe love (an abstraction).

The same can be said for the words “day” and “night”. Day and night can reference something very real, a change in the sky that we can see with our eyes. In addition to this, though, the morning daylight is often used to connote beauty and new beginnings, and night is often used metaphorically to refer to feelings of apprehension, or an end to something. So, while the cloud may not reveal the abstractions themselves, it does reveal the ways in which the writer may describe these abstractions.

In comparison, there is a tonal shift between the “1914” poems and the “1919 and beyond” poems, although it is not as obvious in the clouds as when one reads the poems in their entirety.

Rupert Brooke, a poet included in the 1914 compilation. Brooke died at the age of 27 on his way to fight in the Battle of Gallipoli.

“Man” and “thing”, for example, are by far the most common words in the 1919 and beyond cloud. While “man” is also common in the 1914 poetry, it is used much more often in the 1919 and beyond compilation. The high usage of “man” alongside “thing” may be evidence of the move away from “lofty abstractions”. There is nothing metaphorical about a thing; it just is. It has no figurative substance to aid an abstraction, no tone that ties it to anything.

Paired with “man”, “thing” may also reflect the depersonalization and disconnectedness fostered by the way World War I was fought. It is often described as the first fully mechanized war, pitting men not just against other men, but against powerful weaponry that could cause massive casualties from afar with the simple pull of a trigger. In this way, the men who died were just “things” to their enemies, who could kill hundreds of soldiers without having to see the whites of their eyes.
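A claim like “man is used much more often in the later compilation” is easier to check with relative rather than raw counts, since two compilations need not be the same length. Here is a minimal sketch of that comparison in Python. The function name `relative_frequency` and both sample strings are invented placeholders for illustration; they are not the actual poems, and the numbers say nothing about the real compilations.

```python
import re
from collections import Counter

def relative_frequency(text, word):
    """Return how often `word` occurs per 1,000 words of `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return 1000 * Counter(words)[word] / len(words)

# Invented placeholder strings, NOT text from either compilation.
early_sample = "honor and glory call the man to duty and to glory"
late_sample = "the man saw the thing and the man said nothing"

early_rate = relative_frequency(early_sample, "man")
late_rate = relative_frequency(late_sample, "man")
print(late_rate > early_rate)  # True
```

To try this on real text, the two sample strings would be replaced with the pasted contents of each compilation.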

All in all, the word clouds certainly visualized something, but that something needed to be heavily supplemented with close reading, the careful interpretation of a text’s language in context. This word cloud method of visualization sits somewhere between reading a traditional piece of literature and looking at a graph; while the poems were stripped down to their bare bones, their context was still needed to understand the data.

But what about a way of looking at literature that is purely quantitative? My own analysis of the World War I word clouds touched on that, but didn’t go far enough to count as truly distant reading. That type of reading, however, is not only possible, but is being used by literary critics such as Franco Moretti, and within Plymouth State University itself.

To learn more about distant reading, and how it’s being used in our very own Studies in English courses, stay tuned for the next installment of “Reading at a Distance” on The Canon.