Poetry and Big Data Sets

“A poem is worth a thousand images.” – Anonymous

Information technology has created a deluge of data; the challenge is how to understand it, how to extract meaning from megabytes. Many techniques are being developed to address this. Since poets have distilled complex experiences for millennia, we were curious whether poetic license could be applied to transform an enormous textual dataset into a concise series of impressions that reflects its original meaning.

As an experiment, we took WikiLeaks' initial public release of 90,000 redacted memos related to the US war in Afghanistan between January 2004 and December 2009. We used standard tools to build a histogram of word frequencies, selected the 400 most common words, and then wrote a poem from them, paying attention to word structure and rhythm. We are not professional poets by any stretch. We are now experimenting, in collaboration with musicians, with different ways of presenting the verse online, and with interactive tools that would let others construct their own poems from large textual datasets.
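The word-counting step can be sketched in a few lines of Python. This is a minimal illustration, not the tooling actually used for the project: the tokenizer (lowercasing and splitting on runs of letters and apostrophes), the optional `stopwords` parameter, and the helper name `top_words` are all assumptions, and the two sample strings are made-up toy text rather than memos from the release.

```python
import re
from collections import Counter

def top_words(texts, n=400, stopwords=frozenset()):
    """Hypothetical helper: count word frequencies across an iterable of
    documents and return the n most common (word, count) pairs."""
    counts = Counter()
    for text in texts:
        # Assumption: lowercase, then treat runs of letters/apostrophes as words.
        words = re.findall(r"[a-z']+", text.lower())
        # Optionally drop common function words; the original text does not
        # say whether any filtering was done.
        counts.update(w for w in words if w not in stopwords)
    # most_common(n) returns the histogram's n tallest bars, highest first.
    return counts.most_common(n)

# Toy input for illustration only (not from the actual dataset).
memos = [
    "Patrol reported IED found on route.",
    "IED cleared; patrol returned to base.",
]
for word, count in top_words(memos, n=3):
    print(word, count)
```

On a real corpus the resulting list of words would then serve as the vocabulary from which the poem is composed by hand.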

Collaborators:
Ken Goldberg, Kris Fallon, Siamak Faridani