Authorial analysis #3: Writers, disguised and translated

August 23, 2013 NLP Language Software

With a genuine, functional authorial analyzer on your hands, could you resist the temptation to put on a fake moustache and do some real detective work? I couldn’t. Unfortunately, Hungarian literature has been notoriously short on mystery authors recently, so I had to go back to 1987, to an author who has long since fessed up. To raise the dramatic tension, I then asked a different question: which characterizes a translated text more, the original author or the translator?

Esterházy: a female author that’s not

For a language that entirely lacks grammatical gender, Hungarian literature has remarkably few female authors. Ironically, even the few there are have repeatedly turned out to be male big shots donning women’s clothing for a single literary experiment. Such was the case with Psziché, a collection of the letters and poems of a fictive woman, published in 1972 by Sándor Weöres. A decade and a half later came Tizenhét hattyúk (Seventeen Swans) by an unknown writer, Lili Csokonai. Curiously, the two works have one more thing in common: both indulge in a constructed archaic language.

The identity of Lili Csokonai did not remain a mystery for long. Before 1987 was out, László Szále published a brilliant analysis unmasking Esterházy as the real author, and did so, unsurprisingly, without any help from computational linguistics. Esterházy fessed up; the book is still published under the name of Lili Csokonai; and 26 years later, this non-mystery hardly gives readers sleepless nights anymore.

But given the book’s dramatically inventive, 17th-century-style language and orthography, I was curious to see whether authorial analysis would reach the same verdict. The lineup included novels by four contemporary writers:

Esterházy (119,596 words): Pápai vizeken ne kalózkodj (1977), Ki szavatol a lady biztonságáért (1982), Kis magyar pornográfia (1984), A szív segédigéi (1985)
Krasznahorkai (166,149 words): Sátántangó (1985), Az ellenállás melankóliája (1989)
Nádas (323,542 words): Egy családregény vége (1977), Emlékiratok könyve (1986), Az égi és földi szerelemről (1991)
Závada (247,996 words): Jadviga párnája (1997), Milota (2002)

At 21,735 words, Tizenhét hattyúk is a relatively short novel. The program’s expert committee was undecided:

Metric Authors ordered by similarity
Word length zavada, esterhazy, nadas, krasznahorkai
Most frequent words nadas, esterhazy, krasznahorkai, zavada
4-grams esterhazy, krasznahorkai, nadas, zavada
Word-final trigrams nadas, esterhazy, zavada, krasznahorkai

This translates to a score of 2.5 for both Nádas and Esterházy. Curiously, Esterházy came first on only one metric, character 4-grams, but his second places on all the other metrics added up to a score on par with Nádas, who ranked first on two different metrics. I take Esterházy’s consistent second places, behind two different authors, as evidence of how idiosyncratic this particular text is. In light of that, the tool performed better than I had expected.
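The post doesn’t spell out how the four rankings are combined into one score. One rule that reproduces the reported 2.5 for both Nádas and Esterházy is to award 1 point for first place and halve it for each place below (0.5, 0.25, 0.125). The sketch is an assumption, not the program’s documented method; the `aggregate` function and the point values are hypothetical.

```python
from collections import defaultdict

# Rankings copied from the table above (most similar author first).
RANKINGS = {
    "word length": ["zavada", "esterhazy", "nadas", "krasznahorkai"],
    "most frequent words": ["nadas", "esterhazy", "krasznahorkai", "zavada"],
    "4-grams": ["esterhazy", "krasznahorkai", "nadas", "zavada"],
    "word-final trigrams": ["nadas", "esterhazy", "zavada", "krasznahorkai"],
}


def aggregate(rankings):
    """Hypothetical scoring rule: 1st place = 1 point, each lower place is worth half as much."""
    scores = defaultdict(float)
    for ordered in rankings.values():
        for position, author in enumerate(ordered):
            scores[author] += 0.5 ** position  # 1, 0.5, 0.25, 0.125
    # Highest score first; ties keep the order in which authors were first seen.
    return dict(sorted(scores.items(), key=lambda kv: -kv[1]))


if __name__ == "__main__":
    for author, score in aggregate(RANKINGS).items():
        print(f"{author}: {score}")
```

Under this assumed rule Závada gets 1.5 and Krasznahorkai 1.0; awarding nothing for fourth place instead would lower those to 1.25 and 0.75 without changing the tie at the top.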

Authors not lost in translation

In the next experiment I wanted to see whether the original author or the translator has the greater influence on a text. For this I needed three books by two authors, translated by two different translators, arranged like this:

Book Original author Translator
1 A T
2 B U
X A U

If I train the authorial analyzer on books 1 and 2 and then have it check book X, will X come out more similar to 1 (same author, different translator) or to 2 (different author, same translator)?
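The post doesn’t describe how the analyzer measures similarity internally. As a minimal sketch, assuming each metric turns a text into a relative-frequency profile and profiles are compared by cosine similarity, the 1-2-X test could look like this. Only two of the four metrics are shown (character 4-grams and word-final trigrams); the function names, the lower-casing and whitespace normalization, and the use of cosine similarity are all assumptions, not the tool’s actual method.

```python
import math
import re
from collections import Counter


def char_ngrams(text, n=4):
    """Assumed metric: relative frequencies of character n-grams (default 4-grams)."""
    text = re.sub(r"\s+", " ", text.lower())  # collapse whitespace runs
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values()) or 1
    return {gram: c / total for gram, c in counts.items()}


def word_final_trigrams(text):
    """Assumed metric: relative frequencies of the last three letters of each word."""
    words = re.findall(r"\w+", text.lower())  # \w also matches accented Hungarian letters
    counts = Counter(w[-3:] for w in words if len(w) >= 3)
    total = sum(counts.values()) or 1
    return {g: c / total for g, c in counts.items()}


def cosine(p, q):
    """Cosine similarity of two sparse frequency profiles; 0.0 if either is empty."""
    dot = sum(v * q.get(k, 0.0) for k, v in p.items())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def closest(candidates, unknown, features):
    """Order candidate labels from most to least similar to the unknown text."""
    target = features(unknown)
    sims = {label: cosine(features(text), target) for label, text in candidates.items()}
    return sorted(sims, key=sims.get, reverse=True)
```

With real data, `candidates` would map labels such as "austen-szenczi" and "vonnegut-borbas" to the full texts of books 1 and 2, and `unknown` would be book X; whether the author’s or the translator’s label comes first answers the question for that metric.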

The first three books I took were:

Book Original author Translator
Pride and Prejudice Jane Austen Miklós Szenczi
The Sirens of Titan Kurt Vonnegut Mária Borbás
Sense and Sensibility Jane Austen Mária Borbás

On three of the four metrics, the jury found the translation of Sense and Sensibility closest to the translation of Pride and Prejudice, even though the translators are different:

Metric Authors-translators ordered by similarity
Word length vonnegut-borbas, austen-szenczi
Most frequent words austen-szenczi, vonnegut-borbas
4-grams austen-szenczi, vonnegut-borbas
Word-final trigrams austen-szenczi, vonnegut-borbas

Admittedly, Austen’s prose is light years (and nearly a century and a half) away from Vonnegut’s, so it’s no surprise that their texts differ, even in translation. So I went on to check a second set, pitting Vonnegut against Douglas Adams:

Book Original author Translator
The Restaurant at the End of the Universe Douglas Adams Sándor Nagy
Hocus Pocus Kurt Vonnegut István Molnár
The Hitchhiker's Guide to the Galaxy Douglas Adams István Molnár

This time the verdict was unanimous: on all four metrics, the translation of The Hitchhiker's Guide to the Galaxy is closest to the translation of The Restaurant at the End of the Universe, even though the translators are different:

Metric Authors-translators ordered by similarity
Word length douglas-nagy, vonnegut-molnar
Most frequent words douglas-nagy, vonnegut-molnar
4-grams douglas-nagy, vonnegut-molnar
Word-final trigrams douglas-nagy, vonnegut-molnar

These are only two experiments, and I would not treat them as much more than anecdotal evidence. Still, I am surprised that the original authors seem to influence the similarity of translations more than the translators who produce the actual sentences. Good literary translators, it seems, can become invisible and write in whatever style best conveys the original work.

Your texts

Do you have an authorship mystery of your own to solve? Or your own triple of books to put translators through their paces? I’d be interested to hear about it.