Authorial analysis #3: Writers, disguised and translated
With a genuine, functional authorial analyzer on your hands, could you resist the temptation to put on a fake moustache and do some real detective work? I couldn’t. Unfortunately, Hungarian literature has been notoriously short on mystery authors recently, so I had to go back to 1987, and to an author who has long since fessed up. To raise dramatic tension, I then asked a different question. Which characterizes a translated text more: the original author or the translator?
Esterházy: a female author that’s not
For a language that utterly lacks grammatical gender, Hungarian literature has very few female authors. Ironically, even the few there are keep turning out to be male big shots putting on women’s clothes for a single literary experiment. Such was the case with Psziché, the collected letters and poems of a fictitious female persona, published in 1972 by Sándor Weöres. A decade and a half later came Tizenhét hattyúk (Seventeen Swans) from an unknown writer, Lili Csokonai. Curiously, the two works have one other thing in common: they both indulge in a constructed archaic language.
The identity of Lili Csokonai did not remain a mystery for long. Before 1987 was out, László Szále published a brilliant analysis unmasking Péter Esterházy as the real author – and did so, unsurprisingly, without involving computational linguistics. Esterházy fessed up; the book continues to be published under the name of Lili Csokonai; and 27 years later, this non-mystery hardly gives readers sleepless nights anymore.
But given the dramatically inventive, 17th-century language and orthography of the book, I was curious to see if authorial analysis would agree. The lineup included novels by four contemporary writers:
|Pápai vizeken ne kalózkodj
|Egy családregény vége
|Ki szavatol a lady biztonságáért
|Az ellenállás melankóliája
|Kis magyar pornográfia
|Az égi és földi szerelemről
|A szív segédigéi
At 21,735 words, Tizenhét hattyúk is a relatively short novel. The program’s expert committee was undecided:
|Metric||Authors ordered by similarity|
|Word length||zavada, esterhazy, nadas, krasznahorkai|
|Most frequent words||nadas, esterhazy, krasznahorkai, zavada|
|4-grams||esterhazy, krasznahorkai, nadas, zavada|
|Word-final trigrams||nadas, esterhazy, zavada, krasznahorkai|
This translates to a score of 2.5 for both Nádas and Esterházy. Curiously, Esterházy came in first based on only one metric, character 4-grams; but his second-place rankings with all other metrics added up to bring the author’s score on par with Nádas, who ranked first according to two different metrics. I see Esterházy’s consistent second places behind two different authors as proof of the idiosyncratic nature of this particular text. In light of that, the tool’s precision is actually beyond my expectations.
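To make the arithmetic concrete, here is a minimal sketch of one weighting that reproduces the 2.5 tie from the table above: 1 point for first place, halved for each place below. The weighting and the name `committee_scores` are assumptions for illustration; the analyzer's actual scoring rule may differ.

```python
# Rankings copied from the table above, most similar author first.
rankings = {
    "Word length": ["zavada", "esterhazy", "nadas", "krasznahorkai"],
    "Most frequent words": ["nadas", "esterhazy", "krasznahorkai", "zavada"],
    "4-grams": ["esterhazy", "krasznahorkai", "nadas", "zavada"],
    "Word-final trigrams": ["nadas", "esterhazy", "zavada", "krasznahorkai"],
}


def committee_scores(rankings):
    """Sum per-metric points for each author.

    Assumed scheme (not confirmed by the tool): first place earns 1 point,
    and each place below earns half of the place above it.
    """
    scores = {}
    for ordered_authors in rankings.values():
        for place, author in enumerate(ordered_authors):
            scores[author] = scores.get(author, 0.0) + 0.5 ** place
    return scores


scores = committee_scores(rankings)
for author, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{author:15s} {score:.3f}")
```

Under this assumed weighting, three second places and one first are worth exactly as much as two firsts and two thirds, which is why a consistent runner-up can tie with a two-time winner.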
Authors not lost in translation
In the next experiment I wanted to see whether the original author or the translator had the bigger influence on a text. For this I looked for three books from two authors, translated by two different translators:
If I train the authorial analyzer with books 1 and 2, then have it check book X, will X come out more similar to 1 (same author, but translated by a different person), or to 2 (different author, but translated by the same person)?
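As a rough sketch of what such a comparison could look like for a single metric, the snippet below builds a word-final trigram profile for each text and ranks the known books by their similarity to the unknown one. Cosine similarity and the function names are assumptions chosen for illustration, not a description of how the analyzer is actually implemented.

```python
from collections import Counter
import math
import re


def word_final_trigrams(text):
    """Relative frequencies of the last three characters of each word."""
    words = re.findall(r"\w+", text.lower())
    counts = Counter(word[-3:] for word in words if len(word) >= 3)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: n / total for gram, n in counts.items()}


def cosine_similarity(p, q):
    """Cosine of the angle between two sparse frequency profiles."""
    dot = sum(value * q.get(gram, 0.0) for gram, value in p.items())
    norm_p = math.sqrt(sum(value * value for value in p.values()))
    norm_q = math.sqrt(sum(value * value for value in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def rank_candidates(unknown_text, known_texts):
    """Return the labels of known_texts, most similar to unknown_text first."""
    target = word_final_trigrams(unknown_text)
    similarities = {
        label: cosine_similarity(target, word_final_trigrams(text))
        for label, text in known_texts.items()
    }
    return sorted(similarities, key=similarities.get, reverse=True)
```

With the full text of books 1 and 2 passed in as a dictionary keyed by label, and book X as the unknown text, `rank_candidates` returns the two labels in order of similarity, which is the shape of the per-metric results shown in the tables.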
The first three books I took were:
|Pride and Prejudice||Jane Austen||Miklós Szenczi|
|The Sirens of Titan||Kurt Vonnegut||Mária Borbás|
|Sense and Sensibility||Jane Austen||Mária Borbás|
By two metrics to one, the jury thought the translation of Sense and Sensibility was closest to the translation of Pride and Prejudice, although the translators are different:
|Metric||Authors-translators ordered by similarity|
|Word length||vonnegut-borbas, austen-szenczi|
|Most frequent words||austen-szenczi, vonnegut-borbas|
|Word-final trigrams||austen-szenczi, vonnegut-borbas|
Clearly the texts of Austen are light years (and nearly two centuries) away from Vonnegut; it’s no surprise that their texts are dissimilar, even in translation. So I went on to check a second set, pitting Vonnegut against Douglas Adams:
|The Restaurant at the End of the Universe||Douglas Adams||Sándor Nagy|
|Hocus Pocus||Kurt Vonnegut||István Molnár|
|The Hitchhiker's Guide to the Galaxy||Douglas Adams||István Molnár|
The jury’s verdict was clear. The translation of The Hitchhiker's Guide to the Galaxy is closest to the translation of The Restaurant at the End of the Universe, although the translators are different:
|Metric||Authors-translators ordered by similarity|
|Word length||douglas-nagy, vonnegut-molnar|
|Most frequent words||douglas-nagy, vonnegut-molnar|
|Word-final trigrams||douglas-nagy, vonnegut-molnar|
These are only two experiments; I would definitely not consider them much more than anecdotal evidence. But I am still surprised that the original authors seem to have a greater influence on the similarity of translations than the translators producing the actual sentences. Good literary translators apparently have the ability to become invisible and write in a style that best conveys the original work.
Do you have an authorship mystery of your own to solve? Do you have your own triple of books to put translators through their paces? I’d be interested to hear about it.