jealous markup
Software and language, mostly

Word’s Editor: prejudice disguised as technology. A rant about language peeving codified in software.

March 31, 2017 Language Software Stuff

This is a rant. I love rants for their own sake, but I’m actually genuinely pissed off here, with a pinch of scared thrown in.

I saw trouble brewing a few weeks back when an article entitled “Machine learning in Microsoft Word’s new Editor gave me the frights” started surfacing and re-surfacing in my Twitter feed. No, I won’t be linking to that text; if you’re interested, go ask Google. It’s a happy-go-lucky publicity piece by a self-identifying professional writer. It touts the benefits of MS Word’s new Editor feature, which will allegedly help bad writers write good, to paraphrase another article I’ll leave unlinked. Editor claims to be a super advanced new feature that will improve your prose. These are the things it offers:

[Image: Editor’s list of options]

What’s wrong with that? In short: almost everything. Here’s why.

Language peeving as a national sport

Looking at that list of options, my first thought was, it could be a lot worse. My second thought was, it is bad enough. I mean, c’mon. Passive Voice? And then, Passive Voice with Unknown Actor? Geeez. We’re back in grade school with a particularly narrow-minded teacher whose own horizon ends at regurgitating peeves they picked up from some so-called authority somewhere.

In that first unlinked article, you find John Brandon literally writing this:

Another interesting discovery: I’m a champion of active voice. I was educated about the problem long ago. (Oh crap, there it is again.)

I mean, srsly?! Think about that for a moment. What should the poor guy write instead – Ms Ribbontuft educated me about the problem long ago? Who cares? The passive voice is English’s gift to you. One of many scenarios when it comes in handy is when the agent is truly irrelevant. I can’t for the life of me see how making the actor (or actress) known in that sentence would be an improvement.

The saddest part is how Mr Brandon has apparently internalized this particular peeve without giving it any genuine thought – as evidenced by his parenthetical remark. Or maybe he hasn’t; after all, this is most likely native advertising.

English has its own manageable set of language peeves. These have been perpetuated over the decades by self-appointed guardians of style, typically White Males who are by now mostly dead. Split infinitives, singular they, passive voice: the list of perfectly legit constructions that have been in use for centuries goes on. These peeves have all been taken down by people who actually know what they’re talking about.

Computers don’t know squat about language

Now that is not completely true. Computers do know squat about language, but that is a squat in a very narrow and uncomfortable slot right there, somewhere between a rock and a hard place.

Example. One genuine problem that plagues a lot of writing is dangling participles, and their close relative, dangling prepositional phrases. The likes of:

Entering the room, the lights went off.
I hit the man with the flower.

These are garden-variety ambiguities. Not ungrammatical, arguably not even bad style; simply confusing. Often you don’t even notice them: your brain doesn’t only parse the sentence without you knowing it, it also filters out the unlikely interpretations.

Problem is, computers cannot even properly parse sentences for their syntax, let alone build a semantic model and compare it against their knowledge of the world.
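To make the second example concrete, here is a minimal sketch in Python. The six-rule grammar and the tiny lexicon are my own toy assumptions, built for this one sentence and nothing else; no real parser is this small. It enumerates every syntax tree the grammar allows, and finds two: one where the flower is the instrument, one where it belongs to the man. What it cannot do is tell you which one was meant. That takes knowledge of the world.

```python
from functools import lru_cache

# Toy grammar (an assumption for illustration): parent -> (left, right).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),  # "hit ... with the flower": the flower is the instrument
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),  # "the man with the flower": the flower belongs to the man
    ("PP", "P", "NP"),
]
# Toy lexicon: one category per word, just enough for the example sentence.
LEXICON = {"i": "NP", "hit": "V", "the": "Det", "man": "N", "flower": "N", "with": "P"}


def parses(sentence):
    """Return every bracketed tree the toy grammar assigns to the sentence."""
    words = tuple(sentence.lower().rstrip(".").split())

    @lru_cache(maxsize=None)
    def span(label, start, end):
        # All trees of category `label` covering words[start:end].
        if end - start == 1:
            word = words[start]
            return (f"({label} {word})",) if LEXICON.get(word) == label else ()
        trees = []
        for parent, left, right in BINARY_RULES:
            if parent != label:
                continue
            # Try every split point between the two children.
            for mid in range(start + 1, end):
                for left_tree in span(left, start, mid):
                    for right_tree in span(right, mid, end):
                        trees.append(f"({label} {left_tree} {right_tree})")
        return tuple(trees)

    return list(span("S", 0, len(words)))


if __name__ == "__main__":
    for tree in parses("I hit the man with the flower."):
        print(tree)
```

Both trees are equally grammatical as far as this grammar is concerned. A real sentence with a real-sized grammar yields far more than two candidates, and ranking them is exactly the part that needs meaning.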

All the things that truly distinguish good writing from bad are beyond the grasp of machines, and that situation is unlikely to change in our lifetime. If it did, we would be having heated debates with our opinionated word processor about the finer points of grammar and semantics. Those debates, for now, are reserved for humans.

Spell checking at least used to work. Used to.

And this is where it gets super annoying; I-want-to-bang-my-head-against-the-wall-then-crawl-under-a-rock-and-weep-all-night annoying.

A few versions back Word’s spell checking for Hungarian was really good. I know because I used to work for the company that built the technology behind it. Like Finnish or Turkish, Hungarian is an agglutinative language where a single word can have hundreds or thousands of forms. Which variant of which building block can follow the ones before it is governed by an intricate set of morphological rules. This company’s software was really good at dealing with that complexity, and the lexicon behind it was meticulously developed by this brilliant guy sitting two doors down the corridor from me. I loved Word’s Hungarian spell checker; it beat OpenOffice and everything else by a mile.
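To give a feel for what “which building block can follow which” means, here is a deliberately tiny sketch in Python. It is emphatically not the technology described above: the two-stem lexicon, the single case suffix and the function name `is_valid` are all my own illustrative assumptions. It accepts a noun stem, optionally followed by the plural, optionally followed by the inessive case (“in ...”), in that order only, with the suffix form picked by the stem’s vowel harmony class.

```python
# Hypothetical mini-lexicon: stem -> (vowel harmony class, plural suffix).
STEMS = {
    "ház": ("back", "ak"),     # house
    "kert": ("front", "ek"),   # garden
}
# Inessive case suffix ("in ..."), selected by the stem's harmony class.
INESSIVE = {"back": "ban", "front": "ben"}


def is_valid(word):
    """Accept stem [+ plural] [+ inessive], in that order and no other."""
    for stem, (harmony, plural) in STEMS.items():
        if not word.startswith(stem):
            continue
        rest = word[len(stem):]
        # The plural, if present, must come directly after the stem.
        if rest.startswith(plural):
            rest = rest[len(plural):]
        # Whatever is left must be nothing, or the harmonizing case suffix.
        if rest in ("", INESSIVE[harmony]):
            return True
    return False


if __name__ == "__main__":
    for candidate in ["házakban", "kertekben", "házben", "házbanak"]:
        print(candidate, is_valid(candidate))
```

With these rules, házakban (“in houses”) and kertekben (“in gardens”) pass, while házben (wrong vowel harmony) and házbanak (suffixes in the wrong order) are rejected. Scale that up to a full lexicon, dozens of suffix slots and all their interactions, and you see why a good word list alone gets you nowhere in a language like this.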

Then Microsoft decided to standardize. They bought the technology lock, stock and barrel, put it in a meat grinder locked up in a black box, and released Office 2010. I have no clue what they did, but Word’s spell checker sucks now. It marks tons of completely legit words as errors, and it fails to recognize real typos left and right. For a while, my old employer could still sell its own spell checker as a plugin, but in Office 2013 even that possibility has been removed.

So dear Microsoft: May the dearest and most precious of your infinitives be split for this. May their splintered parts burn in hell, wailing in the passivest of voices, dangling by their ambiguous modifiers, irregardless of the process efficiencies you achieved through your infelicitous move.

Go back to the simple things that computers can do well, and do them well. Leave good writing to humans, and get the hell out of the language peeving business.

Technology is not amoral

But this is part of a broader thing that goes beyond spell checkers and grammar peeving. Let me throw in a completely different angle.

A few months ago a different piece of tech news was occupying my Twitter feed. “The era of pre-crime? Chinese AI can tell you’re a criminal by looking at your photo,” the clickbait title read. No, dear reader, no link.

This was scary on a different level, as I’m sure you can immediately see. It is somehow just a very Nazi thing to start deriving character traits from facial measurements. Adding “machine learning” or “artificial intelligence” (pick your preferred buzzword) as a proxy is not nearly enough to remove the creep. Soon enough, this system was taken to pieces: for one nice takedown, check “How a Machine Learns Prejudice” by Scientific American’s Jesse Emspak.

What does this have to do with something as innocuous as an authoring assistant? A whole lot, I believe. Technology is not amoral. Increasingly, businesses and governments are making decisions about my life and your life using software tools designed to crunch data, aka AI aka deep learning. These tools have been demonstrated to learn and perpetuate the unconscious biases of the people creating them, and of the people curating their datasets. Calling yourself an engineer is a good excuse to pretend it’s about data and not about ethics, but that is simply not true.

Language peeves are one of the last unquestioned means for the privileged to assert their socio-economic status (think DWM, the dead white males). When the de-facto standard word processor codifies these prejudices, that is not an ethically neutral little tool to improve your writing. It is a minefield of conscious and unconscious bias disguised as technology.

So how do I improve my writing?

But let’s return to Planet Earth for a short closing note. After all, you came here expecting to read about language, not about class struggle.

Here is one battle-tested way to improve your writing. Gather a bunch of people you trust, stand up, and read your text. Watch their faces, and listen to their feedback. Maybe, if you get the chance, give them a double-spaced printed copy, ask them to line edit, and ask them to explain each change. Do the reverse too: review other people’s writing, and think how you can suggest improvements in a way that’s honest but does not offend.

And go read Steven Pinker’s The Sense of Style: The Thinking Person’s Guide to Writing in the 21st Century.