I recently bought NaturallySpeaking, a speech recognition program that turns dictation into text. It’s owned by Nuance now, but the Dragon Systems technology has been bought and sold multiple times since work started on it in the 1980s. The latest versions (I bought Preferred version 9) have been getting consistently good reviews, and I have a lot of text to enter, so I decided to take the plunge.

Sure enough, this software is almost disturbingly good. I picked up a book on alchemy that happened to be on my desk and read the following:

Standing between science and art, philosophy and religion, the mysterious practice of alchemy has long been cloaked in a veil of mystery. To this day, scholars are unsure of the precise origins of this esoteric craft, the forerunner of modern chemistry, which reached its peak in the Renaissance.

It didn’t make a single mistake in that passage.

As part of the training process, the software has you read one of several passages. I chose John F. Kennedy’s inaugural address. I was sorely tempted to read it in a funny John F. Kennedy voice, but the downside was obvious: I would have been obligated to speak in a funny John F. Kennedy voice every time I wanted to dictate something, and my fellow Americans, that is something that this administration cannot condone and will not support.

Some observations after using NaturallySpeaking for a few days: Even when dictation is 99.9% correct, you really want it to be 100% correct. I was also surprised to see just how much time I spend editing, formatting, and fiddling with text, which is to say doing the things that voice commands are not so good at. Voice is now good for laying down text in big blocks, but it struggles with spatial fiddling. That’s not really a recognition problem: even when you’re sitting next to a human who understands your words perfectly, it takes a lot of work for them to understand your intent. If you’re telling them how to use a GUI, you quickly end up in tech support hell, saying things like: “Double-click on that. Not there… over there, just above that red thing. No, to the left! Farther left! Oh crap, you just launched Visual Studio. Here, give me the mouse.”

Another funny thing is that since the contextual understanding is so good these days, the errors that you do get are harder to spot in a quick proofing pass. In a world of clever machines, be grateful for obvious miss steaks. Ha ha. Just kitten.

  1. A long time ago, when I was using Dragon’s first speech recognizer (disconnected words), I labelled the mistakes it made in my text as “speechos.” I think it’s still appropriate.

