The Cabinet of Wisdom
Part V: Cheating on the Turing Test
Alan Turing’s famous test for machine intelligence is often misquoted and misunderstood: Turing proposed that, since thinking was a concept difficult to define, it was vain to ask whether a machine could think; instead, we should ask whether a machine could mimic the patterns of outward behavior routinely associated with thinking.
Turing called this “the imitation game.”
He coyly steps aside from the question of whether a machine can be designed to think, but instead addresses whether it can be designed to mimic outward signs of thought convincingly.
Please note that Mr. Turing, or so one hopes, is himself thinking when he says that thinking is difficult to define, by which, or so one assumes, he means define empirically, by measurement. So he turns to mimetic patterns illustrative of thought, as if this were clearer. Note the irony of being unable to define the obvious but willing to define the obscure.
His proposed outward behavior was holding a conversation.
Turing proposed the conversation would consist of the machine working a typewriter or teletype to put ink-marks on a page. The machine would be actuated by words typed into a second keyboard by an unobservant observer, who is told he is in a conversation via teletype with a living man. The machine itself would be behind a curtain, like the Wizard of Oz in reverse, a humbug pretending to be a living man, such that the unobserver could see only the teletyped response.
I coin the term “unobserver” because the test is successful only when he fails to observe.
An Unreal Test
Again, instead of testing for thought, the Turing Test is testing for the ability of an observer to deceive himself into thinking that words written by an absent author, presented in new combinations by a machine according to certain inputs typed by the observer, are written by a present author.
The test specifies typed rather than spoken words in order to minimize those clues or tells of intonation, tone, accent, and timing which observers normally use to distinguish live speakers from recordings.
Please note the exact steps of the test:
First, the machine will be engineered so that words typed by an unobservant observer will be rendered into punch-card marks or some similar thing able to actuate a machine. Nowadays we might use electrical circuits opening and closing, but the principle is the same.
Second, something like a cunning clockwork will, in response, turn gears and wheels into whatever pattern the engineer has correctly anticipated will mimic a meaningful response. Nowadays we might use computing software, but the principle is the same.
Third, the machine will then actuate keyboard keys to make ink-marks. Nowadays we might use a computer screen, but the principle is the same.
The response is meant by the engineer to be such that the unobserver, seeing the ink-marks, will mistake them for words.
The test is successful when the unobserver wrongly assumes that these ink-marks are words made by a live and present author who is writing in intelligent and voluntary response to the meaning of the unobserver’s initial typed comments.
The test fails if he realizes what they actually are, namely, meaningless ink-marks made via mechanical actuators that his own initial typewriter key combinations set in motion.
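To make these three steps concrete, here is a minimal sketch, in Python, of the arrangement as I read it; the names teletype_session and clockwork are my own labels, and the details (a screen instead of ink, character codes instead of punch-cards) are loose modern stand-ins, not anything Turing specified.

```python
from typing import Callable

# A loose modern stand-in for the three steps above. The "clockwork" is any
# function the engineer cares to supply in advance; the surrounding machinery
# merely encodes keystrokes going in and renders marks coming out.

def teletype_session(clockwork: Callable[[str], str]) -> None:
    while True:
        # Step 1: the unobserver's typed words become machine-readable marks.
        typed = input("> ")
        if not typed:
            break
        encoded = typed.encode("ascii", errors="replace")  # stand-in for punch-card holes

        # Step 2: the clockwork turns those marks into whatever pattern the
        # engineer anticipated would mimic a meaningful response.
        pattern = clockwork(encoded.decode("ascii"))

        # Step 3: the pattern actuates the "keys" -- here, characters on a screen.
        print(pattern)
```

Note that the unobserver’s side of the exchange and the engineer’s clockwork are entirely separable: whatever is to be mimicked must already be built into the clockwork, which is supplied before the conversation begins.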
The Turing Test is, in effect, an attempt to make inanimate, involuntary machine motions mimic voluntary decisions of a living man.
Let us propose a simple example for the purposes of illustration: the unobserver is told “The man behind the screen is a curmudgeonly old crank who is trying to sleep. Please type a message to him.”
Then the clockwork is set so that, if an odd number of keys are pressed (including spaces and punctuation), the message appears on the teletype: “Go away. I’m sleeping.”, and if an even number, then: “I don’t get what you are saying. Say it again.”
Obviously, the more unobservant the unobserver is, the longer the conversation can go. Naturally, some responses will seem to the unobserver more lifelike than others.
“Good Morning. How are you?” provokes: “Go away. I’m sleeping.” But so does “Your wife is hurt!” Whereas “Supercalifragilisticexpialidocious!” provokes “I don’t get what you are saying. Say it again.”, but so does “Wake up!”
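Here is a sketch of that clockwork, assuming each character of the message, and nothing else, counts as one key pressed (counting the Return key as well would flip the parity); the name crank_response is merely my label for it.

```python
# The curmudgeon clockwork: the only rule applied is the parity of the number
# of keys pressed. Nothing here inspects the meaning of the message.

def crank_response(message: str) -> str:
    keys_pressed = len(message)  # every character, spaces and punctuation included;
                                 # whether to count the Return key too is the engineer's choice
    if keys_pressed % 2 == 1:    # odd number of keys
        return "Go away. I'm sleeping."
    return "I don't get what you are saying. Say it again."
```

Plugged into the teletype loop sketched earlier, as teletype_session(crank_response), this yields a “conversation” in which the meaning of what the unobserver types plays no part whatsoever; only the key count does.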
Now, Turing concludes that whatever test we mortal men in real life normally use to distinguish images from reality, recordings from live speech, and mannequins from men is what we will use here to fail to distinguish mechanical responses from voluntary ones.
However, please note that the Turing Test never specifies what that everyday test itself is.
Turing just assumes you already know it, because you stop talking to a pull-string voice puppet once its supply of phrases is exhausted, and yelling at answering machines is no fun, even the ones with interactive menus.
We all use this test. No magnitudes of any matter in motion are being measured by yardstick, stopwatch, or balance scale; hence it is not a scientific test.
It is, of course, a legal test. It is a test of competence.
This test has been long established in law, even if there are four different variations of it. The oldest is the M’Naghten test. It holds that, if, at the time of the act, the accused was laboring under such a defect of reason, from disease of the mind, as not to know the nature and quality of the act he was doing or, if he did know it, that he did not know what he was doing was wrong, then he is not responsible for the act.
For similar reasons, children under the age of reason, elders afflicted with senility, or persons otherwise incapable of forming contracts or standing trial are not treated as responsible individuals at law.
But this legal test is used only when there is a question of sanity and competence. There may be young children who cannot tell the difference between a pull-string talking puppet and a living being, but I rather suspect the children are playing pretend, much as any adult does when he reads a story or watches a drama: we suspend our disbelief to allow our emotions freedom to react to unreal symbols and images as if they were real. But this suspension of disbelief only goes so far: rarely do theater-goers leap up on stage during a Shakespeare play to prevent the murder of Hamlet’s father by his evil uncle, which is what any decent man would do if he literally believed the events were real.
The imitation game is logically circular. It measures nothing. It asks whether an engineer’s design of a machine’s involuntary reactions can mimic voluntary acts with sufficiently clever mimicry to mimic them. It is a tautology: whatever mimics human thought can mimic human thought.
Very good. But what properties must the speech pattern display in order to mimic human thought? Coyly, the Turing Test does not specify what traits of speech are sufficient to mimic human speech because it does not specify the nature of human speech.
The Turing Test itself, ironically, is a word game that sounds like human thinking, but is not.
But let us be fair. A book by a dead author is indeed an outward sign of intelligence. An undirected or involuntary natural process cannot give rise to such a thing. The Turing Test is meant to test for something else: the imitation of intelligent reactions, that is, an automatic behavior that adjusts reactions, eliminating some or prioritizing others.
The Turing Test is an imitation game. The machine that responds to the unobserver’s typed inputs does not respond in a simple or predictable way; it keeps track of prior exchanges, eliminating some possible responses and prioritizing others accordingly.
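As a toy illustration of that adjusting behavior, and only of that (this is a hypothetical sketch, not a description of any actual system), imagine a clockwork that remembers what it has already said, crosses exhausted replies off its list, and favors canned replies keyed to words it spots in the input:

```python
import random

# A hypothetical adjusting clockwork: it tracks prior responses, eliminates
# replies it has already used, and prioritizes replies keyed to words found
# in the input. The repertoire itself is still supplied in advance.

CANNED = {
    "sleep": ["Go away. I'm sleeping.", "Still sleeping. Stop typing."],
    "wake":  ["Why would I want to wake up?", "Five more minutes."],
    None:    ["I don't get what you are saying. Say it again.",
              "Hmph. Whatever you say."],
}

class AdjustingCrank:
    def __init__(self):
        self.already_said = set()  # record of prior responses

    def respond(self, message: str) -> str:
        words = message.lower().split()
        # Prioritize topics whose keyword appears somewhere in the input.
        topics = [k for k in CANNED if k and any(k in w for w in words)] or [None]
        # Eliminate replies already given, if any alternatives remain.
        candidates = [r for t in topics for r in CANNED[t]
                      if r not in self.already_said]
        if not candidates:
            candidates = CANNED[None]
        reply = random.choice(candidates)
        self.already_said.add(reply)
        return reply

if __name__ == "__main__":
    crank = AdjustingCrank()
    print(crank.respond("Wake up!"))
    print(crank.respond("Wake up!"))  # a different reply: the first is now eliminated
```

The repertoire of replies and the rules for eliminating or prioritizing them are all supplied in advance by the engineer; the machine merely adjusts which canned phrase comes out next.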
Such adjusting behavior is what might be called machine learning. What it is, and how it differs from thinking, is the next question to address.