Siri May Get Smarter by Learning from Its Mistakes


Try holding even a short conversation with Siri, Cortana, or Alexa and you may end up banging your head against the nearest wall in frustration.

Voice assistants are often good at responding to simple queries, but they struggle with complicated requests or any sort of back-and-forth. This could start to change, however, as new machine-learning techniques are applied to the challenge of human-machine dialogue in the next few years.

Speaking at a major AI conference last week, Steve Young, a professor at the University of Cambridge who also works part time on Apple’s Siri team, talked about how recent advances are starting to improve dialogue systems. Young did not comment on his work at Apple but described his academic research.

Early voice assistants, including Siri, used machine learning for voice recognition but responded to language according to hard-coded rules. This is changing as machine-learning techniques are increasingly applied to parsing language (see “AI’s Language Problem”).

Young said that reinforcement learning in particular, the technique DeepMind used to build AlphaGo, a program capable of beating one of the world’s best Go players, could help advance the state of the art significantly. Whereas AlphaGo learned by playing thousands of games against itself, receiving positive reinforcement with each win, conversational agents could vary their responses and receive positive (or negative) feedback in the form of users’ actions.

“I think it’s got to be a big thing,” Young said of reinforcement learning when I spoke to him after his talk. “The most powerful asset you have is the user.”

Young said that voice assistants wouldn’t need to vary their behavior dramatically for this to have an effect. They might simply try performing an action in a slightly different way. “You can do it in a very controlled way,” he said. “You don’t have to do daft things.”
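
To make the idea concrete, here is a minimal sketch of how an assistant might vary its behavior "in a very controlled way" and learn from what users do next. It uses an epsilon-greedy bandit, one of the simplest reinforcement-learning setups, and is not a description of Young's research or of how Siri works. The class `ResponsePolicy`, the variant names, and the acceptance rates are all hypothetical, chosen only for illustration.

```python
import random


class ResponsePolicy:
    """Epsilon-greedy choice among a few response variants (hypothetical)."""

    def __init__(self, variants, epsilon=0.1, seed=None):
        self.variants = list(variants)
        self.epsilon = epsilon  # fraction of turns spent trying alternatives
        self.rng = random.Random(seed)
        self.counts = {v: 0 for v in self.variants}
        self.values = {v: 0.0 for v in self.variants}  # running mean reward

    def choose(self):
        # Mostly use the best-known variant; occasionally try another one.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.variants)
        return max(self.variants, key=lambda v: self.values[v])

    def update(self, variant, reward):
        # Incremental mean: fold the user's feedback into the estimate.
        self.counts[variant] += 1
        n = self.counts[variant]
        self.values[variant] += (reward - self.values[variant]) / n


def simulate(steps=5000, seed=0):
    # Assumed, made-up rates at which a user accepts each style of response.
    accept_prob = {"confirm_first": 0.6, "act_directly": 0.8}
    policy = ResponsePolicy(accept_prob, epsilon=0.1, seed=seed)
    user = random.Random(seed + 1)
    for _ in range(steps):
        variant = policy.choose()
        # Reward 1.0 if the simulated user accepts the action, else 0.0.
        reward = 1.0 if user.random() < accept_prob[variant] else 0.0
        policy.update(variant, reward)
    return policy


if __name__ == "__main__":
    learned = simulate()
    print(learned.values)
```

With `epsilon` set to 0.1, the assistant in this sketch sticks with its best-known behavior on roughly nine turns out of ten, which is one way to avoid doing "daft things" while still collecting feedback from users.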

During his talk, Young explained why parsing language is so difficult for machines. Unlike image recognition, for example, language is compositional,…