Microsoft reaches a historic milestone, using AI to match human performance in translating news from Chinese to English
A team of Microsoft researchers said
Wednesday that they believe they have created the first machine translation
system that can translate sentences of news articles from Chinese to English
with the same quality and accuracy as a person.
Researchers in the company’s Asia and U.S. labs said that
their system achieved human parity on
a commonly used test set of news stories, called newstest2017, which was developed by a group of industry
and academic partners and released at a research conference called WMT17 last
fall. To ensure the results were both
accurate and on par with what people would have done, the team hired external bilingual
human evaluators, who compared Microsoft’s results to two independently
produced human reference translations.
Xuedong Huang, a technical fellow in charge of
Microsoft’s speech, natural language and machine translation efforts, called it
a major milestone in one of the most challenging natural language processing
tasks. “Hitting human parity in a machine translation task is a dream that all
of us have had,” Huang said. “We just didn’t realize we’d be able to hit it so
soon.”
Huang, who also led the group that recently achieved human parity in a conversational
speech recognition task, said the translation milestone was
especially gratifying because of the possibilities it has for helping people
understand each other better. “The pursuit of removing language barriers to
help people communicate better is fantastic,” he said. “It’s very, very
rewarding.”
Machine translation is a problem researchers have worked
on for decades – and, experts say, for much of that time many believed human
parity could never be achieved. Still, the researchers cautioned that the
milestone does not mean that machine translation is a solved problem.
Ming Zhou, assistant managing director of Microsoft
Research Asia and head of a natural language processing group that worked on
the project, said that the team was thrilled to achieve the human parity
milestone on the dataset. But he cautioned that there are still many challenges
ahead, such as testing the system on real-time news stories.
Arul Menezes, partner research manager of Microsoft’s
machine translation team, said the team set out to prove that its systems could
perform about as well as a person when they used a language pair – Chinese and
English – for which there is a lot of data, on a test set that includes the
more commonplace vocabulary of general interest news stories.
“Given the best-case situation as far as data and availability of resources goes, we wanted to find out if we could actually match the performance of a professional human translator,” said Menezes, who helped lead the project.
Menezes said the research team can apply the technical
breakthroughs they made for this achievement to Microsoft’s commercially
available translation products in multiple languages. That will pave the way
for more accurate and natural-sounding translations across other languages and
for texts with more complex or niche vocabulary.
DUAL LEARNING, DELIBERATION, JOINT TRAINING AND AGREEMENT REGULARIZATION
Although academic and industry researchers have worked on
translation for years, they’ve recently achieved substantial breakthroughs by
using a method of training AI systems called deep neural networks. That has
allowed them to create more fluent, natural-sounding translations that take
into account an even broader context than the previous approach, known as
statistical machine translation.
To reach the human parity milestone on this dataset,
three research teams in Microsoft’s Beijing and Redmond, Washington, research
labs worked together to add a number of other training methods that would make
the system more fluent and accurate. In many cases, these new methods mimic how
people improve their own work iteratively, by going over it again and again
until they get it right.
“Much of our research is really inspired by how we humans
do things,” said Tie-Yan Liu, a principal research manager with Microsoft
Research Asia in Beijing, who leads a machine learning team that worked on this
project.
One method they used is dual learning.
Think of this as a way of fact-checking the system’s work: Every time they sent
a sentence through the system to be translated from Chinese to English, the
research team also translated it back from English to Chinese. That’s similar
to what people might do to make sure that their automated translations were
accurate, and it allowed the system to refine and learn from its own mistakes.
Dual learning, which was developed by the Microsoft research team, also can be
used to improve results in other AI tasks.
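
The round-trip idea can be illustrated with a small sketch. This is not Microsoft's system: the word-for-word lookup tables, `make_translator` and `round_trip_score` are all hypothetical stand-ins, and the score is only a toy version of the kind of feedback a round trip can provide.

```python
from typing import Callable, Dict, List

Translator = Callable[[List[str]], List[str]]


def make_translator(table: Dict[str, str]) -> Translator:
    """Build a toy word-for-word translator from a lookup table (assumption)."""
    return lambda tokens: [table.get(token, "<unk>") for token in tokens]


def round_trip_score(source: List[str], forward: Translator, backward: Translator) -> float:
    """Fraction of source tokens recovered after translating forth and back."""
    reconstructed = backward(forward(source))
    matches = sum(1 for a, b in zip(source, reconstructed) if a == b)
    return matches / max(len(source), len(reconstructed), 1)


# Hypothetical toy "models": one Chinese-to-English table, two English-to-Chinese tables.
ZH_EN = {"我": "I", "喜欢": "like", "茶": "tea"}
EN_ZH_GOOD = {"I": "我", "like": "喜欢", "tea": "茶"}
EN_ZH_BAD = {"I": "我", "like": "喜欢", "tea": "咖啡"}  # mistranslates "tea"

sentence = ["我", "喜欢", "茶"]
good = round_trip_score(sentence, make_translator(ZH_EN), make_translator(EN_ZH_GOOD))
bad = round_trip_score(sentence, make_translator(ZH_EN), make_translator(EN_ZH_BAD))
print(good, bad)
```

A lower score flags a round trip that lost information, which is the kind of signal a system could learn from. How that signal is turned into model updates is not described in this article.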
Another method, called deliberation networks, is
similar to how people edit and revise their own writing by going through it
again and again. The researchers taught the system to repeat the process of
translating the same sentence over and over, gradually refining and improving
the response.
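
As a rough illustration of that revise-and-repeat loop, the sketch below polishes a draft until another pass changes nothing. The `deliberate` and `toy_refine` functions and the `FIXES` list are hypothetical; a real deliberation network would use a second neural decoder rather than string replacement.

```python
from typing import Callable, List, Tuple


def deliberate(draft: str, refine: Callable[[str], str], max_passes: int = 3) -> str:
    """Re-run a refinement step until the text stops changing or passes run out."""
    current = draft
    for _ in range(max_passes):
        revised = refine(current)
        if revised == current:  # nothing left to improve
            break
        current = revised
    return current


# Hypothetical refinement rules; each pass repairs at most one known error.
FIXES: List[Tuple[str, str]] = [("a tea", "tea"), ("I likes", "I like")]


def toy_refine(text: str) -> str:
    for wrong, right in FIXES:
        if wrong in text:
            return text.replace(wrong, right)
    return text


result = deliberate("I likes a tea", toy_refine)
print(result)
```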
The researchers also developed two new techniques to
improve the accuracy of their translations, Zhou said.
One technique, called joint training,
was used to iteratively boost the English-to-Chinese and Chinese-to-English
translation systems. With this method, the English-to-Chinese translation
system translates new English sentences into Chinese in order to obtain new
sentence pairs. Those are then used to augment the training dataset that is
going in the opposite direction, from Chinese to English. The same procedure is
then applied in the other direction. As they converge, the performance of both
systems improves.
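
One round of that data exchange might look like the sketch below. The translators are hypothetical lookup stubs, `synthesize_pairs` is an illustrative name, and the retraining step that would follow each round is omitted.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def synthesize_pairs(monolingual: List[str], translate: Callable[[str], str]) -> List[Pair]:
    """Pair each real sentence with a machine translation of it.

    The machine output becomes the source side and the real sentence the
    target side, so the pairs train the opposite translation direction.
    """
    return [(translate(sentence), sentence) for sentence in monolingual]


# Hypothetical stand-ins for the two trained systems.
def toy_en_to_zh(sentence: str) -> str:
    return {"good morning": "早上好"}.get(sentence, "<unk>")


def toy_zh_to_en(sentence: str) -> str:
    return {"谢谢": "thank you"}.get(sentence, "<unk>")


zh_en_data: List[Pair] = [("你好", "hello")]  # trains Chinese-to-English
en_zh_data: List[Pair] = [("hello", "你好")]  # trains English-to-Chinese

# New English text, translated by the English-to-Chinese system,
# augments the Chinese-to-English training data.
zh_en_data += synthesize_pairs(["good morning"], toy_en_to_zh)
# The same procedure is applied in the other direction.
en_zh_data += synthesize_pairs(["谢谢"], toy_zh_to_en)

print(zh_en_data)
print(en_zh_data)
```

In a full loop, both systems would be retrained on the enlarged datasets and the exchange repeated.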
Another technique is called agreement regularization.
With this method, the translation can be generated by having the system read
from left to right or from right to left. If the two reading directions
produce the same translation, the result is considered more trustworthy than
if they disagree. The method is used to encourage the systems
to generate a consensus translation.
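
A toy version of that consensus check is sketched below. It assumes the two outputs are simply compared word by word and that disagreement is added to a training loss as a penalty. The function names, the `weight` value and the example sentences are illustrative assumptions, not details from the researchers.

```python
from typing import List


def agreement(l2r_tokens: List[str], r2l_tokens: List[str]) -> float:
    """Fraction of positions where the two outputs match.

    r2l_tokens is given in generation order (last word first), so it is
    reversed before comparison.
    """
    restored = list(reversed(r2l_tokens))
    length = max(len(l2r_tokens), len(restored), 1)
    matches = sum(1 for a, b in zip(l2r_tokens, restored) if a == b)
    return matches / length


def regularized_loss(base_loss: float, l2r: List[str], r2l: List[str], weight: float = 0.5) -> float:
    """Add a penalty that grows as the two directions disagree (weight is assumed)."""
    return base_loss + weight * (1.0 - agreement(l2r, r2l))


l2r = ["the", "talks", "resumed", "today"]
r2l_same = ["today", "resumed", "talks", "the"]
r2l_diff = ["today", "restarted", "talks", "the"]

print(agreement(l2r, r2l_same), agreement(l2r, r2l_diff))
print(regularized_loss(1.0, l2r, r2l_same), regularized_loss(1.0, l2r, r2l_diff))
```

The disagreeing pair receives the larger loss, which is one simple way to push two systems toward a consensus translation.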
Zhou said he expects these methods and techniques to be
useful for improving machine translation in other languages and situations as
well. He said they also could be used to make other AI breakthroughs beyond
translation. “This is an area where machine translation research can apply to
the whole field of AI research,” he said.
NO ‘RIGHT’ ANSWER
The test set the team used to reach the human parity
milestone includes about 2,000 sentences from a sample of online newspapers
that have been professionally translated.
Microsoft ran multiple evaluation rounds on the test set,
randomly selecting hundreds of translations for evaluation each time. To verify
that Microsoft’s machine translation was as good as a person’s translation, the
company went beyond the specifications of the test set and hired a group of
outside bilingual language consultants to compare Microsoft’s results against
manually produced human translations.
The method of verifying the results highlights the
complexity of teaching systems to translate accurately. With other tasks, such
as speech recognition, it’s pretty straightforward to tell if a system is
performing as well as a person, because the ideal result will be exactly the same
for a person and a machine. Researchers call that a pattern recognition task.
With translation, there’s more nuance. Even two fluent
human translators might translate the exact same sentence slightly differently,
and neither would be wrong. That’s because there’s more than one “right” way to
say the same thing.
“Machine translation is much more complex than a pure
pattern recognition task,” Zhou said. “People can use different words to
express the exact same thing, but you cannot necessarily say which one is
better.”
The researchers say that complexity is what makes machine
translation such a challenging problem, but also such a rewarding one.
Liu said no one knows whether machine translation systems
will ever get good enough to translate any text in any language pair with the
accuracy and lyricism of a human translator. But, he said, these recent
breakthroughs allow the teams to move on to the next big steps toward that goal
and other big AI achievements, such as reaching human parity in
speech-to-speech translation.
“What we can predict is that definitely we will do better
and better,” Liu said.