Facebook Patents Alternative to ‘Expensive’ BLEU

Facebook recently patented its process for measuring machine translation quality. The patent was granted on July 30, 2019, almost one year after the application date in September 2018.

Four inventors are listed on the patent, which is entitled “Optimizing Machine Translations for User Engagement”: Ying Zhang, Fei Hung, Kay Rottmann and Necip Fazil Ayan. Ayan is Research Lead at Facebook AI, and heads up the Language and Translation Technologies team.

The patent relates to Facebook’s method of gathering user engagement data on machine translation output and using this data to improve the quality of its machine translations. Translations appear in a news feed, in a banner or in an ad, for example. They may be displayed automatically, or kept hidden until the user requests a translation.

The idea hinges on the premise that the more people like, share and comment on a machine translated post, the better the translation is assessed to be. If you then add weightings and normalize the results, user engagement can serve as a proxy by which to evaluate machine translation quality.
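
As a rough illustration of that premise, the sketch below turns raw engagement counts into a single weighted score per impression. The weights, field names and the choice to normalize by impressions are assumptions made for this example; the article does not say which values or normalization Facebook uses.

```python
from dataclasses import dataclass

# Hypothetical weights: the article mentions weightings but gives no values.
WEIGHTS = {"likes": 1.0, "comments": 2.0, "shares": 3.0}


@dataclass
class Engagement:
    """Engagement counts for one translation (field names are illustrative)."""
    impressions: int
    likes: int = 0
    comments: int = 0
    shares: int = 0


def engagement_score(e: Engagement, weights=WEIGHTS) -> float:
    """Weighted engagement per impression; 0.0 if the translation was never shown."""
    if e.impressions == 0:
        return 0.0
    weighted = (
        weights["likes"] * e.likes
        + weights["comments"] * e.comments
        + weights["shares"] * e.shares
    )
    # Normalizing by impressions lets translations shown to audiences
    # of different sizes be compared on the same scale.
    return weighted / e.impressions
```

Under these assumed weights, a translation shown 100 times with 10 likes and 5 comments scores (10 + 2 × 5) / 100 = 0.2.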

Inside the Translation System

Facebook says that suitable options for underlying translation systems include the “PanDoRA” system developed at Mobile Technologies, LLC, as well as machine translation systems developed by IBM Corporation, SRI, BBN or at Aachen University. The same organizations also offer automatic speech recognition (ASR), which Facebook may use to convert audio input to text.

Facebook said in the patent that it displays different translations, called candidate translations, to different groups of users. How one candidate translation performs relative to another tells Facebook which one is preferred by users, and therefore better.

From here, Facebook can then tweak its models to ensure that the preferred translation is favored, making it more likely to be used in future. Facebook explains the iterative process as being “repeatedly applied in order to create a feedback system in which multiple candidate translations are generated using a model, the translations are evaluated for user engagement, the model is modified to favor the translation having greater positive engagement, the updated model generates multiple candidate translations, and the process repeats.”
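
The loop described in that quote (generate, evaluate, update, repeat) can be sketched in a few lines. This is a toy under stated assumptions, not Facebook’s implementation: the “model” is just a dictionary of preference scores, and `generate_candidates`, `run_feedback`, `measure_engagement` and `step` are hypothetical names introduced for the example.

```python
from typing import Callable, Dict, List


def generate_candidates(model: Dict[str, float], options: List[str], k: int = 2) -> List[str]:
    """Return the k options the model currently prefers (ties keep input order)."""
    return sorted(options, key=lambda o: -model.get(o, 0.0))[:k]


def run_feedback(
    model: Dict[str, float],
    options: List[str],
    measure_engagement: Callable[[str], float],
    rounds: int = 3,
    step: float = 0.1,
) -> Dict[str, float]:
    """Repeat: generate candidates, compare engagement, favor the winner."""
    for _ in range(rounds):
        candidates = generate_candidates(model, options, k=2)
        # Stand-in for showing each candidate to a user group and
        # collecting engagement data on it.
        winner = max(candidates, key=measure_engagement)
        # "Modify the model to favor the translation having greater
        # positive engagement": here, simply raise its preference score.
        model[winner] = model.get(winner, 0.0) + step
    return model
```

Calling `run_feedback({}, ["a", "b", "c"], {"a": 0.1, "b": 0.3, "c": 0.9}.get)` ends with the model favoring "b". In this toy the option "c" is never shown, because only the model’s current top two candidates are ever compared.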

Beyond assessing which candidate translations are preferred, Facebook may also be able to tell which groups of people prefer which translations. For example, since Facebook often holds information about a user’s age, gender and nationality, it may calculate engagement scores on this basis. According to Facebook, “different translations may be generated based on the language patterns of different demographic groups, and an appropriate translation may be provided based upon an identity of a user requesting the translation, or a target group identified in the translation request.”
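
One simple way to picture per-group scoring is to compute an engagement rate for each (group, translation) pair and keep the best translation per group. The event format and the function name `best_translation_per_group` are assumptions for illustration only; the article does not describe how Facebook stores or aggregates this data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def best_translation_per_group(
    events: Iterable[Tuple[str, str, bool]],
) -> Dict[str, str]:
    """events holds (group, translation_id, engaged) records, one per view."""
    shown = defaultdict(int)
    engaged = defaultdict(int)
    for group, translation_id, did_engage in events:
        shown[(group, translation_id)] += 1
        engaged[(group, translation_id)] += int(did_engage)

    best = {}  # group -> (translation_id, engagement rate)
    for (group, translation_id), views in shown.items():
        rate = engaged[(group, translation_id)] / views
        if group not in best or rate > best[group][1]:
            best[group] = (translation_id, rate)
    return {group: translation_id for group, (translation_id, _) in best.items()}
```

With this sketch, two groups that see the same pair of candidate translations can end up with different preferred versions.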

Usability Not Ratings

In some cases, Facebook shows prompts to users, asking them to say whether a translation is usable or understandable. The emphasis on usability rather than ratings is intentional: the patent inventors found that “asking a user to ‘rate’ a translation often yields inconsistent results, because a user may not know on what basis they should be rating the translation.” By contrast, “asking a user whether a translation was ‘useable’ or ‘understandable’ produced more consistent and more useful results.”

With its method, Facebook is aiming to simplify the usually difficult and time-consuming process of “identifying which translations are favored and communicating this information in a way that a machine translation system can consistently apply.” It is an alternative to the BLEU score, which the Facebook patent points out has “several problems.”

While Facebook acknowledges that BLEU remains the “industry standard in evaluating machine translations,” the patent describes it as problematic for two reasons. First, BLEU is expensive because it relies on human-produced reference translations. Second, “there are questions as to how well the BLEU score measures translation quality,” Facebook said. For example, “the BLEU score may not accurately capture whole sentence-level meaning, does not address grammatical correctness, and has difficulty evaluating translations involving languages that lack clear word-level boundaries.”
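
To make those criticisms concrete, here is a simplified, sentence-level sketch of how a BLEU-style score is computed. It is not a reference implementation: it uses a single reference, only unigrams and bigrams (standard BLEU typically goes up to 4-grams over a whole corpus), no smoothing, and the function name `simple_bleu` is made up for this example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate: str, reference: str, max_n: int = 2) -> float:
    # Whitespace tokenization: this step is why languages without clear
    # word boundaries are hard to score this way.
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clipped counts: a candidate n-gram is credited at most as many
        # times as it appears in the human-produced reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: discourages candidates shorter than the reference.
    brevity_penalty = min(1.0, math.exp(1 - len(ref) / len(cand)))
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)
```

The score depends entirely on n-gram overlap with the reference string, so a fluent translation that uses different wording can score poorly, and nothing in the calculation checks grammar or sentence-level meaning.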

Skeptical About This Direction

Slator reached out to prolific machine translation researcher Rico Sennrich, Lecturer in Machine Learning at the University of Edinburgh, for his assessment of Facebook’s newly patented quality evaluation system.

Sennrich said that he was skeptical about using big data and user engagement to optimize MT. “Social media platforms and search engines have managed to show users more relevant content by optimizing for user engagement. I understand why there’s interest in using user engagement also to optimize MT – big platforms get this data essentially for free, and it aligns with their business objectives, but I’m skeptical about this direction,” he commented.

Explaining his position, Sennrich added, “I’m happy to believe that improving the translation quality will lead to higher user engagement on the platform. I’m less inclined to believe that optimizing user engagement directly will lead to better translation quality.”

Moreover, he said, “with user engagement as its main objective, there is a risk that translation systems will learn to produce text that maximizes user engagement while sacrificing translation accuracy. To give an example, when translating product descriptions in an online marketplace, naively using sales as the optimization criterion for an MT system could reward the system for embellishing a product and misleading users, rather than translating the description accurately.”
