The same thing having similar names in different parts of the world or cultures, is a problem Google search has encountered for long. So when it faced a similar challenge around Covid-19 vaccine names, the tech giant turned to its new Multitask Unified Model (MUM) tool for help.
Since AstraZeneca, CoronaVac, Moderna, Pfizer, Sputnik and other broadly distributed vaccines all have many different names all over the world, Google’s “ability to correctly identify all these names is critical to bringing people the latest trustworthy information about the vaccine”. So now the search platform uses MUM “to identify over 800 variations of vaccine names in more than 50 languages in a matter of seconds,” Pandu Nayak, Google Fellow and Vice President, Search wrote in a blog post. “After validating MUM’s findings, we applied them to Google Search so that people could find timely, high-quality information about COVID-19 vaccines worldwide.”
“MUM is a deep neural network that’s made up of transformers just like BERT (Bidirectional Encoder Representations from Transformers) was,” explains Nayak on a call. In 2018, Google launched BERT, a neural network-based technique for natural language processing (NLP) pre-training that lets anyone train their own state-of-the-art question answering system. “But in many ways MUM is very different from BERT. Like BERT it can understand language using encoders… but in addition to that it also has a stack of decoders which allows it to generate text,” Nayak says, adding that this particular architecture is a text-to-text model based on T5 architecture developed by Google Research.
Nayak says MUM was trained on a high-quality subset of the web corpus after omitting low-quality content. It was also trained on all 75 languages at the same time, “so intrinsically its a multi-lingual model”. He explains the benefits: “What that allows us to do is to generalise from data rich languages to data for languages where there are fewer documents.” Also, MUM is “intrinsically multi-modal sort of training” and can expand to images and the like.
This essentially means the model can learn in one language and disseminate the knowledge in others. Also, for Google this translates as a smart mechanism which does not have to learn separately in all languages.
The opportunities are endless. “We think of it as a platform in which different teams can use that platform for their own individual use cases. So one team might use it to improve classification, one to improve ranking and another for information extraction, and yet another to create a whole new application. So it is intrinsically multitasking…,” he underlines.
The vaccine name recognition is just the start of what could be something big. The blog post elaborates: “Our early testing indicates that not only will MUM be able to improve many aspects of our existing systems, but will also help us create completely new ways to search and explore information.”