Translation is the oldest task in humanitarian information processing and it is still the largest. Before digital technologies existed, ground-truth information needed to be translated into the language(s) of aid-workers and this is true now more than ever as the volume of digital communications rapidly increases. The need for real-time cross-linguistic communications hasn’t changed, but the technologies that support it have changed dramatically. The last 12 months have seen what might have been the greatest change to date.
Despite being the largest form of information processing, translation is not a widely discussed aspect of crisis response and is seen as “a perennial hidden issue”:
“Go and look at any evaluation from the last ten or fifteen years. ‘Recommendation: make effective information available to the government and the population in their own language.’ We didn’t do it … It is a consistent thing across emergencies.”
Brendan McDonald, UN OCHA, in Disaster 2.0 Report.
The exact reasons for the lack of translation are complex, but one of the main reasons is that aid workers within crisis-affected regions simply don’t have the linguistic knowledge to translate important information. Even if they did, it is unlikely that they would have the spare capacity to undertake the work, as translation is a more time-intensive task than is often assumed. When Samasource compared translation and mapping in crisis settings, they found that at about 100 characters the average time to translate a message exceeded the average time to map it. For the Libya Crisis Map, translation was therefore by far the biggest task. Most information began in Arabic and was translated along the way, whether through translators on the ground, at media agencies, at partner humanitarian organizations or within the SBTF team itself. Perhaps the SBTF could be called ‘Crisis Translation’ rather than ‘Crisis Mapping’, as mapping, categorization, etc. can just as easily be thought of as specialized cases of translation: for example, translating the location name ‘San Francisco’ into the coordinate language ‘Latitude 37.48 N, Longitude 122.33 W’, or a plain-language need ‘We have no food’ into a limited vocabulary of categories ‘Request: food’.
For translation, as with other information processing tasks, the increase in cloud-based communications means that crisis-response information can now take advantage of cloud-based translation services. Cloud services can take this burden off crisis workers in the field and (hopefully) ensure that failing to make information available in the language of crisis-affected populations is a thing of the past. Both machine translation and crowdsourced translation can be used, shifting this perpetual information bottleneck to the online world.
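To make the idea of mapping and categorization as ‘translation’ concrete, here is a minimal Python sketch. The lookup tables, the function names and the ‘water’ category are illustrative assumptions only; a real deployment would rely on a full gazetteer and a much richer category scheme, and this is not a description of any tool the SBTF actually uses.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical gazetteer: place name -> (latitude, longitude).
# The coordinates are the ones quoted in the text; west is negative.
GAZETTEER = {"san francisco": (37.48, -122.33)}

# Hypothetical keyword -> category vocabulary ('water' is an added example).
CATEGORY_KEYWORDS = {"food": "Request: food", "water": "Request: water"}


def geolocate(message: str) -> Optional[Tuple[float, float]]:
    """'Translate' the first known place name in a message into coordinates."""
    text = message.lower()
    for name, coords in GAZETTEER.items():
        if name in text:
            return coords
    return None


def categorize(message: str) -> List[str]:
    """'Translate' a plain-language need into the limited category vocabulary."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    return [label for keyword, label in CATEGORY_KEYWORDS.items() if keyword in words]


if __name__ == "__main__":
    print(geolocate("Help needed in San Francisco"))
    print(categorize("We have no food"))
```

Both functions take free text in and give a constrained ‘language’ out, which is the sense in which they can be seen as special cases of translation.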
As a relatively new possibility, the field of crisis-translation is rapidly changing. Here are five important recent developments:
- Social Networks as Linguistic Networks. Facebook and LinkedIn both started letting users list ‘languages spoken’. This change, which went by almost unnoticed in the last 12 months, is incredibly important. There has never before been a global registry of the languages that people speak. When I needed to find Kreyol speakers for Haiti last year, I had to search for the words ‘Haiti’ and ‘Kreyol’ among open Facebook groups and post for help there. When I needed to find Sindhi and Urdu speakers for Pakistan 6 months later, I reached out to colleagues at MyLanguage to in turn contact their user base. For the first time ever, we now have a public record of people who speak any given language pair. When I was looking for Arabic speakers for Libya this year, the starting point was much simpler and more direct: people who spoke Arabic in my extended social circle could immediately be identified. For smaller languages in particular (the majority of the 7,000 currently spoken), the ability to immediately locate willing volunteers online will be vital.
- Machine translation. In 2003, rapidly deploying a machine translation system for a new language took about 1 month. Clearly, 1 month is not quick enough to help automatically translate the large volumes of information that now quickly come out of crisis-affected regions (and, just as importantly, to get information back to people within those regions). Researchers at Microsoft last year reduced this to just a few days. As I type, more than a dozen teams of the world’s top machine learning engineers are looking at the problem of rapidly deploying machine translation systems for low-resource languages. They are evaluating their accuracy on translating Haitian Kreyol text messages to/from English. They will come together in one month to share their results at the annual workshop on machine translation in August 2011, and the results are eagerly anticipated.
- Real-time crowdsourced translation. In Denver last year, the leaders in human, machine and crowdsourced translation all met for the first time. One of the main topics we covered was translation for crisis response and social good. Imagine if the heads of Google, Microsoft and Yahoo all met for the first time and decided to spend one of the largest chunks of time talking about how search engines can be leveraged for crisis response. We were fortunate to have the best minds working on this problem. Real-time crowdsourced translation began with Mission 4636 for Haiti in 2010. Before that, no one had used real-time crowdsourced translation anywhere: crisis response, research, commercial use or otherwise. Even off-line crowdsourced translation was relatively new (my colleague Chris Callison-Burch is leading the way in this at Johns Hopkins). It was a baptism of fire, but a strategy that has been repeated successfully since. Among the participants in Denver, it was agreed by professional, crowdsourced and machine-translation people alike that crowdsourced translation (the new kid on the block) was necessary for sudden-onset crises: professional translators would be immediately overworked and machine-translation engines for most languages would take time to deploy. Quite literally, we are able to give the world a voice, especially those who most need to be heard.
- Collaborative translation. Unlike crisis-mapping, the world of translation for social development is extremely large and disjointed. Innovative uses include Kiva’s management of a large pool of crowdsourced translators for the descriptions of micro-loan applications, and Meedan’s combination of machine and human translation for creating cross-linguistic audiences (and understandings) of Arabic and English newspapers and blogs. A number of organizations are currently looking to combine many people’s efforts and create a single place where all social-good companies can post translation requests, and the translators themselves can microtask the translation in order of priority, sharing translation memory across the initiatives. It is still under development, so watch this space*, but if successful it will represent the single greatest advancement in collaborative humanitarian information processing to date.
- Combining machine and human translation. Even a bad machine-translation might be good enough to allow some data structuring (mapping / categorization) by non-native speakers. At the very least, it can allow the structuring to occur in parallel with the more accurate human translation. At best, non-native speakers can correct obvious machine mistakes and take the burden off native speakers completely. The SBTF were the first to trial the combination of the two in a crisis setting, working with Spanish machine and human translation for the OCHA Earthquake Simulation in Colombia (the same technology and strategy has also been used for Arabic in Sudan and Japanese in Japan since). This is just a first step. By combining machine and human translation in more sophisticated ways, we should be able to further prioritize the most important information, and intelligently expand the potential workforce into non-native speakers.
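As a rough illustration of how a machine draft can let structuring and prioritization start before human translation is available, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the toy phrase table stands in for a real machine translation engine, the list of urgent terms is invented, and the queue is not the technology the SBTF actually deployed.

```python
import heapq
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical terms that mark a message as urgent once roughly translated.
URGENT_TERMS = {"food", "water", "trapped"}

# Toy stand-in for a machine translation engine (Spanish -> English).
TOY_PHRASE_TABLE: Dict[str, str] = {
    "hola a todos": "hello everyone",
    "no tenemos comida": "we have no food",
}


@dataclass
class Message:
    source_text: str
    machine_draft: str
    human_translation: Optional[str] = None  # filled in later by a native speaker


def toy_machine_translate(text: str) -> str:
    """Return a rough draft; unknown text is passed through unchanged."""
    return TOY_PHRASE_TABLE.get(text.lower(), text)


def priority(draft: str) -> int:
    """Lower value = more urgent: the negated count of urgent terms in the draft."""
    return -len(set(draft.lower().split()) & URGENT_TERMS)


def build_queue(texts: List[str],
                translate: Callable[[str], str]) -> List[Tuple[int, int, Message]]:
    """Draft every message by machine, then queue it for human translation."""
    heap: List[Tuple[int, int, Message]] = []
    for order, text in enumerate(texts):
        msg = Message(source_text=text, machine_draft=translate(text))
        # 'order' breaks ties so that Message objects are never compared.
        heapq.heappush(heap, (priority(msg.machine_draft), order, msg))
    return heap


def next_for_human(heap: List[Tuple[int, int, Message]]) -> Message:
    """Pop the message that a human translator should look at first."""
    return heapq.heappop(heap)[2]


if __name__ == "__main__":
    queue = build_queue(["Hola a todos", "No tenemos comida"], toy_machine_translate)
    print(next_for_human(queue).source_text)
```

Non-native speakers can already categorize or map from `machine_draft` while native speakers work through the queue in order of urgency, so the two kinds of work proceed in parallel.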
There remains one way in which translation is very different to other forms of humanitarian information processing: it is rarely a transferable skill. Someone who can translate from Spanish to English cannot transfer this ability to help translate from Russian to Quechua. If someone learns how to geolocate known places, monitor media reports and/or categorize messages in one crisis, then they can take most of this knowledge with them to the next. This has important implications for preparedness. We can train teams of people in advance for most information processing tasks, but translation isn’t one of them. It simply wouldn’t be possible to train and have on standby enough people who spoke all 7,000 languages. This is what makes the technological and social efforts listed above so important: of all the information processing requirements in a sudden-onset disaster, translation is the hardest to prepare for and yet has the greatest potential volume.
For the Standby Task Force, this also makes the members of the translation team different, and in a vital way. If a crisis suddenly hits a region speaking Arabic, Bosnian, Chinese, Croatian, Czech, French, German, Hebrew, Italian, Japanese, Kreyol, Serbian, Spanish or Swahili,** then the SBTF speakers of those languages immediately become the most important people in the deployment, as they are the ones best placed to reach out to the necessary new volunteer translators within their own social and linguistic networks.
* For a growing list/wiki of translation services, see: http://sync.in/gVP7cNX7Eo
** I took these languages from those spoken by the people I know and/or who have commented on the Translators ning page (Ali, Amadou, Anahi, Ben, Boris, Carol, Eliana, Helena, Jaro, Jeremy, Juan, Mariah, Marta, Martin, Sebastian, Svend). Please do leave a comment with your own language(s) if you haven’t already!