Why are voice assistants still bad for multilingual users?
Or: how Google only fulfils half of its mission.
We’re in the car on a lazy Sunday, driving through the Yokohama suburbs on a little outing. The kids are asleep. The dashboard monitor — set to Google Maps on my iPhone via CarPlay — says that it’s around 30 mins to home, but we think the kids will probably sleep for an hour more. Shall we kill some time at a hardware store? We’re doing a bit of DIY and want to estimate the cost of materials for a wooden deck.
What we want is a ホームセンター (a “home centre,” a Japanese neologism that describes something much like a B&Q in the UK, a large warehouse-style hardware store – see fig. 1). We want to see options and prices for bricks, 2x4 wood, MDF, and some tools for our mountain shed.
But I face a dilemma. My phone language is set to English, and so, by extension, is the Google Maps app running through Apple CarPlay.
Why is that a problem?
“Just say ‘Google, show me hardware stores’ and be done with it,”
I hear you say.
The issue is that I know from experience that such a search is far more likely to yield results for small neighbourhood tool shops, typically called a 金物屋 (a “metal tools shop” – see fig. 2).
Both of the above stores would be called a “hardware store” if you were describing them in English or asking a voice assistant to direct you to one.
I’m sure there are memes out there about how monolingual people often don’t realise how rarely languages map onto each other cleanly, in a 1:1 sense. (truth in tweet form)
Multilinguals will recall schoolyard conversations in which a potty-mouthed friend stops you in your tracks by asking 1. “how do you say ‘f**k you’ / ‘s**t for brains’ in Japanese / Chinese / Hindi?” or the even more nonsensical 2. “what’s Japanese for Scott?”. Disappointment is the only conceivable end result for every participant in such a conversation.
For what it’s worth, the only correct answers to the above are 1. “well, that idea isn’t a common insult in that culture” and 2. “Erm, it’s Sukotto…”, respectively. The only possible response from the original questioner to either of those disappointing answers is a resounding “cool story bro.” Not great for an 8-year-old boy’s ego.
Such is the messiness of language. Or the richness of language. “50 Words for Snow,” whatever.
Maybe I’m being overdramatic, but there is something to be said for the dangers of missing minority perspectives in design. It’s surely not the case that Google or Apple teams are made up solely of monolinguals (Silicon Valley famously has a large multicultural, specifically Asian, contingent), so it seems odd that so little care goes into the problems that arise when design ignores the imperfect semantic mapping between languages. Some of these problems end up being quite cognitively taxing.
Solution: wouldn’t it be better if voice assistants could determine their language by detecting which language you’re speaking when you say the wake word? All of the big assistants, whether Alexa, Siri, or Google Assistant, support both English and Japanese. At the very least, let me switch to Japanese with my first command (“Assistant, switch language to Japanese”) and continue with subsequent commands in the intended language, rather than changing my whole phone and app language settings (which would require me to stop the car and use the touchscreen).
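The wake-word idea is less far-fetched than it sounds: open-source speech models can already identify the spoken language from a few seconds of audio. As a rough sketch (and not, to be clear, how any of these assistants actually work), here is the idea using OpenAI’s open-source Whisper model, whose Python API exposes a detect_language call; the audio file name and the routing step are hypothetical stand-ins.

```python
# Sketch: identify the language of a short utterance, then hand the rest of
# the command to a recognizer for that language. "wake_word.wav" is a
# hypothetical recording of the wake word plus the first command.
import whisper

model = whisper.load_model("base")

# Load the clip and fit it to Whisper's expected 30-second window.
audio = whisper.load_audio("wake_word.wav")
audio = whisper.pad_or_trim(audio)
mel = whisper.log_mel_spectrogram(audio).to(model.device)

# detect_language returns a probability for each supported language.
_, probs = model.detect_language(mel)
language = max(probs, key=probs.get)  # e.g. "en" or "ja"

print(f"detected: {language}")
# A real assistant would now route the utterance to the speech recognizer
# and intent parser for `language`, regardless of the phone's UI language.
```

None of this is exotic; the point is that per-utterance language identification is basically a solved problem, and the assistants simply don’t expose it.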
There was an IME (software keyboard) for iOS that would detect your typing language automatically as you typed. It only worked between languages that use the Latin alphabet, so it couldn’t switch between English and Japanese, but it’s a start. We need a voice-assistant version of that, basically. (EDIT: iOS seems to do a version of this in speech-to-text dictation, which is at least a step in the right direction.)
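For text, that trick is almost trivial these days. As a hedged illustration (I have no idea what that IME actually used), the langdetect package, a Python port of Google’s language-detection library, does it in a couple of lines, and the English/Japanese case the IME couldn’t handle is, thanks to the different scripts, actually the easy one:

```python
# Sketch: detect the language of each input as it's typed.
# pip install langdetect
from langdetect import DetectorFactory, detect

DetectorFactory.seed = 0  # detection is probabilistic; fix the seed for repeatability

for text in ["where is the nearest hardware store",
             "近くのホームセンターを探して"]:  # "find a home centre near me"
    print(detect(text), "->", text)

# en -> where is the nearest hardware store
# ja -> 近くのホームセンターを探して
```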
Google’s mission is to “organize the world’s information and make it universally accessible and useful,” but imo they’re falling short on the second half.