voice interaction design
i ran a 3-day workshop on voice interaction design for aman xaxa’s students of communication design (graduate studies) at the school of planning and architecture, bhopal. we functioned as a fast-moving studio: iterating rapidly while exploring multi-modality in voice-led human–machine conversations.
this is a record of everything we made (and discussed) in the workshop:
project 1: “ice breakers”
instead of asking students to introduce themselves, i decided to ask my questions (“what is your name” and “where are you from”) to their computers. following a discussion around turn-taking, students built simple interactions (sketches) with pre-programmed answers to those questions. they were introduced to voiceflow, and used the speak and intent blocks in it. they also learned about running prototypes, and about how utterances only need to match partially to trigger an intent (there’s a toy sketch of this after the notes below).
- testing with our voices (instead of typing questions into a chat-box) made it harder for the machine to detect what we were saying, and helped us build more robust interactions.
- there were several ways of framing the same question. people spoke to the machines in different languages. people also asked completely unrelated questions, or asked more questions than were programmed in. (for example: i’d ask the machine to describe the student’s education or hobbies, or spell out their names.)
- as the students tested each other’s sketches, they recognised a need to plan how interactions ended, looped back, and/or reset.
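to make the partial-matching idea concrete, here’s a toy sketch in python (the intents, sample utterances, and answers are all invented, and this is not how voiceflow actually matches things): an utterance triggers whichever intent it shares the most words with, and anything too far off falls through to a fallback.

```python
import re

# pre-programmed intents and answers (invented for illustration)
INTENTS = {
    "ask_name": ["what is your name", "tell me your name", "who are you"],
    "ask_origin": ["where are you from", "where do you live"],
}

RESPONSES = {
    "ask_name": "my name is asha.",
    "ask_origin": "i am from bhopal.",
}

def tokens(text):
    return set(re.findall(r"[a-z']+", text.lower()))

def match_intent(utterance):
    """return the intent whose sample shares the most words with what was heard."""
    heard = tokens(utterance)
    best_intent, best_overlap = None, 0
    for intent, samples in INTENTS.items():
        for sample in samples:
            overlap = len(heard & tokens(sample))
            if overlap > best_overlap:
                best_intent, best_overlap = intent, overlap
    # demand at least two shared words, so unrelated questions fall through
    return best_intent if best_overlap >= 2 else None

for heard in ["hey, what's your name?", "where exactly are you from?", "describe your education"]:
    intent = match_intent(heard)
    print(heard, "->", RESPONSES.get(intent, "sorry, i wasn't programmed for that."))
```

tuning that overlap threshold trades false triggers against missed ones, which is roughly the failure mode the students ran into when testers re-framed questions or asked unrelated ones.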
project 2: “slow snake game”
in this project, students worked in groups and made an interaction lasting several turns. since the intents for a snake game were easy (“turn left”, “go right”, “keep going straight”, “go up”, etc), this project focused on multi-modality (specifically: offering visual feedback besides voice/text), and opened up discussions about personas. students were encouraged to imagine what kind of a person(ality) their game’s “snake” was, and let that be reflected in how the snake responded to intents.
some groups used earcons (further expanding their multi-modal explorations); some used filled pauses (like “hmm”) to negotiate open-ended responses from people; and there was also a discussion on using recorded voice instead of a machine-generated voice (when feasible, of course; for example: when there are only a few standard machine-responses, and voice samples can be quickly recorded and placed into a sketch/prototype).
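as a rough illustration of how one turn of the game might hang together, here’s a python sketch (the “grumpy” persona, its lines, and the print statements standing in for earcons and graphics are all invented):

```python
import random

# which way each intent moves the snake on a grid
MOVES = {
    "turn left": (-1, 0),
    "go right": (1, 0),
    "go up": (0, -1),
    "go down": (0, 1),
}

# a "grumpy" persona: same intents, flavoured replies
GRUMPY = {
    "turn left": ["fine, left.", "left... again?"],
    "go right": ["right it is.", "if we must."],
    "go up": ["up we go, i suppose."],
    "go down": ["down. thrilling."],
}

def play_earcon(name):
    # stand-in for an actual sound file
    print(f"♪ earcon: {name}")

def handle_turn(intent, position):
    if intent not in MOVES:
        play_earcon("error")
        print("hmm...")  # a filled pause, buying time for a clearer command
        return position
    play_earcon("move")                     # audio feedback first...
    dx, dy = MOVES[intent]
    position = (position[0] + dx, position[1] + dy)
    print(random.choice(GRUMPY[intent]))    # ...then the persona's voice line
    print(f"[snake is now at {position}]")  # ...then the visual update
    return position

pos = (0, 0)
for command in ["go right", "go up", "do a barrel roll"]:
    pos = handle_turn(command, pos)
```

swapping the GRUMPY table for another persona changes how the snake talks without touching the movement logic; one way of letting personality live purely in the responses.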
while prototyping, testing and iterating, we discussed several things: the importance of affordance in context; setting and managing expectations; tapering; equipping a person to issue the right commands; giving relevant feedback (to let a person know what state the machine is in); and even some basic error handling, to enable the interaction to recover from an error elegantly.
once they’d suffered enough, i showed them how handy entities and variables can be.
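as a rough illustration of why they’re handy: an entity lets the machine pull a value out of an utterance, and a variable carries that value into later turns, so one response works for any name. (the regex below is a crude stand-in for a real nlu model.)

```python
import re

variables = {}

def capture_name(utterance):
    # "entity": whatever follows "my name is ..." (a real nlu model is fuzzier)
    match = re.search(r"my name is (\w+)", utterance.lower())
    return match.group(1) if match else None

heard = "hi, my name is meera"
name = capture_name(heard)
if name:
    variables["name"] = name  # the "variable": captured once, reused everywhere

# the same speak step now works for any name
print(f"nice to meet you, {variables.get('name', 'stranger')}!")
```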
we also noticed how, in a multi-modal experience (using voice, earcons, and visuals), we can trigger modes individually, together, or in a sequence. for example, when offering feedback (to denote success or error), it may be more effective for the machine to play an earcon before displaying visual feedback.
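a small asyncio sketch of that difference (the channel names are invented; prints stand in for actual sound and screen output):

```python
import asyncio

async def earcon(kind):
    print(f"♪ earcon: {kind}")
    await asyncio.sleep(0.2)  # pretend the sound takes a moment to play

async def visual(kind):
    print(f"[screen shows: {kind}]")
    await asyncio.sleep(0.2)

async def feedback_in_sequence(kind):
    await earcon(kind)   # the earcon lands first...
    await visual(kind)   # ...then the screen catches up

async def feedback_together(kind):
    # both modes fire at (roughly) the same moment
    await asyncio.gather(earcon(kind), visual(kind))

asyncio.run(feedback_in_sequence("success"))
asyncio.run(feedback_together("error"))
```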
this led us to a discussion about using memory to enrich interactions, instead of just building reactive or prescriptive conversational experiences.
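a toy version of that idea: keep a little state across turns, and let earlier turns shape later replies (the dialogue is invented).

```python
class Conversation:
    """a machine that remembers, instead of reacting to each turn in isolation."""

    def __init__(self):
        self.answered = set()

    def reply(self, intent):
        if intent in self.answered:
            # memory at work: the second ask gets a different response
            return "you asked me that already. still bhopal."
        self.answered.add(intent)
        if intent == "ask_origin":
            return "i am from bhopal."
        return "i did not catch that."

convo = Conversation()
print(convo.reply("ask_origin"))  # -> i am from bhopal.
print(convo.reply("ask_origin"))  # -> you asked me that already. still bhopal.
```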
project 3: “car find toilet”
“let’s imagine a woman driving a car, with her child in the back seat. they’re on a highway, and the child suddenly expresses her need to go to a toilet stat. instead of the mother checking a map on her phone (which is dangerous to do while driving), can her vehicle help by directing her to a facility nearby?”
the students wrote a brief, in 5–6 sentences, establishing context (world-building, vehicle type, persons involved, where they’re going) and defining the task the mother would want to accomplish by talking to her vehicle.
first, they performed the interaction: after writing down a “happy-flow” script, they acted it out in front of each other (with one person in a team role-playing the car, and another pretending to be a passenger).
then, they built (at least) the happy-flow in software. students were encouraged to use sounds and images, manage some error-handling, and include at least a few variations in the machine’s utterances.
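a minimal happy-flow could look something like this in code; the dialogue, trigger words, and re-prompt below are all invented for illustration:

```python
import random

# a few variations per machine utterance, so repeated turns don't sound canned
VARIATIONS = {
    "found": [
        "there is a rest stop with toilets 2 km ahead, on the left.",
        "i found a fuel station with restrooms 2 km away. want directions?",
    ],
    "not_understood": [
        "sorry, i did not catch that. do you need me to find something nearby?",
        "could you say that again?",
    ],
}

def machine_says(key):
    print("car:", random.choice(VARIATIONS[key]))

def handle(utterance):
    if "toilet" in utterance.lower() or "restroom" in utterance.lower():
        machine_says("found")           # the happy flow
    else:
        machine_says("not_understood")  # basic error handling: re-prompt

handle("find a toilet near us, quickly")
handle("umm the kid needs to go")
```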
additional resources:
in “always in” (2019), drew austin wrote, optimistically, about a future where everyone wears headphones all the time. students were asked to share their thoughts on the essay (or any specific part of it) while introducing themselves on the first day. later on, students were encouraged to read some “theory” about how machines understand what we speak to them: an introductory article on ‘natural language understanding’ (on vux).
whatever we make is, in a way, magic (because all the complexity gets hidden away behind a seamless experience). with this in mind, i recommend genevieve bell’s talk on magic and fear and wonder and technology to any student of interaction design. also: in birth of living code: tamagotchis and teddybears (2019), anne skoug obel explored different ideas about when code/machines are perceived as living.