Experiments in multi-modal voice AI

Video: Bixby 2 demos and CES showcase

I proposed and executed a series of increasingly ambitious voice control projects within our lab, eventually leading to a close collaboration with viv.ai, founded by the creators of Siri. This resulted in a major shift in product strategy and architecture for the Bixby voice agent.

Goal

Advance the vision of voice UX from direct command and control to a natural-language-driven, hands-free assistant experience.

Role

My role in voice R&D projects evolved from sole contributor driving the entire design/prototype/pitch process to leading a global effort to bring viv.ai’s advanced NLU engine to the TV.

Result

  • Secured funding to pursue multiple voice-related concepts, building lab expertise in voice UX.

  • Garnered buy-in for a massive change in direction for Bixby 2.0.

Background

When Siri was first released, I was amazed by the utility of its multi-modal NLU. I was inspired to build a hands-free voice assistant called “Sammy” as a side project, assembled from a disassembled headset and a directional microphone I stole from a conference room. This was prior to the advent of Alexa and other platforms that made such prototyping far simpler.

A logo I whipped up for our team :)

My hacky prototype caught the attention of the “Future Innovations” team in Korea, who funded the creation of Team VIVA (Visual Voice Assistant), a team that fluctuated between one and six designers/prototypers/researchers exploring how displays might augment the capabilities and usability of a voice assistant.

Together we used whatever tools we could find (even non-Samsung AI tools like Hound and api.ai) to develop an approach to voice + display UX. We built and tested experiences across standalone speakers, TVs, and other non-TV displays in the home.

Multi-modal interaction

As voice assistants like Alexa and OK Google became ubiquitous, we grew skeptical that voice alone could support certain use cases. Being part of the TV team, this was particularly important to us; we wanted to identify unique interactions that displays could support, depending on which interaction modality (i.e. touch, remote, hands-free, eyes-free) the user was engaged in. We developed frameworks to map which interactions and capabilities made the most sense given the limitations of each modality. For example, voice is great for search and direct access to music controls, but a poor choice when the user wants to refine a search or browse for a particular track, both of which are integral to the music listening experience.
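To give a flavor of that framework, here is a minimal, hypothetical sketch in Python of a modality/capability fit matrix; the capability names and scores are illustrative stand-ins, not our actual internal tool.

    # Hypothetical modality/capability fit matrix (illustrative scores).
    FIT = {
        # capability -> {modality: suitability, 0 (poor) to 2 (great)}
        "search":        {"voice": 2, "remote": 1, "touch": 2},
        "refine_search": {"voice": 0, "remote": 1, "touch": 2},
        "browse_tracks": {"voice": 0, "remote": 2, "touch": 2},
        "playback":      {"voice": 2, "remote": 2, "touch": 2},
    }

    def best_modalities(capability: str) -> list[str]:
        """Return the modalities best suited to a given capability."""
        scores = FIT[capability]
        top = max(scores.values())
        return [m for m, s in scores.items() if s == top]

    # e.g. best_modalities("refine_search") -> ["touch"]
    for cap in FIT:
        print(f"{cap}: best via {', '.join(best_modalities(cap))}")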

This rationale served as the basis for aligning the company around which use cases to serve first, drastically narrowing the scope of our voice efforts.

Samsung Bixby

This approach was put to the test in 2017, following Samsung’s acquisition of VIV Labs. Team VIVA was asked to take VIV’s advanced NLU and build a voice UX for the 2018 TV.

Part of this effort was overhauling the existing Bixby system, a rapidly growing mess of antiquated speech technology. The VIV team and I negotiated to break off and work in parallel to the existing plan, focusing on the entirely new technical stack built by the viv.ai team, united by our mutual desire to showcase the different UX that their advanced NLU engine could provide. Working with viv.ai, Samsung’s Service Business Team, and the CX design group, my contributions included:

  • Leading a team of three interaction/visual designers.

  • Aligning three different product teams with overlapping agendas.

  • Coordinating engineers from those three teams to build our vision of the ideal voice UX on TV.

Our prototype demonstrated the power of a true NLU engine to our executive leadership team, drawing on some of the complex capabilities shown in viv.ai's groundbreaking TechCrunch demo as well as new use cases around TV and music consumption. We iterated madly and shopped our demo through multiple layers of engineering and product team review until it reached our CEO, HS Kim. The story of the work and how it was completed was compelling enough that he became an advocate for replacing the mobile division's failing Bixby 1.0 approach with a simplified NLU architecture.

Bixby is now in the capable hands of viv.ai and the Korean design team, and remains the primary voice assistant for all Samsung devices. It was a great privilege to work with such a talented team of AI experts and designers.


Press

Samsung Unveils Bixby 2.0 with VIV Labs integration

Dag Kittlaus @ SDC