Voice UX: Are We Ready?

A Primer on Conversation Design Frameworks

Felicia Van Every • 9 min read
Two cartoony and colorful chat bubbles connected with a digital voice icon.

Conversation design is a growing discipline that stems from the voice experiences people use in everyday life. Much of the population already uses IoT devices like smart speakers. In addition to keeping a running grocery list, receiving up-to-the-minute weather reports, and getting package delivery notifications, we are now seeing voice assistant technology designed around the classic board game experience (e.g., “When in Rome”). Now, thanks to Dom from Domino’s, you can order a pizza by voice from your car as you drive home, making dinner decisions even easier. We are seeing continued improvement in the world of AI-powered, voice-enabled chatbots, which is where we will focus our discussion. As this technology takes off, it’s essential that we understand how it came about, how it works, and what’s important to consider when choosing a chatbot framework for your organization.

Chatbots have been around for some time, with roots in the Turing Test in 1950 and ELIZA in 1966. These experiments focused on how well bots could mimic human-to-human interaction. AI-powered chat has continued to develop over the decades with incremental improvements, until about 2018, when some thought it might hit its stride with Google Duplex. Google Duplex was able to turn a machine-like conversation into a more human-centered one. However, it applied only to certain use cases, with much improvement still needed.

Text Based vs. Voice Enabled

Before we discuss how it works, let’s look at the main difference between classic text-based chatbots and voice-enabled chatbots. Essentially, it comes down to the modality being used: one uses text on a screen and the other uses the spoken word. Although both use machine learning and natural language processing to understand the user, the voice experience mimics everyday language and conversation.

Both types of chatbots have become increasingly popular. In 2021, most brands focused more on improving their chatbots than on developing their mobile applications, according to a 2022 Gartner study. Juniper Research estimates that by 2024 worldwide consumer retail spend via chatbots will reach $142 billion, up from just $2.8 billion in 2019.

Four teardrop-shaped containers, each with a chatbot market data point: 1) Chatbot market size is expected to grow from 2.6B in 2019 to 10B by end of 2024. From Notify Visitors Summary. 2) 67% of businesses believe chatbots will diminish usage of mobile apps. From Gartner. 3) In 2020-21, use of chatbots rose by 130%. 50% of consumers prefer shopping with brands who use chat. From Juniper Research. 4) UX Magazine named Conversation Designer the fastest growing role in 2020. From Hilary Black at UXMAG.COM.

We are now seeing major growth in voice-enabled chatbots. According to Veriloop.io, “The global voice recognition market is on the mark to boom. Statista shows a whopping 154% jump in the voice AI market share from 2020 to 2026.”

With this new technology taking over the market, will your organization capitalize on this new customer engagement channel?

Essential Questions for Organizations

In her 2021 presentation “Hello Computer?” Voice UX, Chatbots and Voice User Interface at the dmi: Design Leadership Conference, Kelly Goto covers the following essential questions that organizations should answer when considering voice for their organization or product lines:

  • What’s happening with voice, and what do we need to consider for our product line?
  • What are the tools and technologies we need to understand? 
  • Are we ready to consider a Voice UX strategy? 
  • Do we have the resources to execute?

With so much investment in chatbots and so many platforms to choose from, which framework would work best for your organization? Goto points out two main frameworks:

  1. A rules-based approach that uses a predetermined script (automatic text conversations / directed dialog). 
  2. An intent-based approach that uses Natural Language Processing (NLP) + AI / ML (Machine Learning). This may also be referred to as Natural Conversation / Multi-Turn Dialog. 

Let’s take a closer look at these two approaches. When deciding whether a rules-based or an intent-based approach makes the most sense for your use case, Goto suggests, “concentrate on experiences you can actually control when choosing your approach.”

Rules-Based Approach

The rules-based approach uses a decision tree to steer the user to the correct resolution. It is a very rudimentary form of controlled conversation: all answers have been predetermined using if/then logic. With each additional question, the user should get closer to the intended resolution. However, this approach does not understand context, and it is time-consuming for users to express their intent through a series of directed dialog questions. This type of framework can be useful for simple interactions, but in more complex interactions it may take a lot of effort to get users to the resolution they are looking for.
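
To make the if/then idea concrete, here is a minimal sketch of a scripted decision tree in Python. The pizza-ordering script, the `Node` class, and the `run_script` function are all hypothetical names invented for illustration; they do not come from Goto’s presentation or from any particular chatbot platform.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """One step in a scripted dialog: a question or a final resolution."""
    prompt: str
    # Maps each allowed answer to the next node; empty for a resolution.
    options: Dict[str, "Node"] = field(default_factory=dict)


def run_script(root: Node, answers: List[str]) -> str:
    """Walk the tree with the given answers and return the last prompt reached."""
    node = root
    for answer in answers:
        if not node.options:
            break  # already at a resolution, nothing more to ask
        key = answer.strip().lower()
        if key not in node.options:
            # No understanding of context: anything off-script
            # simply repeats the current question.
            return f"Sorry, I didn't get that. {node.prompt}"
        node = node.options[key]
    return node.prompt


# Hypothetical script, made up for this example.
TREE = Node(
    "Pickup or delivery?",
    {
        "pickup": Node("Your order will be ready for pickup soon."),
        "delivery": Node(
            "Deliver to your saved address?",
            {
                "yes": Node("Your order is on its way."),
                "no": Node("Please update your address first."),
            },
        ),
    },
)

print(run_script(TREE, ["delivery", "yes"]))
print(run_script(TREE, ["I'm hungry"]))
```

The second call shows the limitation described above: a user who says something the script did not anticipate is simply asked the same question again.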

NLP-Based Approach

Intent-based bots, on the other hand, are often described as AI-powered, but they typically do not use true AI. Instead, they rely on natural language processing and machine learning. This type of bot learns from each interaction to better understand the user’s intent.
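
As a rough illustration of intent matching, the sketch below scores a free-form utterance against a few example phrases per intent using simple word overlap. This is a deliberately simplified stand-in: a real intent-based bot would use a trained NLP model, and this toy version does not learn from interactions. The intent names, example utterances, and the `detect_intent` function are assumptions made up for this example.

```python
import re
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical example utterances for each intent.
EXAMPLES: Dict[str, List[str]] = {
    "order_pizza": ["I want to order a pizza", "can I get a large pepperoni pizza"],
    "check_weather": ["what is the weather today", "will it rain tomorrow"],
    "add_to_list": ["add milk to my grocery list", "put eggs on the shopping list"],
}


def tokens(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def detect_intent(utterance: str, threshold: float = 0.2) -> Tuple[Optional[str], float]:
    """Return (intent, score); intent is None when no example scores high enough."""
    words = tokens(utterance)
    best_intent: Optional[str] = None
    best_score = 0.0
    for intent, samples in EXAMPLES.items():
        for sample in samples:
            sample_words = tokens(sample)
            union = words | sample_words
            # Jaccard similarity: shared words divided by all distinct words.
            score = len(words & sample_words) / len(union) if union else 0.0
            if score > best_score:
                best_intent, best_score = intent, score
    if best_score < threshold:
        return None, best_score  # fall back, e.g. ask the user to rephrase
    return best_intent, best_score


print(detect_intent("could I order a pizza please"))
print(detect_intent("play some jazz"))
```

Unlike the scripted approach, the user can phrase a request in their own words and in a single turn. The threshold is the control point: when nothing matches well enough, the bot should fall back to a clarifying question rather than guess.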

There have been instances where AI-enabled chatbots got out of control. In these cases, deep learning was used in conjunction with NLP and ML. Take the example of IBM Watson: it learned the entire Urban Dictionary and then could not distinguish when language was inappropriate.

Developing a Voice UX Strategy

Once the bot framework has been determined, the next step is to consider the best strategy for your Voice UX. Goto suggests focusing on four main elements: 

  1. Understand your users’ intent: what are their goals and needs?
  2. Pinpoint where your users are in the journey, typically based on a deliberate task or activity.
  3. Consider the brand voice and personality your organization wants to convey.
  4. Determine the ideal outcome for what your organization is trying to accomplish.

Determine Which Tools To Use

Once the framework and strategy are in place, determine which tools will work best. There are more than 1,000 tools on the market today for this type of technology, all with various integrations. Goto suggests, “get in there, use a tool, then figure it out.” With so many tools available, evaluation can consume a lot of time, so jump in and get your feet wet.

Diagram that illustrates defining the natural language intent in order to determine the best platform to use.
Diagram that illustrates various platforms and tools that may be used based on the defined intent.

Hurdles to Plan For

Consider up front the resources this type of project may need. In most cases, you will need someone familiar with coding and flow charts to build a basic chatbot. For an NLP voicebot, you will need well-versed developers who have experience configuring NLP and, most likely, experience with Google and Amazon. We have heard that one of the top hurdles for companies diving into the voice UX world is a lack of people on their teams who can execute. Here are examples of the types of conversation design roles you will see in the Voice UX space.

  • VUI – Voice User Interface
  • Voice Interaction Design
  • Voice UX
  • Conversation Design
  • Conversational AI
  • Virtual Assistant Design

No Matter the Framework, Design For Context

Regardless of which framework you decide is best for your project, make sure you know the context when starting a voice experience with a user. Understand whether voice or text is more appropriate for your users’ current situation, or whether a combination of both would work. Consider where your user is in their journey. If the user asks for help with a topic, don’t send them information they have already seen or are currently looking at. Consider these bad chatbot experiences and try to avoid similar situations with voicebots.

Voice UX and Conversation Design won’t be going away, so now is the time for your organization to consider the right approaches to incorporating this technology. Do not let this complex frontier slow you down. Get in there and figure it out.