    Insight Article

    Did you hear? Amazon’s Alexa is HIPAA compliant now.

    That announcement in early April set off a flurry of headlines about what’s next for healthcare organizations looking to tap into the popularity and power of consumer voice assistants. Alexa developers are already pointing to six healthcare organizations offering voice-enabled skills through the device’s service, including urgent care location lookup and same-day appointment scheduling.1

    While the excitement generated by the nation’s largest e-retailer got plenty of attention, the evolution of voice as a key innovation in healthcare has been in the works for some time and was a recurring topic of discussion in February at the HIMSS19 Global Conference & Exhibition in Orlando, Fla.

    As a member of the HIMSS Innovation Committee, Santosh Mohan, MMCi, FHIMSS, head of More Disruption Please (MDP) Labs for athenahealth, says that voice is a big focus area for those he works with because the technologies being developed around it promise to alleviate burdens on clinicians and offer them more data at the moment of care “with an interface that is so natural to all of us.”

    Use cases of voice technology being developed in healthcare include more consumer-style smart speaker commands, such as being able to control the lights, TV or shades within a hospital room, as well as initiating a nurse call without pressing a button.

    Others go much further into the realm of practice management and core IT systems used by providers daily.

    Voice in the EHR

    Yaa Kumah-Crystal, MD, MPH, assistant professor of biomedical informatics and pediatric endocrinology, Vanderbilt University Medical Center, grew up watching “Star Trek” and “Knight Rider.”

    “The idea of talking to a computer has just made sense to me for a very long time. … while that’s existed in science fiction — and it’s unclear about whether life’s imitating art or art imitating life — it’s new for us in the medical space,” she says.

    Kumah-Crystal says that while the EHR has a lot of promise, most medical practices are not “letting it do the things that would be really helpful to us” — most notably, leveraging voice technology.

    Vanderbilt’s medical center already uses an EHR with natural language processing (NLP) technology to assist in dictation by providers. Taking the next step of developing a system in which similar technology opens up the EHR to voice commands and interactions just seemed natural, she says.

    “Voice is the most natural way of communicating that we have as human beings. Before there were EHRs, before paper and pencil, we would communicate with our voices … to ask for the things we want, to get information back,” Kumah-Crystal says.

    As a communication modality, voice is often superior: Writing and typing range from about 13 words per minute for longhand to roughly 100 for a speedy typist, but “that experience takes you away from everything else you’re doing,” Kumah-Crystal notes, whereas speech typically runs around 150 words per minute.

    Limitations

    Voice technologies are not perfect, Kumah-Crystal admits. Even Apple’s widely used Siri assistant can sometimes struggle to play music from a device’s library. However, this is not simply an issue with technology recognizing speech — often the words are recognized verbatim. “What’s missing — the gap — is the understanding, the intent,” Kumah-Crystal explains.

    Some providers who have used macros for years might wonder what’s so great about voice commands, since macros approximate that in many ways. Yet “the biggest limitation with macros is [they] save a very specific voiceprint, the intonations of what you say,” Kumah-Crystal says.

    So if a provider records a macro to “insert physical exam,” saying the same words with a different intonation or a slight variation (“insert the physical exam,” for example) will not work. “We say things different ways. We pause, we think before the next words, we st-st-stutter,” Kumah-Crystal says with emphasis.
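    To see the difference in code, here is a minimal sketch, in Python, of why an exact-phrase macro breaks on a small wording change while an intent-based approach tolerates it. The macro phrase, intent name and keyword matching below are hypothetical stand-ins for a real NLP pipeline, not anything from Vanderbilt’s system.

        # Hypothetical illustration: exact-phrase macros vs. a crude intent matcher.
        MACROS = {"insert physical exam": "PHYSICAL_EXAM_TEMPLATE"}   # fires only on the exact recorded phrase

        def run_macro(utterance: str):
            return MACROS.get(utterance.lower().strip())

        # Toy stand-in for NLP: real systems score semantic similarity, not keyword overlap.
        INTENT_KEYWORDS = {"insert_physical_exam": {"insert", "physical", "exam"}}

        def classify_intent(utterance: str):
            words = set(utterance.lower().split())
            for intent, keywords in INTENT_KEYWORDS.items():
                if keywords <= words:
                    return intent
            return None

        print(run_macro("insert the physical exam"))        # None: the extra "the" defeats the macro
        print(classify_intent("insert the physical exam"))  # "insert_physical_exam"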

    NLP technology enables much more advanced speech recognition, but it comes with its own set of considerations, including latency, or the time it takes to get a response. “Because we use voice all the time, we have expectations of how voice should perform,” Kumah-Crystal notes. During a delay for information, someone might be thinking, “I could have just found this information myself,” which could hamper adoption.
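    One way to keep that expectation in view during development is simply to measure the round trip of every query. A minimal sketch, where answer_query is a hypothetical placeholder for the recognition-plus-EHR-lookup pipeline behind the assistant:

        import time

        def answer_query(utterance: str) -> str:
            time.sleep(0.3)                      # stand-in for speech recognition + data retrieval
            return "Last A1c: 7.2%"

        start = time.perf_counter()
        response = answer_query("What was the last A1c?")
        latency_ms = (time.perf_counter() - start) * 1000
        print(f"{response}  (answered in {latency_ms:.0f} ms)")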

    Additionally, the push for interoperability in health IT systems is a key focus for voice technology. The team working on Vanderbilt’s system is using FHIR — Fast Healthcare Interoperability Resources — as much as possible. “You need to be able to pick it up and plug it in and go no matter where you are, because you care about the whole picture of the patient, not just what lives in your EHR,” Kumah-Crystal says.
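    As a sketch of what “plug it in and go” can look like in practice, the query below pulls a patient’s most recent blood pressure observation over a standard FHIR R4 REST call. The public HAPI test server and the example patient ID are assumptions for illustration, not part of Vanderbilt’s system.

        import requests

        FHIR_BASE = "https://hapi.fhir.org/baseR4"   # assumption: any FHIR R4 endpoint answers the same query

        def latest_blood_pressure(patient_id: str):
            resp = requests.get(
                f"{FHIR_BASE}/Observation",
                params={"patient": patient_id, "code": "85354-9",   # LOINC code for the blood pressure panel
                        "_sort": "-date", "_count": "1"},
                headers={"Accept": "application/fhir+json"},
                timeout=10,
            )
            resp.raise_for_status()
            entries = resp.json().get("entry", [])
            return entries[0]["resource"] if entries else None

        obs = latest_blood_pressure("example")   # hypothetical patient ID
        print(obs.get("effectiveDateTime", "date unknown") if obs else "No blood pressure reading found")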

    Another drawback is that machines cannot innately convey what a human does in speech. Speech markup languages can be used to add intonation and emphasis to a voice assistant’s output, but it takes much more work than a resident saying a patient’s blood pressure is 150 with a clear sense of urgency that something is amiss.
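    For illustration, here is a minimal sketch of adding that urgency with SSML, the markup most voice platforms accept. The 140/90 cutoff is purely illustrative, not clinical guidance.

        def bp_ssml(systolic: int, diastolic: int) -> str:
            reading = f"{systolic} over {diastolic}"
            if systolic >= 140 or diastolic >= 90:
                # Slow down and stress an abnormal value instead of reading it flatly.
                return ("<speak>Blood pressure is "
                        f"<emphasis level=\"strong\"><prosody rate=\"slow\">{reading}</prosody></emphasis>."
                        "</speak>")
            return f"<speak>Blood pressure is {reading}.</speak>"

        print(bp_ssml(150, 95))
        print(bp_ssml(118, 76))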

    Similarly, there is a balancing act between brevity and clarity for voice technology. When originally programmed, the Vanderbilt system would read out a patient’s blood pressure as “120 millimeters of mercury over 80 millimeters of mercury.”

    “That is ‘millimeters of mercury,’” Kumah-Crystal emphasizes. “Eight syllables. That’s a mouthful … it’s ‘120 over 80,’ that’s how we say it.” Simplifying the voice assistant’s phrasing to natural language then brings up the question: Should it say “one hundred twenty,” or just “one twenty”? All of this goes into developing a system that gives providers what they hope to get from it rather than sending them hunting through the EHR’s interface.
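    As a toy example of that brevity-versus-clarity choice, the helper below verbalizes a reading either with full units or the way clinicians actually say it. The function and its flag are hypothetical.

        def speak_bp(systolic: int, diastolic: int, verbose: bool = False) -> str:
            if verbose:
                return f"{systolic} millimeters of mercury over {diastolic} millimeters of mercury"
            # How providers actually say it; whether the TTS engine renders 120 as
            # "one hundred twenty" or "one twenty" is a separate verbalization choice.
            return f"{systolic} over {diastolic}"

        print(speak_bp(120, 80, verbose=True))
        print(speak_bp(120, 80))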

    Much of the work focuses on what Kumah-Crystal calls “the uncanny valley of words.” Most speakers don’t recognize the grammatical nuances of ordering multiple adjectives:

    1. Quantity
    2. Opinion
    3. Size
    4. Age
    5. Color
    6. Material
    7. Purpose


    “I can say, ‘I ate three big, red apples.’ … But if I said, ‘I ate red, big three apples,’” Kumah-Crystal notes, it would be confusing. “That’s the whole point about a voice user interface. If you don’t get it right, the person is too busy thinking about why it doesn’t sound right to hear the important clinical information you’re trying to convey. … how can we get information across as accurately as possible but also make it sound right and feel right to the end user?”
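    A voice interface that assembles phrases from data has to respect that same ordering. A minimal sketch, with a tiny hand-built word-to-category lexicon standing in for a real one:

        ORDER = ["quantity", "opinion", "size", "age", "color", "material", "purpose"]
        LEXICON = {"three": "quantity", "delicious": "opinion", "big": "size", "red": "color"}

        def order_adjectives(words):
            # Sort words into the order English speakers expect, per their category.
            return sorted(words, key=lambda w: ORDER.index(LEXICON[w]))

        print(order_adjectives(["red", "big", "three"]))   # ['three', 'big', 'red']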

    Where and when you say it

    Tim Coffman, an application developer specializing in Epic EHR integrations at Vanderbilt, points out that where and when the technology is used plays a big part in what is delivered. A provider preparing to see a patient in a private workroom with a large screen could have protected health information (PHI) presented aloud, though that same platform might speak the information differently once a patient is in the room with the provider.

    To help providers use this technology on the go, it must also work on smaller screens and mobile devices and limit the spoken information to headphones if the provider is in a hallway or other public space, Coffman says.
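    Here is a minimal sketch (not Vanderbilt’s implementation) of that kind of context-aware delivery policy; the context fields and mode names are assumptions for illustration.

        from dataclasses import dataclass

        @dataclass
        class Context:
            location: str          # e.g. "private_workroom", "exam_room", "hallway"
            patient_present: bool
            headphones: bool

        def delivery_mode(ctx: Context) -> str:
            if ctx.location == "hallway":
                # Never speak PHI aloud in a public space.
                return "speak_via_headphones" if ctx.headphones else "screen_only"
            if ctx.patient_present:
                return "speak_patient_friendly"    # rephrase for the patient in the room
            if ctx.location == "private_workroom":
                return "speak_full_phi"            # full readout on the room speaker
            return "screen_only"

        print(delivery_mode(Context("hallway", patient_present=False, headphones=False)))   # screen_only
        print(delivery_mode(Context("exam_room", patient_present=True, headphones=False)))  # speak_patient_friendly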

    This level of specificity is now being applied to tailor the system for use by patients themselves. “When a provider asks for someone’s creatinine level, they want to make decisions about medications and adjustments,” Kumah-Crystal says. “Whereas a patient is asking about their health reality: ‘Am I getting better? What does that mean?’”

    Winning over providers

    If the goal of voice technology in the EHR is to make physicians’ jobs easier and reduce burdens, getting them to adopt the technology and trust it is crucial.

     “Motivating us to do this is making these things easier to use. And if you identify the provider frustration with trying to find things in their EHR… that leads to creating a business case for how you can really improve their time spent focusing on the EHR versus engaging with the patient,” Coffman notes.

    In a usability study at Vanderbilt’s medical center, a cohort of 14 pediatric endocrinologists evaluated six skills the platform offered, including readouts of a patient summary, A1c level, weight, blood pressure and health maintenance. Nearly two out of three providers (64%) said they would be willing to use such a system.

    Following that study, Coffman says the next steps include continuing to incorporate user feedback and focusing on scalability, alerts and decision support.

    “How do you alert someone in a clinical setting about an abnormal lab or something we want to bring to their attention? Currently, we have pop-ups that say, ‘Don’t order that,’” Kumah-Crystal says. In this case, what kind of sound or voice should be used to warn a provider, for example, about a patient being allergic to amoxicillin? Another consideration: Is it appropriate for an alert to sound when a patient is present and tip him or her off to a provider possibly making a mistake?

    “There’s something very special and innately magical about the way we respond to voice that we can really take advantage of, and all the tools exist,” Kumah-Crystal says. “We just have to think it through and figure out how to use it, where to plug it in and how to make it work.”


    Note:

    1. Jung R. “Introducing new Alexa healthcare skills.” Amazon Alexa. April 4, 2019. Available from: amzn.to/2FQ45lj.

    Written By

    Chris Harrop

    A veteran journalist, Chris Harrop serves as managing editor of MGMA Connection magazine, MGMA Insights newsletter, MGMA Stat and several other publications across MGMA. Email him.

