Innovation: Voice in the bank

Brian Murphy was 47 and working as a busy consultant cardiologist in Glasgow when, in 2019, he was diagnosed with motor neurone disease (MND).

“I was utterly devastated,” he says. “I knew what it meant. There’s no hope of treatment, of stalling it or getting any better. So, you have to cope with thinking about that, on top of the physical deterioration.”

He had to give up his job and also break the news to his two children and partner Gillian, a cardiology nurse.

“I’ve tried my best to keep a positive attitude, especially for the kids. Trying to be positive helps you to actually feel positive. I don’t want to waste my life away thinking about MND.”

One positive step Brian took fairly early on was to make contact with SpeakUnique, an Edinburgh-based company that creates synthetic voices for use in communication aids by people who have lost, or will lose, the ability to speak as the result of a medical condition.

Capturing speech

Voice banking has been around for a number of years but advances in artificial intelligence (AI) and machine learning have led to a step-change in the technology of speech generation. To bank his voice, Brian accessed the SpeakUnique service at home on his computer.

“Basically it was a web-based application where I read a variety of sentences and then after that I got to choose a chapter of a book to read,” says Brian. “I chose the first chapter of Around the World in Eighty Days – and actually I enjoyed it so much I went on to read the book.”

“The whole process took about half an hour. I think I would really struggle to do that now, so I’m glad that I did then. That was about two years ago that I banked my voice. I think that it’s very important, especially in progressive disorders like motor neurone disease, that you do the capture early when you are capable of doing it.”

Not long after making the recording Brian was able to sample his synthetic voice using SpeakUnique’s own text-to-speech app that allows a user to input text on an iPad or computer via a keyboard or using pre-set categories and phrases. The message is then generated aloud in the personalised synthetic voice.

“Overall it sounds pretty much like my own voice,” says Brian. “The only thing I would say is that it’s a little bit fast. Because I haven’t needed it yet I’ve not spoken to them about having it altered. But I was pleasantly surprised.”

A father’s voice

SpeakUnique is a relatively new company, spun out of the University of Edinburgh in 2020, according to CEO Alice Smith.

It formed as a collaboration between the School of Informatics and the Euan MacDonald Centre for Motor Neurone Disease Research. Euan MacDonald was 29 years old when he developed MND. He and his father Donald founded the centre in 2007 to fund research into MND and increase awareness of the disease.

“When Euan was told he would likely lose his voice, he said surely there is something out there that can capture your voice and preserve that,” says Alice.

“He had two young boys at the time and he didn’t want them to grow up and not know the sound of their father’s voice. Through conversations with colleagues at the university they realised there were researchers based in the Centre for Speech Technology Research who were already looking at speech synthesis and voice cloning.”

A project was established with the Anne Rowling Regenerative Neurology Clinic in Edinburgh, a specialist clinic offering routine care and research studies to improve the lives of people with neurodegenerative conditions. This allowed for direct patient interaction and feedback as the voice banking technology was developed, to make sure it was aligned to patient needs.

Eight years on, SpeakUnique now offers its service to people with a range of conditions. In addition to MND, the technology can help communication in people with parkinsonism conditions, such as progressive supranuclear palsy and multiple system atrophy (MSA), and people with ataxia. Head and neck cancer patients undergoing laryngectomy or any radiation surgery that might impact speech can also benefit from voice banking.

“We have a number of charities that we partner with and who are able to provide funding for people using the service,” says Alice.

Deep neural network

Voice banking used to be expensive and time-consuming, often requiring a person to record thousands of sentences to capture every possible sound in a language. The process could take up to 30 hours. The results were choppy and robotic. Now there are several companies offering services like SpeakUnique, including CereVoice Me, ModelTalker, My-own-voice and The VoiceKeeper, using AI to generate ever-improving synthetic voices, faster and cheaper.

Alice Smith explains: “Historically when the technology was first developed it would break everything into phonemes or words and then it would match them together, sort of cut and paste almost to stitch a voice together. That’s why you needed to have many more recordings. But the approach we take now uses a deep neural network approach to machine learning. It’s more sophisticated in that it learns the representation of speech rather than relying on cutting it down into every different sound.”
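The older "cut and paste" approach can be illustrated with a toy sketch. This is not SpeakUnique's method or any real product's code: the unit names, the three-entry inventory and the text labels standing in for audio clips are all hypothetical. It shows only why that approach needed so many recordings: every pair of neighbouring sounds had to exist in the recorded inventory before a word could be stitched together.

```python
# Toy illustration of "cut and paste" (concatenative) synthesis.
# Everything here is hypothetical: the phoneme labels, the tiny inventory
# and the string "clips" stand in for real recorded audio.

def to_diphones(phonemes):
    """Split a phoneme sequence into overlapping neighbouring pairs."""
    return [(a, b) for a, b in zip(phonemes, phonemes[1:])]


def synthesise(phonemes, inventory):
    """Stitch recorded units together; fail if any unit was never recorded."""
    units = to_diphones(phonemes)
    missing = [u for u in units if u not in inventory]
    if missing:
        raise KeyError(f"no recording for units: {missing}")
    return "+".join(inventory[u] for u in units)


# A very small bank of recorded units (assumed for illustration).
inventory = {
    ("h", "e"): "clip_he",
    ("e", "l"): "clip_el",
    ("l", "o"): "clip_lo",
}

# Every unit needed for this sequence is in the bank, so it can be stitched.
hello = synthesise(["h", "e", "l", "o"], inventory)
```

Asking the same function for the sequence "h", "e", "l", "p" raises an error, because the pair ("l", "p") was never recorded. A learned model, as described in the quote above, generalises from a short sample instead of looking up each unit.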

Neural networks aim to simulate the activity of the human brain – specifically pattern recognition and the passage of input through various layers of simulated neural connections. In voice synthesis the AI software uses deep neural networks with algorithms that have been trained on a vast library of voices to spot patterns in what makes a voice unique. It can re-create a digital version of a person’s voice that mimics qualities like pitch, tone, resonance and accent.
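The idea of input passing through layers of simulated connections can be sketched in a few lines. This is a minimal illustration only, with made-up weights and a made-up three-number input; a real voice model is trained on large amounts of speech and is far larger, and nothing here reflects any specific product.

```python
import math

# Minimal sketch of input passing through layers of simulated "neurons".
# The weights, biases and input values are invented for illustration.

def dense(inputs, weights, biases):
    """One layer: each neuron takes a weighted sum of the inputs, then tanh."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Feed the input through each layer in turn and return the final output."""
    activations = inputs
    for weights, biases in layers:
        activations = dense(activations, weights, biases)
    return activations


layers = [
    # Layer 1: 3 inputs feeding 2 hidden neurons
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),
    # Layer 2: 2 hidden neurons feeding 1 output neuron
    ([[1.0, -1.0]], [0.0]),
]

output = forward([1.0, 0.5, -1.0], layers)
```

Training adjusts the weights so that the output matches known examples; here they are fixed by hand purely to show the flow of data from one layer to the next.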

“Generic text-to-speech voices tend to have very limited accent representation and it definitely doesn’t capture the individual’s essence,” says Alice. “What we have developed is a platform that allows people to record a relatively short sample of themselves speaking and then use that to create the digital replica of their voice.”

To Brian Murphy this is what makes the technology so appealing.

“It keeps a bit of your personality which I think you would lose with a more robotic voice,” he says. “It’s easier for friends and relatives to relate to.”

Repairing or building a voice

Another service offered by SpeakUnique is the ability to repair a voice. This is for people wanting to bank their voice but who have already noticed a deterioration in their speech, such as slowness, slurring or a change in vocal quality.

“So if someone comes to us with symptoms in their speech, be that dysarthria due to MND, or hesitancy or disfluency, we can remove those symptoms from their synthetic voice and give them back a healthy sounding voice,” says Alice.

“Voice donors” are sometimes used to create a voice that matches a desired accent, age and gender – and patients can also be provided with a synthetic voice generated from old recordings, such as voice memos or wedding speeches, if there is sufficient quality and quantity of speech.

“We had one lady who was on the ITV game show Tipping Point and we were able to get the recording and use that,” says Alice.

Providing the output

SpeakUnique’s own text-to-speech app allows the user to communicate via a keyboard if they retain motor function, and this can be used up to the point that hand movement is lost. Users must then rely on eye-gaze methods, head switches or more advanced input devices to generate speech.

“We work with an incredible group of companies that are improving input speed using predictive text,” says Alice. “One company in Bristol called Smartbox helps people with a huge range of communication needs and they can work with almost any level of movement. They even use straws and if you can blow you can use that to select words.”

Brian Murphy is certainly a supporter of such advances both as a doctor and a patient.

“It’s about trying to retain some semblance of normality and some semblance of my own personality. It’s easier for people to communicate with me using my own voice.”

This page was correct at the time of publication. Any guidance is intended as general guidance for members only. If you are a member and need specific advice relating to your own circumstances, please contact one of our advisers.

Insight - Secondary is published quarterly and distributed to MDDUS members throughout the UK who work in secondary care. It provides a mix of articles on risk, medico-legal and regulatory matters as well as general features and profiles of interest to our members.
