The first audiobook review of my non-fiction book last year complimented narrator Matt Jamie on his confident British accent and engaging delivery, and I wanted to shout, “I wrote the words!” Will I consider an AI narrator next time?
As panels at FutureBook discussed the future of audiobooks, with the occasional polite mention of “artificial voice,” Google Play Books used a lunchtime slot to demonstrate, to a half-empty room, its text-to-speech (TTS) technology, which allows publishers to create quality audiobooks at low cost.
From TikTok to New Scientist, increasingly convincing text-to-speech is everywhere, except in print. This silence tells me that, despite five years of talking to AI startups, publishing is not ready to face the inevitable (that artificial voice will revolutionize audiobooks) or the ethical dilemmas it will present, such as using the voice of a beloved late actor to narrate an audiobook.
Naturally, publishers will point to consumer demand for human narrators. However, despite their caution, the case for using AI to produce audiobooks is indisputable and will only get stronger. I click play and am amazed at the quality of the artificial speech samples I’m hearing. My 12-year-old son listens and simply says that audiobooks will be free.
The argument that a human narrator is inherently better is flawed and subjective: I’ve stopped listening to audiobooks many times because I thought an AI could have done a better job.
For more than 200 years scientists have tried to generate human speech by mechanical means, imitating the organs humans use to produce it, such as a bellows for the lungs and a tube for the vocal tract. Now computer models such as DeepMind’s WaveNet and Google’s Tacotron have achieved this in practice, using neural networks, loosely modelled on the way the human brain works, that keep improving in speed and accuracy.
Startups use these models as a foundation to build their own applications, each with its own unique selling point. For UK-based DeepZen, that selling point is capturing emotion; the company is credited with producing the first AI-generated audiobook in 2021, the 350-page psychological thriller She Chose Me by Tracey Emerson.
Artificial voice applications are trained on human voices and use cloned copies of those voices to deliver realistic tone and emotion.
Producing a traditional audiobook is an expensive and time-consuming business. It requires a team that includes a narrator, an editor and a proofreader, and a ten-hour audiobook can take 60 hours of work to complete, spread over several months. Using artificial voice can cut the production cost of a standard-length work of fiction from approximately $2,500 to $400 and reduce the time required to several days.
“It’s a no-brainer,” says DeepZen co-founder and CEO Taylan Kamis. There are, he tells me, over 100 million books in print worldwide and 20 to 25 million e-books, but only half a million audiobooks, 90% of which are in English and half of which were produced in the last four to five years. The cost and time involved in producing an audiobook are the “major block”, especially in non-English markets. For publishers, the opportunity is huge.
Then there’s the democratization that artificial voice can bring. Small, independent publishers, whose books usually cannot justify the cost of an audiobook, can now consider using artificial voice. Similarly, publishers of books in minority languages will be able to create new audiobooks for their communities. TTS can give all the books that go unlicensed for audio because of production costs (new titles, neglected backlists, dry academic texts and books in minority languages) a chance to find a voice, literally.
Yes, many artificial voices are not yet good enough in every language, but they are steadily improving in accent, rhythm and intonation. “Other times there’s one that blows my mind,” says Nathan Hull, chief strategy officer at Beat Technology AS.
There may be a clash between publishers, who may be as committed to the craft of a human-narrated audiobook as to that of a printed book, and the “tech bros” who want to solve the “problem” of all those unrecorded books. However, there will be room for both. The future market is likely to resemble a pyramid, with the bulk of titles produced cheaply with TTS at the base and a smaller tier of high-quality human-narrated productions at the top.
For now, artificial voice may work best for business and academic texts, whose buyers tend to want information rather than narrative. That doesn’t mean artificial voice won’t eventually work well for fiction.
One issue will be distribution. Audible, the world’s largest audiobook distributor and producer, currently distributes only human-narrated audiobooks, but that’s about to change. Publishers remain cautious, though. “As publishers, we have a responsibility to our authors,” Jon Watt, director of audio at Bonnier Books UK, tells me, and for now that means human narrators.
The audiobook market, and publishers’ attitudes towards it, has changed radically over the past five years. Audiobooks have gone from an afterthought in contract negotiations to a format considered at the point of acquisition, and Spotify is now delivering audiobooks to millions of users who have never listened to one before.
Publishers’ attitudes will keep evolving as the technology improves and Generation Z grows up. Over the next five years, growing demand for audio content, the low cost of producing audiobooks with artificial voice and the push for democratization will break the dam. TTS may even give rise to an entirely new art form.
I tracked down Matt Jamie, the human narrator of my book, to ask what he thinks about artificial voice. He tells me he has considered licensing his voice, and that at least one agency has considered setting up a dedicated artificial-voice division, but he isn’t losing sleep over it yet. Maybe he should.