Lance Eaton, director of faculty development and innovation at College Unbound, is a veteran of my “3 Questions” series. When I saw Lance’s tweet about my piece titled “Should University Presses Use AI Narration for Absent Audiobooks?” I wanted to hear “all the thoughts.” Lance graciously agreed.
Q: Help us understand the academic and professional context you bring to discussing AI-narrated university press books. What do you bring to this conversation?
A: It’s strange to think that this is a niche of mine, but I think it is.
In the past 30 years, I’ve listened to about 3,000 audiobooks. For over 20 years, I professionally reviewed them for publications such as Library Journal, Publishers Weekly and AudioFile Magazine, among others, and I have served as a judge for the Audies Award (the Audio Publishers Association’s industry awards) since at least 2004. I’ve blogged about audiobooks over the years and have even done a lightning talk on how fascinating audiobooks are.
I’ve even done some academic writing on audiobooks that explored how concepts of adaptation and translation might change our thinking about audiobook narration. Other work of mine focuses on copyright, knowledge production and access to knowledge, a key part of my (soon-to-be-done?) dissertation, which focuses on academic piracy. I’ve also engaged with questions of adapting across media and the possibilities and restrictions in the Liberal Arts Lecture I delivered at North Shore Community College in 2018, “Vampires Get You Famous, but the Hulk Will Get You Sued: The Intersections of Creativity, Censorship, Copyright and the Commons.”
This discussion of moving from one medium to another introduces questions about what that transition means, the copyright challenges, how technology changes usage, experience and expectations, along with other considerations. After all, part of how many audiobook publishers got their start was by adapting works that were in the public domain, so they only had to think about the copyright of the performance and not the text.
In terms of a background in artificial intelligence, I have been involved in a lot of conversations, workshops, talks and writing around the impact of generative AI in higher education over the last year. The crux of those conversations has also centered on this issue of where, when and why to use these tools and what possible new issues or concerns might arise if we use them, or even if we don’t use them.
Q: OK, thank you. So what do you think? Should university presses be pursuing a strategy of AI-narrated audiobooks for titles where the economics do not seem to enable a human-narrated version?
A: Personally, I wonder if we really need more content. My “to listen” pile is easily hundreds of audiobooks that already exist. Even if more were available, it would just make my list longer. It forces me to listen to other things as opposed to everything I want to listen to right now that isn’t available. I’m OK with that.
There is an abundance of academic audiobooks and work being done by publishers that’s worth recognizing. University Press Audiobooks has thousands of titles and continues to grow. Cambridge University Press, Oxford University Press, Princeton Audio, University of Chicago Press and Yale Press have also created their own lines of audiobooks. Some of the mainstream audiobook publishers like Tantor Audio have many academic-oriented books, including works by Frantz Fanon, Michel Foucault, John Dewey and Jack Halberstam, among others, that I’ve certainly benefited from listening to.
Unfortunately, there is no easy way to sift through and find them all. This streaming-ification of digital audiobooks leaves us with too many platforms, too much to choose from and the challenge to find the specific things we want. We’re all sifting through the catalog bins in the cloud to find the academic books (though we need a clearer definition of “academic books”) buried in different categories that don’t always make sense. We end up with a deeply researched and argued academic text like Robin D. G. Kelley’s Race Rebels: Culture, Politics and the Black Working Class in the same category listing as whoever Bill O’Reilly is Killing next.
I do hold that romantic notion that we have built up around audiobook narrators and their unique skills to tell moving stories. It’s both true and part of a resistance to which we cling. It routinely arises during technological storytelling transitions such as how televisual material could never be as moving as a written text or that listening to an audiobook could never be the same as reading a text. While I feel that deeply in my gut (or ears?), I also know it’s a human tendency and one that ignores [that] audiobooks only exist because of previous technological shifts.
Reluctantly, I do believe publishers will explore this if they are not doing so already. However, it will be complicated. Publishers that are strictly audiobook-oriented will need to attain the rights to produce these audiobooks and also pay for the AI tools. And a question remains whether the AI-generated voice audiobook can be copyrighted.
Typically, audiobooks have two copyrights if you listen all the way to the end. The publisher or author maintains the copyright for the text, and the audiobook publisher often maintains the copyright for the audio narration. It remains to be seen if the audio narration by a generative AI can be copyrighted. Currently, AI-generated text and images seem unlikely to be copyrighted, and the same legal precedent might extend to AI-generated sound. If that’s the case, then publishers will be reluctant to give over rights to audiobook publishers to produce something that can’t be copyrighted and therefore can’t maximize profits. And further down the line, platforms like Audible will also be suspicious of anything they cannot have full control over (see Cory Doctorow’s work in trying to put his DRM-free audiobooks on Audible).
For publishers that have their own in-house audio production, there still remain the investment costs of an AI generative tool to produce audiobooks and other roles still needed. For instance, someone is still going to need to listen to the final product to review it, especially for academic texts that cover nuanced topics (sexual violence, genocide, bigotry, etc.), to make sure the AI appropriately captures the tone in such spaces. And the publishers, too, will have concerns about whether the production is covered by copyright.
Publishers might pursue generative AI but may also find that economics are not actually in their favor.
Q: If utilizing AI to create audiobooks is a bad idea, given that this practice may inevitably lead to far fewer human-narrated books (and far less work for voice actors), then what might be the alternative? Is there a way to simultaneously encourage investments in human-read audiobooks while finding some way to create AI-narrated versions for university press books that will never get an audiobook version?
A: I’m not sure it’s a bad idea, but it continues to raise the question of where the line is (or whether there can be a line). If academic publishers give over to using generative AI voices for cost and efficiency reasons, then can they justify not letting scholars do the same thing in their work? From my dissertation research, it’s evident that scholars are engaging in academic piracy in large part because of efficiency and the demands of productivity in academia. Where else within the entire academic enterprise (intentionally using that word, with all its business-oriented implications) will expedience and cost trump human diligence and deliberation? And the more we use it, the harder it will become not to expect everyone to use generative AI, something that has different concerns and implications for academia and society as a whole.
Yet one model that comes to mind that bypasses generative AI entirely is the LibriVox model. There are more than 10,000 audiobooks, all done by volunteers over the last 18 years. While the quality varies, there are now thousands of audiobooks available to folks that didn’t exist before. What if individual academic publishers or a consortium of them looked to volunteers to build out their audiobook libraries? Many folks would be interested in this beyond volunteers who already do it. Authors and scholars are interested and invested in getting work out there in different media and would likely also participate in this.
Within that, maybe there is a revenue-sharing model that emerges. After all, Audible already does this. Publishers could also build an incentive structure for books that are more in demand, either evident through citations or through request buttons on product pages, much as Amazon used to do.
The result would be access to a larger range of imperfect audiobooks that included people as part of the process. After all, even if we get generative AI audiobooks, they’re going to be far from perfect as well and still potentially costly. So, is generative AI the actual fix or the shiny thing distracting us from a model that publishers could have been using for years?