Comment: No one needs AI-generated audio summaries

In mid-September, Google released Audio Overview, a new feature for its note-taking tool NotebookLM that generates podcasts with AI hosts from an uploaded file or URL. In true chat-podcast style, two text-to-speech voices talk to each other about the material, following a podcast script created with Gemini 1.5. So far you cannot tailor the dialogue to your liking, and the system gives no hint that the result is AI-generated.
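To make the setup concrete, here is a minimal sketch of the general pattern such features follow, not Google's actual pipeline. All function names are hypothetical placeholders; a real implementation would call a language model to draft the dialogue and a text-to-speech service to render it.

```python
from dataclasses import dataclass

@dataclass
class Turn:
    """One line of the generated podcast script."""
    speaker: str  # e.g. "Host A" or "Host B"
    text: str

def draft_script(source_text: str) -> list[Turn]:
    """Placeholder for the LLM step that turns a source into a dialogue.

    A real pipeline would prompt a model (NotebookLM reportedly uses
    Gemini 1.5) with the source and instructions to write a two-host
    conversation. Here we fake two turns so the sketch runs offline.
    """
    teaser = source_text[:80]
    return [
        Turn("Host A", f"Today we're digging into a document that opens with: {teaser!r}"),
        Turn("Host B", "Right, so let's walk through the main points together."),
    ]

def render_audio(script: list[Turn]) -> None:
    """Placeholder for the TTS step: one voice per host, clips concatenated.

    A real implementation would call a text-to-speech API per turn and
    stitch the audio together; here we only print what would be spoken.
    """
    for turn in script:
        print(f"[{turn.speaker} voice] {turn.text}")

if __name__ == "__main__":
    source = "NotebookLM's Audio Overview turns uploaded sources into a chat-style podcast."
    render_audio(draft_script(source))
```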


Philip Stevens has been with iX since 2022. He primarily oversees articles in the areas of data science and artificial intelligence and maintains the journal's presence on LinkedIn.

Since I rarely listen to podcasts, I initially paid the whole thing little attention, even though the feature made quite a few waves online. But after listening to it for the first time, I can say: I have rarely seen – or in this case heard – such a useless treatment of information. So far, the feature only works in English and is labeled a pilot, but its curious success at least raises the concern that it will be rolled out in other languages as well.

Basically, I don't think the idea behind NotebookLM is wrong. You upload documents or provide a URL, and Google's language model summarizes the content, highlights key topics, and suggests questions you can ask about the sources. That can be useful for a quick overview of longer content or as a first introduction to a complex topic. In principle, it is not much different from a RAG system built on top of your own documents. Whether you like the idea of developing topics further with an AI assistant is certainly a matter of taste, but the same goes for coding with an AI programming assistant.
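For readers who have not run into the term, here is a deliberately naive sketch of the RAG pattern the comparison refers to. The retrieval step below uses simple word overlap purely for illustration; real systems use embeddings and a vector store, and the assembled prompt would then be sent to a language model.

```python
# Minimal RAG sketch: split documents into chunks, score them against the
# question, and paste the best matches into the prompt for a language model.
# Everything here is illustrative, not any particular product's implementation.

def chunk(text: str, size: int = 300) -> list[str]:
    """Cut a document into roughly equal-sized chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def score(question: str, passage: str) -> int:
    """Toy relevance score: number of shared lowercase words."""
    return len(set(question.lower().split()) & set(passage.lower().split()))

def build_prompt(question: str, documents: list[str], top_k: int = 3) -> str:
    """Retrieve the top_k most relevant chunks and build the final prompt."""
    chunks = [c for doc in documents for c in chunk(doc)]
    best = sorted(chunks, key=lambda c: score(question, c), reverse=True)[:top_k]
    context = "\n---\n".join(best)
    return f"Answer using only the sources below.\n\nSources:\n{context}\n\nQuestion: {question}"

if __name__ == "__main__":
    docs = ["NotebookLM lets you upload documents, summarizes them, and answers questions about them."]
    print(build_prompt("What does NotebookLM do with uploaded documents?", docs))
```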

What really bothers me about Audio Overview is the idea that podcasts of this kind could somehow help you gain knowledge from your own documents or external sources. Real podcasts interest me for different reasons. On the one hand, there are the classic talk podcasts in which well-known people speak freely or in a structured way about certain topics, share their opinions, and perhaps put the subject into context. You get the impression of a more personal glimpse into these people's thoughts and views, which makes the format feel intimate. On the other hand, there are podcasts on specific subjects or news podcasts. Here I appreciate that there is a clear common thread and well-prepared information that the speakers deliver clearly and at an appropriate pace. For me, NotebookLM's audio overviews are an awkward hybrid of both categories that mainly amplifies the drawbacks of the podcast format.

The AI stumbles over its own sentences, pauses to search for words, and cracks only the flattest puns. If the goal is to summarize a scientific paper or technical documentation, calling this bad style would still be a compliment. The style makes me angry because I know there is no real person here struggling to put a point into the best possible sentence. The thing is a speech synthesizer that has no need to copy the quirks and errors of human speech one-to-one. From a knowledge aggregator I do not expect sentences that convey an opinion or feigned sympathy that the system behind them cannot feel, let alone understand. The back and forth between the two simulated hosts only delays my intake of information. Instead of condensed knowledge, I have to sit through what feels like 30 seconds of chatter for every 15 seconds of actual information.

Google describes the Audio Overview function as an experiment. For my part, I would declare it a failure. I am happy to give the audio overviews another listen once the unnecessary banter between the simulated hosts can be switched off, and to form a fresh opinion then. Until then, I prefer to keep working with text.


(Philip Stevens)