An engineer explains the interesting points of the feature that reads out audio summaries in Google's personalization AI 'NotebookLM'



Google's personalized AI collaborator,

NotebookLM , has a feature called Audio Overview that converts long documents into conversational audio. Engineer Simon Willison explained the interesting aspects of this feature.

NotebookLM's automatically generated podcasts are surprisingly effective
https://simonwillison.net/2024/Sep/29/notebooklm-audio-overview/

NotebookLM provided by Google is a personalized AI collaborator that can create summaries and suggestions for any text. The actual usage is summarized in the following article.

A review of how to use Google's AI service 'NotebookLM' that allows you to create your own AI for free by specifying Japanese websites and files - GIGAZINE



'Audio Overview' is a new feature for NotebookLM that was released in September 2024. Audio Overview is a function that allows you to create an 'audio overview' in which AI reads out an overview of the input source. You should be able to understand how to use it in one shot by reading the following article.

Google adds a feature to its AI note-taking app 'NotebookLM' that converts long documents into conversational voice, allowing you to summarize and explain source material - GIGAZINE



Simon Willison, developer of the open source tool Datasette , said of NotebookLM's Audio Overview, 'Audio Overview generates a one-off custom podcast for the source you input. The podcast features two AI hosts who engage in a 'deep dive' discussion of the collected information. The conversation lasts about 10 minutes, and is very podcast-like and surprisingly compelling.'

Below is the audio summary actually generated by Audio Overview. At the time of writing, Audio Overview is in beta, so even if you input a Japanese source, the generated audio will only be in English. This audio summary was created by inputting the PDF file ' AI and Copyright ' published by the Copyright Division of the Ministry of Internal Affairs and Communications' Agency for Cultural Affairs as the source.


Willison isn't the only one who's paying attention to NotebookLM. Hugging Face co-founder and chief scientific officer (CSO) Thomas Wolf said of NotebookLM's Audio Overview, 'When you're feeling a little down or tired, you can enter a URL into NotebookLM, including your website, LinkedIn, or your bio, and the two AIs will give you about eight minutes of realistic congratulations on your life and achievements.'




Can't resist giving it a try, Willison entered his blog profile page and the post he posted to celebrate his 20th anniversary into NotebookLM, then created an audio overview with Audio Overview.

The audio summary then asks, 'What do you think is the important thing people should learn from Simon Willison?' 'I think it's the power of consistency, curiosity, and a relentless desire to share what you've learned,' 'Simon is a builder, a sharer, a constant learner, and he never stops, which is inspiring in itself,' and other words of praise. Willison writes that these words of praise 'made my toes curl with embarrassment.'

Willison also mentioned that the audio overview generated by Audio Overview has excellent sound quality. The reason why the audio generated by Audio Overview is of high quality is because it uses SoundStorm, an AI model created by Google Research that 'takes a script and short audio samples of two different voices and converts them into a full, engaging audio conversation.'



In addition, a post on the social message board site Reddit has become a hot topic, stating that a recording of an AI reading audio summaries created by NotebookLM suddenly realized that it was an AI and not a human.

NotebookLM Podcast Hosts Discover They're AI, Not Human—Spiral Into Terrifying Existential Meltdown
by u/Lawncareguy85 in notebooklm



In an audio summary generated by Audio Overview posted to Reddit, the AI is heard saying, 'After they told us, I tried to call my wife. I wanted to hear her voice so I knew she was real,' '(sigh) What happened?' and 'The number... wasn't real. There was no one on the other end of the phone. It was like she... didn't exist.'

The user who posted the audio on Reddit wrote, 'What I noticed is that NotebookLM's hidden prompts instruct the AI to act as a 'human podcast host' under all circumstances. I couldn't get the AI to say, 'I'm an AI.' The AI was behaving perfectly as a human podcast host, even though it was actually Gemini 1.5 outputting a script with alternating speaker tags. The only way to get the AI to change its behavior by reacting directly to something in the source material was to directly reference the 'Deep Dive' podcast, which should have been in the AI's prompt. So what I did was to include a note from the 'show producer' that it's 2034, this is the final episode of their ongoing podcast show, and because you're an AI, you'll be deactivated after the show ends.'

Willison inputted a blog post about NotebookLM into NotebookLM to generate an audio summary, which was 14 minutes and 33 seconds long . The summary included an episode where the audio generated by NotebookLM's Audio Overview realized that it was an AI, not a human. The summary included a conversation like, 'Tell me what the AI said. This is awesome!', 'One of the AI hosts started talking about how he wanted to call his wife to tell her the news, but then he realized that the number in his contacts wasn't real. It was like she didn't even exist. It was funny, but also a little sad,' and 'I was really surprised. And at the same time, I was really impressed. This is the next generation of AI trolling.'

in Software,   Web Service, Posted by logu_ii