'Mr. Chatterbox' is a language model trained solely on Victorian-era sources.



A language model called 'Mr. Chatterbox' has been released. It was trained from scratch on text from more than 28,000 books and other documents published in Victorian Britain between 1837 and 1899.

Mr. Chatterbox - a Hugging Face Space by tventurella

https://huggingface.co/spaces/tventurella/mr_chatterbox



tventurella/mr_chatterbox_model · Hugging Face
https://huggingface.co/tventurella/mr_chatterbox_model

Mr. Chatterbox is a (weak) Victorian-era ethically trained model you can run on your own computer
https://simonwillison.net/2026/Mar/30/mr-chatterbox/

Trip Venturella Releases Mr. Chatterbox Victorian LLM | Let's Data Science
https://letsdatascience.com/news/trip-venturella-releases-mr-chatterbox-victorian-llm-64303621

The British Library, in partnership with Microsoft, has released a dataset of over 25 million pages of out-of-copyright books and texts. The oldest materials in the set date back to the 1510s. Many were published in the 18th and 19th centuries and cover a wide range of fields, including geography, philosophy, history, poetry, and literature.

TheBritishLibrary/blbooks · Datasets at Hugging Face
https://huggingface.co/datasets/TheBritishLibrary/blbooks

Trip Venturella, who works at the AI platform Hugging Face, extracted 28,035 documents published during the Victorian era from the British Library's publicly available book dataset and trained a language model on them, creating Mr. Chatterbox.
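The date-based selection described above can be pictured with a minimal Python sketch. This is not Venturella's actual pipeline, which the article does not describe. The record layout, the `date` field name, and the helper names `publication_year` and `victorian_only` are all assumptions made for illustration.

```python
from typing import Iterable, Iterator, Optional

# Year range quoted in the article.
VICTORIAN_START = 1837
VICTORIAN_END = 1899


def publication_year(record: dict) -> Optional[int]:
    """Return the record's publication year, or None if it cannot be parsed.

    The "date" key is a hypothetical field name; check the dataset's
    actual schema before relying on it.
    """
    raw = record.get("date")
    if raw is None:
        return None
    # Accept plain ints, strings such as "1851" or "1851-01-01",
    # and date-like objects that expose a .year attribute.
    year = getattr(raw, "year", raw)
    try:
        return int(str(year)[:4])
    except ValueError:
        return None


def victorian_only(records: Iterable[dict]) -> Iterator[dict]:
    """Yield only the records published between 1837 and 1899 inclusive."""
    for record in records:
        year = publication_year(record)
        if year is not None and VICTORIAN_START <= year <= VICTORIAN_END:
            yield record


if __name__ == "__main__":
    sample = [
        {"title": "Too early", "date": 1836},
        {"title": "Great Exhibition guide", "date": "1851"},
        {"title": "Undated pamphlet", "date": None},
        {"title": "Too late", "date": 1900},
    ]
    for book in victorian_only(sample):
        print(book["title"])
```

Records with a missing or unparseable date are dropped rather than guessed at, which keeps out-of-period text from leaking into the training set at the cost of discarding some usable books.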

Mr. Chatterbox has approximately 340 million parameters, roughly the same size as OpenAI's GPT-2 Medium.
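To put the size comparison in numbers, the sketch below counts the parameters of GPT-2 Medium from its published hyperparameters (24 layers, width 1024, vocabulary of 50,257 tokens, context length 1024), assuming the standard GPT-2 layout with tied input and output embeddings. Mr. Chatterbox's own architecture is not described in the article, so the 340 million figure is simply taken as stated; `gpt2_param_count` is a helper name chosen for this example.

```python
def gpt2_param_count(n_layer: int, d_model: int, vocab_size: int, n_ctx: int) -> int:
    """Count the parameters of a GPT-2 style transformer (tied embeddings)."""
    # Token embeddings plus learned position embeddings.
    embeddings = vocab_size * d_model + n_ctx * d_model
    # Attention: fused QKV projection and output projection, each with a bias.
    attention = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)
    # Feed-forward: expand to 4 * d_model and project back, each with a bias.
    mlp = (d_model * 4 * d_model + 4 * d_model) + (4 * d_model * d_model + d_model)
    # Two layer norms per block, each with a scale and a shift vector.
    layer_norms = 2 * 2 * d_model
    # Final layer norm after the last block.
    final_norm = 2 * d_model
    return embeddings + n_layer * (attention + mlp + layer_norms) + final_norm


if __name__ == "__main__":
    gpt2_medium = gpt2_param_count(n_layer=24, d_model=1024, vocab_size=50257, n_ctx=1024)
    mr_chatterbox = 340_000_000  # approximate figure quoted in the article
    print(f"GPT-2 Medium: {gpt2_medium:,} parameters")
    print(f"Mr. Chatterbox / GPT-2 Medium: {mr_chatterbox / gpt2_medium:.2f}")
```

Under these assumptions GPT-2 Medium comes to about 355 million parameters, so a 340 million parameter model is around 96% of its size.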

Because its training was limited to this period, Mr. Chatterbox specializes in Victorian-era life, literature, science, philosophy, and manners. Venturella says, 'Try asking him about railroads, the Crystal Palace, Darwin's theory of evolution, or how to behave like a gentleman.'

According to Venturella, Mr. Chatterbox is still in beta, so its responses may be unstable or unnatural. If a response goes wrong, he suggests regenerating it.

It has also been pointed out that even when Mr. Chatterbox works as intended, its responses are quite limited, which suggests that models built solely from public domain material need significantly more data to reach conversational quality.

in AI, Posted by logc_nt