talkie: An AI model trained on pre-1931 data

Wednesday, 29 April 2026 at 07:00

In April 2026, researchers led by Nick Levine unveiled a striking new AI model: talkie-1930, a language model trained exclusively on texts from before 1931. The project, which also involved David Duvenaud and Alec Radford, shows how artificial intelligence behaves without modern knowledge or internet data.

The model offers a rare window into AI development and immediately raises questions about data, bias, and the future of artificial intelligence.

What is talkie-1930, and why does it matter?

Talkie-1930 is a “vintage language model” that uses only historical texts. It was trained on 260 billion tokens from books, newspapers, and documents published before 1931, leaving it unaware of modern events or technologies.

That approach makes the model fundamentally different from today’s AI systems. While modern models lean on internet-scale data, Talkie shows how AI operates without that influence. It’s a valuable experiment for researchers seeking to understand how data shapes AI outputs.
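To make the cutoff idea concrete, here is a minimal Python sketch of a date filter over a corpus. It is not the team's actual pipeline: the `Document` class, the `is_vintage` helper, and the sample records are hypothetical, and the only detail taken from the article is the pre-1931 cutoff. The sketch assumes each document carries a publication year, and it drops anything undated rather than risk letting later text through.

```python
from dataclasses import dataclass
from typing import Optional

CUTOFF_YEAR = 1931  # from the article: only texts published before 1931


@dataclass
class Document:
    """Hypothetical record type; a real corpus would carry richer metadata."""
    title: str
    year: Optional[int]  # publication year, or None when unknown
    text: str


def is_vintage(doc: Document, cutoff: int = CUTOFF_YEAR) -> bool:
    """Keep a document only if its year is known and strictly before the cutoff."""
    return doc.year is not None and doc.year < cutoff


def filter_corpus(docs: list[Document], cutoff: int = CUTOFF_YEAR) -> list[Document]:
    """Return the subset of documents that pass the date filter."""
    return [d for d in docs if is_vintage(d, cutoff)]


# Made-up sample records, for illustration only.
corpus = [
    Document("Newspaper issue", 1912, "..."),
    Document("Later reprint with a new foreword", 1954, "..."),
    Document("Undated pamphlet", None, "..."),
]

kept = filter_corpus(corpus)
print([d.title for d in kept])
```

Dropping undated material is a conservative choice: a single modern reprint or annotated edition would be enough to leak later knowledge into an otherwise historical corpus.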

What makes this model technically notable?

With 13 billion parameters, Talkie is the largest model of its kind. Architecturally it resembles modern systems, but its training data is entirely different.

Key technical takeaways:

It underperforms on factual knowledge compared to modern AI
The gap halves when “anachronistic” questions are removed
It shows surprisingly strong language skills despite limited data
It can handle simple programming tasks via examples

These results suggest language competence can be decoupled from up-to-date knowledge. That matters for AI development in domains where reliability and control are critical, such as government and education.
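The comparison behind the "anachronistic questions" finding can be sketched in a few lines of Python. This is an illustrative calculation only: the field names (`anachronistic`, `modern_correct`, `vintage_correct`) and the four toy questions are assumptions made up for the example, not the project's benchmark or its results.

```python
def accuracy(results):
    """Fraction of True values in a list of booleans (0.0 for an empty list)."""
    return sum(results) / len(results) if results else 0.0


def knowledge_gap(questions, drop_anachronistic=False):
    """Accuracy of the modern model minus accuracy of the vintage model.

    When drop_anachronistic is True, questions that depend on knowledge
    from after the training cutoff are excluded before scoring.
    """
    if drop_anachronistic:
        questions = [q for q in questions if not q["anachronistic"]]
    modern = accuracy([q["modern_correct"] for q in questions])
    vintage = accuracy([q["vintage_correct"] for q in questions])
    return modern - vintage


# Toy data, invented for illustration; not real evaluation results.
questions = [
    {"anachronistic": False, "modern_correct": True, "vintage_correct": True},
    {"anachronistic": False, "modern_correct": True, "vintage_correct": False},
    {"anachronistic": True, "modern_correct": True, "vintage_correct": False},
    {"anachronistic": True, "modern_correct": True, "vintage_correct": False},
]

gap_all = knowledge_gap(questions)
gap_fair = knowledge_gap(questions, drop_anachronistic=True)
print(f"gap on all questions: {gap_all:.2f}")
print(f"gap without anachronistic questions: {gap_fair:.2f}")
```

The point of the second number is fairness: a model trained on pre-1931 text cannot be expected to answer questions about later events, so removing them isolates the part of the gap that reflects the model rather than the calendar.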

Why are ‘vintage’ AI models compelling?

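Vintage models offer a controlled testbed for AI research. Because they exclude modern data, they cannot have seen benchmarks or test questions written after the cutoff. That sidesteps “data contamination,” the well-known issue where evaluation material leaks into a training set and the model simply regurgitates the answers.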

This opens up new possibilities:

Cleaner evaluation of generalization
Insight into how AI “discovers” new knowledge
Comparisons across datasets and time periods
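One common way to look for data contamination is to check how many word n-grams of a benchmark item also appear in the training text. The Python sketch below shows that idea under simple assumptions: whitespace tokenization, lowercasing, and a default n-gram length of 5. The function names and example strings are hypothetical, and the article does not say which contamination checks, if any, the Talkie team runs.

```python
def ngrams(text, n):
    """Set of word n-grams in text, using lowercase whitespace tokens."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_ratio(benchmark_item, training_text, n=5):
    """Share of the benchmark item's n-grams that also occur in the training text.

    Returns 0.0 when the item is too short to contain any n-gram.
    """
    item_grams = ngrams(benchmark_item, n)
    if not item_grams:
        return 0.0
    train_grams = ngrams(training_text, n)
    return len(item_grams & train_grams) / len(item_grams)


# Made-up strings for illustration.
training_text = "the quick brown fox jumps over the lazy dog near the river"
leaked_item = "The quick brown fox jumps over the lazy dog"
clean_item = "a wholly unrelated sentence about radio valves and wireless sets"

print(overlap_ratio(leaked_item, training_text))
print(overlap_ratio(clean_item, training_text))
```

A ratio near 1 suggests the item was seen during training, while a ratio near 0 suggests it was not. With a strict date cutoff, benchmark items written after the cutoff should score near 0 by construction, which is what makes vintage models attractive for this kind of evaluation.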

Researchers use Talkie to test whether a model can predict or reconstruct future inventions based on past knowledge—think of theories by Albert Einstein or early computer science concepts.

What are the limits and risks?

Talkie’s limitations are immediately visible. The model mirrors early 20th-century norms and values. That means:

Gender roles often appear traditional
Social inequality is implicitly normalized
Modern perspectives are entirely absent

Data quality also plays a major role. Historical texts are often digitized via OCR, which introduces errors and can reduce performance to roughly 30 percent of optimal.

These constraints underline how deeply AI depends on its training data.
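OCR noise is usually quantified with the character error rate (CER): the edit distance between a trusted transcription and the OCR output, divided by the length of the transcription. The Python sketch below implements that standard metric with a plain Levenshtein distance. The function names and sample strings are illustrative assumptions, and the article does not state which metric lies behind the 30 percent figure above.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, ocr_output: str) -> float:
    """Edit distance divided by the length of the reference transcription."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, ocr_output) / len(reference)


# A classic OCR confusion: "m" misread as "rn" (made-up example).
reference = "the modern wireless"
ocr_output = "the rnodern wireless"
print(f"CER: {character_error_rate(reference, ocr_output):.3f}")
```

Measuring CER on a small hand-checked sample is a cheap way to compare OCR engines or to decide which scanned sources are clean enough to train on.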

What’s next for the project?

The team aims to scale Talkie quickly. They’re working on:

Larger datasets, potentially exceeding 1 trillion tokens
Better OCR technology for historical texts
Multilingual expansion
New evaluation methods for AI predictive capabilities

The end goal: a model at roughly GPT-3.5 level—built entirely on historical data.

Conclusion: Looking back to push AI forward

Talkie-1930 suggests that AI progress isn’t just about more data but also about different data. By mining the past, researchers gain sharper insight into how language models work, where bias emerges, and how AI evolves.

For the Netherlands, it’s a chance to assess AI more critically and strategically—not just what the technology can do, but what it learns from the world we feed it.

By Robin Heester
