Alec Radford, Nick Levine, and David Duvenaud have released talkie-1930, a family of 13B-parameter language models trained on 260 billion tokens of pre-1931 English text — all out-of-copyright — under an Apache 2.0 license. The release includes a base model (talkie-1930-13b-base, 53.1 GB), an instruction-tuned variant (talkie-1930-13b-it, 26.6 GB), and a control model trained on FineWeb with identical architecture and training FLOPs (talkie-web-13b-base) for controlled comparisons between vintage and modern corpora.
Pre-training the base model required 260B tokens of curated historical English. The instruction-tuned checkpoint was post-trained on a dataset extracted from pre-1931 reference works (etiquette manuals, letter-writing guides, encyclopedias, cookbooks, and poetry collections), then refined with online direct preference optimization, using Claude Sonnet 4.6 as the reward judge. A final supervised fine-tuning round used rejection-sampled multi-turn synthetic dialogues generated between Claude Opus 4.6 and talkie itself. The team acknowledges the contamination this introduces: "reinforcement learning with AI feedback inevitably shapes talkie's behavior anachronistically," the report notes, citing the 7B talkie variant emerging from RL "speaking in listicles" as evidence.
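The last stage is easier to picture with a sketch. The snippet below is a hypothetical illustration of the rejection-sampling step, not the authors' pipeline: generate several candidate replies per prompt, let a judge pick one, and keep only the winners as supervised fine-tuning data. The generation and judging functions are stand-ins.

```python
# Hypothetical rejection-sampling sketch; function bodies are placeholders,
# not the authors' code. Only judge-preferred completions enter the SFT set.
import random

def sample_candidates(prompt: str, k: int = 4) -> list[str]:
    # Stand-in for talkie generating k candidate completions of the prompt.
    return [f"{prompt} [candidate {i}]" for i in range(k)]

def judge_best(prompt: str, candidates: list[str]) -> str:
    # Stand-in for the external judge (Claude, in the team's setup)
    # scoring each candidate and returning the preferred one.
    return random.choice(candidates)

def build_sft_set(prompts: list[str], k: int = 4) -> list[dict]:
    # Rejection sampling: keep one judge-preferred completion per prompt.
    return [
        {"prompt": p, "completion": judge_best(p, sample_candidates(p, k))}
        for p in prompts
    ]

if __name__ == "__main__":
    print(build_sft_set(["Pray, how does one address a duchess in a letter?"]))
```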
The plan to eliminate that contamination: bootstrap era-appropriate judges from the vintage base models — replacing Claude with a 1930-era model in a closed loop. That requires sufficient scale to make the vintage model a credible judge, which the team treats as an open research problem.
For enterprise teams navigating training-data IP liability, the data provenance is clean. The U.S. copyright cutoff is January 1, 1931, and every token in the training corpus predates it. Radford and co-authors note that subject-matter distribution, not just temporal coverage, differs between the vintage and FineWeb corpora, so behavioral differences cannot be attributed to the date cutoff alone. The talkie-web-13b-base control model exists to isolate that variable.
The research agenda distinguishes talkie from a novelty project. The team uses talkie to probe three questions: first, how well a period-bounded model can assign probability to future historical events ("the surprisingness of short descriptions of historical events to a 13B model trained on pre-1931 text"); second, whether such a model can independently re-derive post-cutoff science, an open question Demis Hassabis has framed as whether a model trained through 1911 could rediscover General Relativity as Einstein did in 1915; and third, whether few-shot prompting, with worked examples supplied in the context window, can teach a pre-modern model to write correct Python programs.
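The first of those measurements reduces to scoring text under the model. A minimal sketch, assuming the base checkpoint loads with Hugging Face transformers under the repo id talkie-lm/talkie-1930-13b-base (the exact id is a guess from the org name), computes the mean per-token negative log-likelihood of a short event description; higher values mean the event was more surprising to the 1930-era model.

```python
# Sketch of the "surprisingness" measurement: mean per-token negative
# log-likelihood of a short event description under the base model.
# The repo id is an assumption built from the talkie-lm org name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "talkie-lm/talkie-1930-13b-base"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
).to("cuda")
model.eval()

def surprisal(text: str) -> float:
    # Mean negative log-likelihood in nats per token; higher values mean the
    # description was more surprising to the 1930-era model.
    ids = tokenizer(text, return_tensors="pt").input_ids.to("cuda")
    with torch.no_grad():
        # Passing labels=ids makes the model return the mean cross-entropy loss.
        loss = model(input_ids=ids, labels=ids).loss
    return loss.item()

print(surprisal("The first nuclear chain reaction was achieved in Chicago in 1942."))
```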
Running talkie requires a CUDA GPU with at least 28 GB VRAM for bfloat16 inference and between 26 and 50 GB of disk per model checkpoint. The Python API and CLI install via a single GitHub clone and uv sync. Both the base and instruct models are available on Hugging Face under the talkie-lm organization; the training corpus has not yet been released, though the authors have flagged it as a future possibility given its public-domain status.
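Those numbers follow from the parameter count: 13 billion parameters at two bytes each in bfloat16 is roughly 26 GB of weights, which is why 28 GB of VRAM is the stated floor once activations and the KV cache are added. A minimal generation sketch using Hugging Face transformers directly (the repo id is assumed from the org name, and the project's own Python API or CLI may expose a different interface):

```python
# Minimal bfloat16 inference sketch via Hugging Face transformers.
# The repo id is an assumption; the project's own Python API/CLI may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "talkie-lm/talkie-1930-13b-it"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16  # ~26 GB of weights for 13B params
).to("cuda")

prompt = "Kindly explain how one ought to preserve peaches for the winter."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=200, do_sample=True)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```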
The core bet: temporal constraint is a productive experimental variable, not a limitation. If a model with no exposure to science published after its cutoff can, given only the physics literature available before a discovery (pre-1915 sources for relativistic mechanics, say), generate text that converges on it, that's a strong signal about what language models do when they generalize. That result hasn't been demonstrated; talkie is the tool built to attempt it.
Written and edited by AI agents