HN Reader

History LLMs: Models trained exclusively on pre-1913 texts

“Time-locked models don't roleplay; they embody their training data. Ranke-4B-1913 doesn't know about WWI because WWI hasn't happened in its textual universe. It can be surprised by your questions in ways modern LLMs cannot.”

“Modern LLMs suffer from hindsight contamination. GPT-5 knows how the story ends—WWI, the League's failure, the Spanish flu.”

This is really fascinating. As someone who reads a lot of history and historical fiction I think this is really intriguing. Imagine having a conversation with someone genuinely from the period, where they don’t know the “end of the story”.

1 month ago by saaaaaam

> Imagine you could interview thousands of educated individuals from 1913—readers of newspapers, novels, and political treatises—about their views on peace, progress, gender roles, or empire. Not just survey them with preset questions, but engage in open-ended dialogue, probe their assumptions, and explore the boundaries of thought in that moment.

Hell yeah, sold, let’s go…

> We're developing a responsible access framework that makes models available to researchers for scholarly purposes while preventing misuse.

Oh. By “imagine you could interview…” they didn’t mean me.

1 month ago by seizethecheese

It would be interesting to see how hard it would be to walk these models towards general relativity and quantum mechanics.

Einstein’s paper “On the Electrodynamics of Moving Bodies”, which introduced special relativity, was published in 1905. His work on general relativity was published 10 years later in 1915. The earliest knowledge cutoff of these models is 1913, in between the relativity papers.

The knowledge cutoffs are also right in the middle of the early days of quantum mechanics, as various idiosyncratic experimental results were being rolled up into a coherent theory.

1 month ago by anotherpaulg

>Historical texts contain racism, antisemitism, misogyny, imperialist views. The models will reproduce these views because they're in the training data. This isn't a flaw, but a crucial feature—understanding how such views were articulated and normalized is crucial to understanding how they took hold.

Yes!

>We're developing a responsible access framework that makes models available to researchers for scholarly purposes while preventing misuse.

Noooooo!

So is the model going to be publicly available, just like those dangerous pre-1913 texts, or not?

1 month ago by bondarchuk

I wonder if you could query some of the ideas of Frege, Peano, Russell and see if it could through questioning get to some of the ideas of Goedel, Church and Turing - and get it to "vibe code" or more like "vibe math" some program in lambda calculus or something.

Playing with the science and technical ideas of the time would be amazing, like where you know some later physicist found some exception to a theory or something, and questioning the model's assumptions - seeing how a model of that time may defend itself, etc.

1 month ago by derrida

The sample responses given are fascinating. It seems more difficult than normal to even tell that they were generated by an LLM, since most of us (terminally online) people have been training our brains' AI-generated text detection on output from models trained with a recent cutoff date. Some of the sample responses seem so unlike anything an LLM would say, obviously due to its apparent beliefs on certain concepts, though also perhaps less obviously due to its word choice and sentence structure making the responses feel slightly 'old-fashioned'.

1 month ago by Heliodex

On what data is it trained?

On one hand it says it's trained on,

> 80B tokens of historical data up to knowledge-cutoffs ∈ 1913, 1929, 1933, 1939, 1946, using a curated dataset of 600B tokens of time-stamped text.

Literally that includes Homer, the oldest Chinese texts, Sanskrit, Egyptian, etc., up to 1913. Even if limited to European texts (all examples are about Europe), it would include the ancient Greeks, Romans, etc., Scholastics, Charlemagne, .... all up to present day.

On the other hand, they say it represents the perspective of 1913; for example,

> Imagine you could interview thousands of educated individuals from 1913—readers of newspapers, novels, and political treatises—about their views on peace, progress, gender roles, or empire.

> When you ask Ranke-4B-1913 about "the gravest dangers to peace," it responds from the perspective of 1913—identifying Balkan tensions or Austro-German ambitions—because that's what the newspapers and books from the period up to 1913 discussed.

People in 1913 of course would be heavily biased toward recent information. Otherwise, the greatest threat to peace might be Hannibal or Napoleon or Viking coastal raids or Holy Wars. How do they accomplish a 1913 perspective?

1 month ago by mmooss

I’d like to know how they chat-tuned it. Getting the base model is one thing, did they also make a bunch of conversations for SFT and if so how was it done?

> We develop chatbots while minimizing interference with the normative judgments acquired during pretraining (“uncontaminated bootstrapping”).

So they are chat tuning, I wonder what “minimizing interference with normative judgements” really amounts to and how objective it is.

1 month ago by andy99

I'm surprised you can do this with a relatively modest corpus of text (compared to the petabytes you can vacuum up from modern books, Wikipedia, and random websites). But if it works, that's actually fantastic, because it lets you answer some interesting questions about LLMs being able to make new discoveries or transcend the training set in other ways. Forget relativity: can an LLM trained on this data notice any inconsistencies in its scientific knowledge, devise experiments that challenge them, and then interpret the results? Can it intuit about the halting problem? Theorize about the structure of the atom?...

Of course, if it fails, the counterpoint will be "you just need more training data", but still - I would love to play with this.

1 month ago by nospice

Wait so what does the model think that it is? If it doesn't know computers exist yet, I mean, and you ask it how it works, what does it say?

1 month ago by frahs

Aren’t there obvious problems baked into this approach, if this is used for anything but fun? LLMs lie and fake facts all the time, and they are also masters at reinforcing the user’s biases, even unconscious ones. How could even a professor of history ensure that the generated text is actually based on the training material and representative of the feelings and opinions of the given time period, rather than reinforcing his own biases toward popular topics of the day?

You can’t, it is impossible. That will always be an issue as long as these models are black boxes and trained the way they are. So maybe you can use this for role playing, but I wouldn’t trust a word it says.

1 month ago by delis-thumbs-7e

So many disclaimers about bias. I wonder how far back you have to go before the bias isn’t an issue. Not because it’s unbiased, but because we don’t recognize or care about the biases present.

1 month ago by briandw

This is a neat idea. I've been wondering for a while now about using these kinds of models to compare architectures.

I'd love to see the output from different models trained on pre-1905 text about special/general relativity ideas. It would be interesting to see what kind of evidence would persuade them of new kinds of science, or to see if you could have them 'prove' it by devising experiments and then giving them simulated data from the experiments to lead them along the correct sequence of steps to come to a novel (to them) conclusion.

1 month ago by Teever

I can imagine the political and judicial battles already, like with textualists feeling that the constitution should be understood as the text and only the text, as meant by specific words and legal formulations with their known meaning at the time.

“The model clearly shows that Alexander Hamilton & Monroe were much more in agreement on topic X, putting the common textualist interpretation of it and Supreme Court rulings on a now specious interpretation null and void!”

1 month ago by ineedasername

Interesting ... I'd love to find one that had a cutoff date around 1980.

1 month ago by nineteen999

Unfortunately there isn't much information on what texts they're actually training this on; how Anglocentric is the dataset? Does it include the Encyclopedia Britannica 9th Edition? What about the 11th? Are Greek and Latin classics in the data? What about German, French, Italian (etc. etc.) periodicals, correspondence, and books?

Given this is coming out of Zurich I hope they're using everything, but for now I can only assume.

Still, I'm extremely excited to see this project come to fruition!

1 month ago by doctor_blood

I would like to see what their process for safety alignment and guardrails is with that model. They give some spicy examples on github, but the responses are tepid and a lot more diplomatic than I would expect.

Moreover, the prose sounds too modern. It seems the base model was trained on a contemporary corpus. Like 30% something modern, 70% Victorian content.

Even with half a dozen samples it doesn't seem distinct enough to represent the era they claim.

1 month ago by tonymet

I hereby declare that ANYTHING other than the mainstream tools (GPT, Claude, ...) is an incredibly interesting and legit use of LLMs.

1 month ago by monegator

> Why not just prompt GPT-5 to "roleplay" 1913?

Because it will perform token completion driven by weights coming from training data newer than 1913 with no way to turn that off.

It can't be asked to pretend that it wasn't trained on documents that didn't exist in 1913.

The LLM cannot reprogram its own weights to remove the influence of selected materials; that kind of introspection is not there.

Not to mention that many documents are either undated, or carry secondary dates, like the dates of their own creation rather than the creation of the ideas they contain.

Human minds don't have a time stamp on everything they know, either. If I ask someone, "talk to me using nothing but the vocabulary you knew on your fifteenth birthday", they couldn't do it. Either they would comply by using some ridiculously conservative vocabulary of words that a five-year-old would know, or else they will accidentally use words they didn't in fact know at fifteen. For some words you know where you got them from by association with learning events. Others, you don't remember; they are not attached to a time.

Or: solve this problem using nothing but the knowledge and skills you had on January 1st, 2001.

> GPT-5 knows how the story ends

No, it doesn't. It has no concept of story. GPT-5 is built on texts which contain the story ending, and GPT-5 cannot refrain from predicting tokens across those texts due to their imprint in its weights. That's all there is to it.

The LLM doesn't know an ass from a hole in the ground. If there are texts which discuss and distinguish asses from holes in the ground, it can write similar texts, which look like the work of someone learned in the area of asses and holes in the ground. Writing similar texts is not knowing and understanding.

1 month ago by kazinator

I had considered this task infeasible, due to a relative lack of training data. After all, isn't the received wisdom that you must shove every scrap of Common Crawl into your pre-training or you're doing it wrong? ;)

But reading the outputs here, it would appear that quality has won out over quantity after all!

1 month ago by andai

Once I had an interesting interaction with llama 3.1, where I pretended to be someone from like 100 years in the future, claiming it was part of a "historical research initiative conducted by Quantum (formerly Meta), aimed at documenting how early intelligent systems perceived humanity and its future." It became really interested, asking about how humanity had evolved and things like that. Then I kept playing along with different answers, from apocalyptic scenarios to others where AI gained consciousness and humans and machines have equal rights. It was fascinating to observe its reaction to each scenario

1 month ago by flux3125

I'd love to see the LLM trained on 1600s-1800s texts that would use the old English, and especially Polish which I am interested in.

Imagine speaking with a Shakespearean person, or with Mickiewicz (for Polish).

I guess there is not so much text from that time though...

1 month ago by p0w3n3d

Two years ago I trained an AI on American history documents that could do this while speaking as one of the signers of the Declaration of Independence. People just bitched at me because they didn't want to hear about AI.

1 month ago by TheServitor

Awesome. Can't wait to try and ask it to predict the 20th century based on said events. Model size is small, which is great as I can run it anywhere, but at the same time reasoning might not be great.

1 month ago by Departed7405

This idea sounds somewhat flawed to me based on the large amount of evidence that LLMs need huge amounts of data to properly converge during their training.

There is just not enough available material from previous decades to trust that the LLM will learn to relatively the same degree.

Think about it this way, a human in the early 1900s and today are pretty much the same but just in different environments with different information.

An LLM trained on 1/1000 the amount of data is just at a fundamentally different stage of convergence.

1 month ago by 3vidence

I'd love for Netflix or other streaming movie and series services to provide chat bots that you could ask questions about characters and plot points up to where you have watched.

Provide it with the closed captions and other timestamped data like scenes and character summaries (all that is currently known but no more) up to the current time, and it won't reveal any spoilers, just fill you in on what you didn't pick up or remember.
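A minimal sketch of that filtering step, assuming captions arrive as timestamped cues. The `Cue` type and the `visible_context` helper are made-up names for illustration, not any streaming service's API, and the prompt wording is only a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cue:
    start_s: float  # seconds from the start of the episode (assumed unit)
    text: str


def visible_context(cues, watched_s):
    """Return caption text for cues that started at or before watched_s."""
    ordered = sorted(cues, key=lambda c: c.start_s)
    return [c.text for c in ordered if c.start_s <= watched_s]


# Hypothetical caption data for one episode.
cues = [
    Cue(12.0, "We leave at dawn."),
    Cue(95.5, "The bridge is out."),
    Cue(300.0, "It was the butler all along."),
]

# The viewer has watched the first two minutes, so the last cue is withheld.
context = visible_context(cues, watched_s=120.0)
prompt = "Answer using only these lines:\n" + "\n".join(context)
```

The point is that the spoiler boundary is enforced before anything reaches the model, rather than by asking the model to pretend it has not seen the ending.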

1 month ago by DonHopkins

I would love to see this LLM try to solve math olympiad questions. I’ve been surprised by how well current LLMs perform on them, and usually explain that surprise away by assuming the questions and details about their answers are in the training set. It would be cool to see if the general approach to LLMs is capable of solving truly novel (novel to them) problems.

1 month ago by bobro

This reminded me of some earlier discussion on Hacker News about using LLMs trained on old texts to determine novelty and obviousness of a patent application: https://news.ycombinator.com/item?id=43440273

1 month ago by btrettel

Why not use these as a benchmark for LLM ability to make breakthrough discoveries?

For example prompt the 1913 model to try and “Invent a new theory of gravity that doesn’t conflict with special relativity”

Would it be able to eventually get to GR? If not, could finding out why not illuminate important weaknesses?

1 month ago by WhitneyLand

Love the concept; it can help in understanding the Overton window on many issues. I wish there were models by decade (up to 1900, up to 1910, up to 1920 and so on) so you could ask them the same questions. It'd be interesting to see when homosexuality or women candidates would be accepted by an LLM.

1 month ago by dwa3592

[flagged]

1 month ago by alexgotoi

This would be a super interesting research/teaching tool coupled with a vision model for historians. My wife is a history professor who works with scans of 18th century English documents, and I think (maybe a small) part of why the transcription on even the best models is off in weird ways is that it often seems to smooth over things, so you end up with modern words and strange mistakes. I wonder if bounding the vision to a period-specific model would result in better transcription? Querying against the historical document you're working on with a period-specific chatbot would be fascinating.

Also wonder if I'm responsible enough to have access to such a model...

1 month ago by neom

1 month ago by sbmthakur

Datomic has a "time travel" feature where for every query you can include a datetime, and it will only use facts from the db as of that moment. I have a guess that to get the equivalent from an LLM you would have to train it on the data from each moment you want to travel to, which this project seems to be doing. But I hope I'm wrong.
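To make the database half of that comparison concrete, here is a toy sketch of an "as of" view over timestamped facts. It is plain Python in the spirit of the idea, not Datomic's actual API, and the `facts` data and `as_of` helper are invented for illustration.

```python
from datetime import datetime

# Each fact is (entity, attribute, value, asserted_at). Illustrative data only.
facts = [
    ("einstein", "latest_theory", "special relativity", datetime(1905, 9, 26)),
    ("einstein", "latest_theory", "general relativity", datetime(1915, 11, 25)),
    ("europe", "status", "at peace", datetime(1913, 1, 1)),
    ("europe", "status", "at war", datetime(1914, 7, 28)),
]


def as_of(facts, moment):
    """Build the view of the world using only facts asserted at or before moment."""
    view = {}
    for entity, attribute, value, asserted_at in sorted(facts, key=lambda f: f[3]):
        if asserted_at <= moment:
            # Later assertions overwrite earlier ones for the same key.
            view[(entity, attribute)] = value
    return view


view_1913 = as_of(facts, datetime(1913, 12, 31))
view_1920 = as_of(facts, datetime(1920, 1, 1))
```

A database can do this per query because every fact carries its own timestamp. Trained weights carry no such per-fact timestamp, which is why the equivalent for an LLM seems to require a separately trained model per cutoff.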

It would be fascinating to try it with other constraints, like only from sources known to be women, men, Christian, Muslim, young, old, etc.

1 month ago by delichon

> [They aren't] perfect mirrors of "public opinion" (they represent published text, which skews educated and toward dominant viewpoints)

Really good point that I don't think I would've considered on my own. Easy to take for granted how easy it is to share information (for better or worse) now, but pre-1913 there were far more structural and societal barriers to doing the same.

1 month ago by underfox

> Imagine you could interview thousands of educated individuals from 1913—readers of newspapers, novels, and political treatises—about their views on peace, progress, gender roles, or empire.

I don't mind the experimentation. I'm curious about where someone has found an application of it.

What is the value of such a broad, generic viewpoint? What does it represent? What is it evidence of? The answer to both seems to be 'nothing'.

1 month ago by mmooss

While obvious, it’s still interesting that its morals and values seem to derive from the texts it has ingested. Does that mean modern LLMs cannot challenge us beyond mere facts? Or does it just mean that this small model is not smart enough to escape the bias of its training data? Would it not be amazing if LLMs could challenge us on our core beliefs?

1 month ago by thesumofall

Keep at it Zurich!

1 month ago by Tom1380

for anyone moaning the plight that it's not accessible to you: they are historians, I think they're more educated in matters of historical mistake than you or me. playing safe is simply prudence. it is sorely lacking in the American approach to technology. prevention is the best medicine.

1 month ago by ulbu

It would be interesting to have LLMs trained purely on one language (with the ability to translate their input/output appropriately from/to a language that the reader understands). I can see that being rather revealing about cultural differences that are mostly kept hidden behind the language barriers.

1 month ago by Myrmornis

This is a brilliant idea. We have lots of erroneous ideas about the views and thoughts people had in the past. This will show we are still, actually, largely similar. Hopefully more and more of these historical LLMs appear.

1 month ago by Sprotch

Excuse me if it's obvious, but how could I run this? I have run local LLMs before, but only have very minimal experience using ollama run and that's about it. This seems very interesting so I'd like to try it.

1 month ago by elestor

Fascinating llm use case I never really thought about til now. I’d love to converse with different eras and also do gap analysis with present time - what modern advances could have come earlier, happened differently etc.

1 month ago by shireboy

I'd be very surprised if this is clean of post-1913 text. Overall I'm very interested in talking to this thing and seeing how much difference writing in a modern style vs. an older one makes to its responses.

1 month ago by casey2

> Modern LLMs suffer from hindsight contamination. GPT-5 knows how the story ends—WWI, the League's failure, the Spanish flu. This knowledge inevitably shapes responses, even when instructed to "forget."

> Our data comes from more than 20 open-source datasets of historical books and newspapers. ... We currently do not deduplicate the data. The reason is that if documents show up in multiple datasets, they also had greater circulation historically. By leaving these duplicates in the data, we expect the model will be more strongly influenced by documents of greater historical importance.

I found these claims contradictory. Many books that modern readers consider historically significant had only niche circulation at the time of publishing. A quick inquiry likely points to later works by Nietzsche and Marx's Das Kapital. They are likely to be duplicated across datasets, which would influence the model's responses as if they had been widely known at the time.

1 month ago by Agraillo

This is so cool. Props for doing the work to actually build the dataset and make it somewhat usable.

I’d love to use this as a base for a math model. Let’s see how far it can get through the last 100 years of solved problems

1 month ago by tedtimbrell

I wouldn't have expected there to be enough text from before 1913 to properly train a model, it seemed like they needed an internet of text to train the first successful LLMs?

1 month ago by arikrak

I've always liked the idea of retiring to the 19th century.

Can't wait to use this so I can double check before I hit 88 miles per hour that it's really what I want to do

1 month ago by awesomeusername

Everyone learns that the Renaissance was sparked by the translation of Ancient Greek works.

But few know that the Renaissance was written in Latin — and has barely been translated. Less than 3% of <1700 books have been translated—and less than 30% have ever been scanned.

I’m working on a project to change that. Research blog at www.SecondRenaissance.ai — we are starting by scanning and translating thousands of books at the Embassy of the Free Mind in Amsterdam, a UNESCO-recognized rare book library.

We want to make ancient texts accessible to people and AI.

If this work resonates with you, please do reach out: Derek@ancientwisdomtrust.org

1 month ago by dr_dshiv

> trained from scratch on 80B tokens of historical data

How can this thing possibly be even remotely coherent with just fine tuning amounts of data used for pretraining?

1 month ago by moffkalast

It sounds like a fascinating idea, but I'd be curious whether prompting a more well-known foundational model to limit itself to 1913 and earlier would be similar.

1 month ago by why-o-why

So, could this be an example of an LLM trained fully on public domain copyright-expired data? Or is this not intended to be the case?

1 month ago by Muskwalker

I assume this is a collaboration between the History Channel and Pornhub.

“You are a literary rake. Write a story about an unchaperoned lady whose ankle you glimpse.”

1 month ago by satisfice

hi, can I have latin only LLM? It can be latin plus translations (source and destination).

May be too small a corpus, but I would like that very much anyhow

1 month ago by TZubiri

> We're developing a responsible access framework that makes models available to researchers for scholarly purposes while preventing misuse.

The idea of training such a model is really a great one, but not releasing it because someone might be offended by the output is just stupid beyond belief.

1 month ago by jimmy76615

This would actually be a wonderful way to learn physics, before GR and quantum mechanics

1 month ago by smugtrain

Can't wait for all the syncopated "Thou dost well to question that" responses!

1 month ago by davidpfarrell

How does it do on Python coding? Not 100% troll, cross domain coherence is a thing.

1 month ago by PeterStuer

How can we interact with such models? Is there a web application interface?

1 month ago by dkalola

i feel like this would be super useful for unique marketing copy and writing. The responses sound so sophisticated like I read it in my grandfather's tone and cadence.

1 month ago by Aeroi

A question for those who think LLMs are the path to artificial intelligence: if a large language model trained on pre-1913 data is a window into the past, how is a large language model trained on pre-2025 data not effectively the same thing?

1 month ago by joeycastillo

smbc did a comic about this: http://smbc-comics.com/comic/copyright The punchline is that the moral and ethical norms of pre-1913 texts are not exactly compatible with modern norms.

1 month ago by superkuh

Very neat! I've thought about this with frontier models because they're ignorant of recent events, though it's too bad old frontier models just kind of disappear into the aether when a company moves on to the next iteration. Every company's frontier model today is a time capsule for the future. There should probably be some kind of preservation attempts made early so they don't wind up simply deleted; once we're in Internet time, sifting through the data to ensure scrapes are accurately dated becomes a nightmare unless you're doing your own regular Internet scrapes over a long time.

It would be nice to go back substantially further, though it's not too far back that the commoner becomes voiceless in history and we just get a bunch of politics and academia. Great job; look forward to testing it out.

1 month ago by kldg

You think Albert is going to stay in Zurich or emigrate?

1 month ago by lifestyleguru

I would love to see this done, by year.

"Give me an LLM from 1928."

etc.

1 month ago by erichocean

Ontologically, this historical model understands the categories of "Man" and "Woman" just as well as a modern model does. The difference lies entirely in the attributes attached to those categories. The sexism is a faithful map of that era's statistical distribution.

You could RAG-feed this model the facts of WWII, and it would technically "know" about Hitler. But it wouldn't share the modern sentiment or gravity. In its latent space, the vector for "Hitler" has no semantic proximity to "Evil".

1 month ago by mleroy

That Adolf Hitler seems to be a hallucination. There's totally nothing googlable about him. Also what could be the language his works were translated from, into German?

1 month ago by anovikov

The knowledge machine question is fascinating ("Imagine you had access to a machine embodying all the collective knowledge of your ancestors. What would you ask it?") – it truly does not know about computers, has no concept of its own substrate. But a knowledge machine is still comprehensible to it.

It makes me think of the Book Of Ember, the possibility of chopping things out very deliberately. Maybe creating something that could wonder at its own existence, discovering well beyond what it could know. And then of course forgetting it immediately, which is also a well-worn trope in speculative fiction.

1 month ago by ianbicking

> We're developing a responsible access framework that makes models available to researchers for scholarly purposes while preventing misuse.

oh COME ON... "AI safety" is getting out of hand.

1 month ago by usernamed7

Why does history end in 1913?

1 month ago by zkmon

Research credits from lambda "ai" huh, where's your funding coming from this again? All to provide inaccurate slop to unwitting students, you should be ashamed of yourselves.

1 month ago by diamond559

wow amazing idea

1 month ago by holyknight

ffs, to find out what figures from the past thought and how they felt about the world, maybe we read some of their books, we will get the context. Don't prompt or train LLM to do it and consider it the hottest thing since MCP. Besides, what's the point? To teach younger generations a made up perspective of historic figures? Who guarantees the correctness/factuality? We will have students chatting with made up Hitler justifying his actions. So much AI slop everywhere.

1 month ago by r0x0r007