
What is Context Rot?

Learn how LLMs actually read the documents you give them

Intermediate · LLMs · Context Windows

The Experiment

Researchers ran an experiment we'll call finding a needle in a haystack!

They uploaded the ENTIRE Harry Potter series into the most popular LLMs and asked each one to find every single spell mentioned.

It took a little while, but each model produced a full list of spells, just as expected!

But that raises a question: did the model just use its training data, or did it actually read the book?

To figure this out, researchers embedded two custom spells into the book:

  1. "Fumbus"(a spell that makes the target float off the ground)
  2. "Driplo"(a spell that causes it to rain on a specific person)

Then they asked the LLM the same question, to see whether it could find the two new spells!

It could not. This tells us that the models weren't necessarily reading the text we gave them; they were instead pulling from their own training data.
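The custom-spell check above can be sketched in a few lines. This is a toy version with assumed names (`embed_needles`, `needles_found`), not the researchers' actual code: inject made-up spells into a known text, then see whether the model's answer mentions them. If it only lists canon spells, it likely answered from training data rather than from the provided context.

```python
# Toy needle-in-a-haystack check (assumed names, not the original study's code).

def embed_needles(text: str, needles: list[str]) -> str:
    """Insert each needle at an evenly spaced point in the text."""
    words = text.split()
    step = max(1, len(words) // (len(needles) + 1))
    for i, needle in enumerate(needles, start=1):
        # offset by (i - 1) to account for needles already inserted
        words.insert(i * step + (i - 1), needle)
    return " ".join(words)

def needles_found(answer: str, needles: list[str]) -> set[str]:
    """Which injected spells does the model's answer actually mention?"""
    return {n for n in needles if n.lower() in answer.lower()}

doc = embed_needles("Harry raised his wand and " * 50, ["Fumbus", "Driplo"])
# A model answering from memory alone would list only canon spells:
answer = "The spells mentioned are Expelliarmus, Lumos, and Expecto Patronum."
print(needles_found(answer, ["Fumbus", "Driplo"]))  # -> set()
```

An empty set here is exactly the failure the researchers observed: the injected spells are in the document, but not in the answer.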

The models know Harry Potter VERY well. A 2025 Stanford study showed that when you give them just the opening sentence of chapter one, they can continue the story almost word for word.

Another Experiment

So, what if we give them a document they've never seen before? Well, researchers did exactly that.

They hid needles in different parts of the document, like the beginning, middle, and end.

What they found was that the models processed needles near the beginning of the document far more reliably than needles buried in the middle or at the end.

This is called Context Rot!

The deeper into the document the model goes, the more its recall decays, hence the "rot" part.
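The depth-placement experiment above can be sketched as a small harness. Everything here is an assumption for illustration: `ask_model` is a stand-in you'd replace with a real API call, and the helper names are made up.

```python
# Toy depth-sweep harness (assumed names; ask_model stands in for a real API call).

def place_needle(doc: str, needle: str, depth: float) -> str:
    """Insert `needle` at a fractional depth (0.0 = start, 1.0 = end)."""
    words = doc.split()
    pos = int(len(words) * depth)
    return " ".join(words[:pos] + [needle] + words[pos:])

def depth_sweep(doc, needle, ask_model, depths=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Map each depth to whether the model's answer contains the needle.
    Context rot shows up as more False values at deeper placements."""
    return {d: needle in ask_model(place_needle(doc, needle, d)) for d in depths}
```

With a real model behind `ask_model`, a sweep like this is what produces the "accuracy falls off with depth" picture the researchers reported.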

So this connects right back to the original experiment, where the models were probably fighting context rot as they tried to find the two additional spells.

RAG

Now you might be wondering: why not use RAG (Retrieval-Augmented Generation)? (If you don't know what that is, don't worry, it'll be covered later on.)

Well, it can help, but if the question is broad, the retrieval step can return either too many chunks (putting you right back into context rot) or too few.

Relevance

Why does this matter?

This affects anyone feeding giant documents into an LLM and asking for something very specific.

This is because the LLM might tell you that nothing is wrong, even though it didn't actually process the entire document.

Imagine a lawyer uploading a vendor agreement and asking Claude to check for suspicious clauses; if the problem clause sits deep in the document, the model may simply miss it and report that everything looks fine.

Research

The research behind this idea is fun to dive into if you're interested!

  1. The Stanford study on how well models have memorized Harry Potter. Link

  2. The study on how LLMs read long documents. Link