Same model. Same retrieval set. Same query. Two completely different answers — because one of them remembered when its sources were written.
A developer in 2026 asks an AI agent for current best practices. Five documents are retrieved by semantic similarity:
Each document is scored on pure semantic similarity to the query. The top result is a 2022 blog post: confidently written, a dense keyword match, an authoritative source.
FreshContext applies a single correction: an exponential decay weight based on document age. Nothing else changes — same embeddings, same vectors, same retrieval set.
For this demo, λ = 0.0001 per hour (half-life = ln 2 / λ ≈ 6,930 hours ≈ 9.5 months). The live FreshContext engine uses source-specific λ values: HN front page ≈ 14h half-life, blog posts ≈ 29 days, academic papers ≈ 1.6 years.
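The decay math above fits in a few lines. This is a sketch of the formula, not the FreshContext source; the function names are hypothetical, and λ is assumed to be in units of inverse hours, which is what makes ln 2 / 0.0001 come out to roughly 9.5 months:

```python
import math

def decay_weight(age_hours: float, lam: float) -> float:
    """Exponential decay weight for a document of the given age."""
    return math.exp(-lam * age_hours)

def half_life_hours(lam: float) -> float:
    """Age at which the decay weight drops to 0.5: ln(2) / lambda."""
    return math.log(2) / lam

# lam = 0.0001 per hour reproduces the demo's half-life:
# half_life_hours(0.0001) ~= 6931 hours ~= 289 days ~= 9.5 months
```

Swapping λ is how the source-specific half-lives fall out: λ = ln 2 / half-life, so a 14-hour HN half-life corresponds to λ ≈ 0.05 per hour.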
The 2022 blog falls from rank 1 to rank 5. The 2026 X post rises from rank 4 to rank 1. Same documents, decay applied:
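The rank flip can be reproduced with a minimal rerank sketch. The documents, similarity scores, and ages below are hypothetical stand-ins for the demo's retrieval set, chosen only to illustrate the mechanism:

```python
import math

def rerank(docs: list[dict], lam: float = 0.0001) -> list[dict]:
    """Re-sort documents by similarity * exp(-lambda * age_hours)."""
    return sorted(
        docs,
        key=lambda d: d["similarity"] * math.exp(-lam * d["age_hours"]),
        reverse=True,
    )

# Hypothetical retrieval set: a high-similarity 4-year-old blog post
# vs. a slightly lower-similarity post from three days ago.
docs = [
    {"id": "2022-blog", "similarity": 0.91, "age_hours": 4 * 365 * 24},
    {"id": "2026-post", "similarity": 0.84, "age_hours": 72},
]
ranked = rerank(docs)
# The old post's weight decays to ~0.03, so the recent post wins
# despite its lower raw similarity.
```

The retrieval set and embeddings are untouched; only the sort key changes.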
Both answers come from the same model with the same query. The only difference is which three documents the model saw at the top of the context window:
This isn't a model problem. The same model produced both answers. Claude wasn't wrong in the first version — it faithfully summarized what was put in front of it.
This isn't an embedding problem. The cosine similarity scores were correct. The 2022 blog really is semantically dense and well-written.
This is a context-engineering problem. Retrieval ranks correctly along one axis (semantic similarity) and ignores the other axis that matters in production (temporal validity). FreshContext adds the missing axis.
"Most RAG pipelines rank context correctly semantically but incorrectly temporally."
The retrieval set, the math, and the prompts are all open. Read the data file, change the query, swap λ, run it against your own RAG pipeline: