For the past few months I’ve been working on context for AI models. I genuinely believe this is one of the biggest opportunities in the industry right now.

Because context is messy.

Most models start to choke around ~100k tokens. And “just do RAG” often feels underwhelming, not because RAG is bad, but because RAG alone is not memory. It’s a search tool.

So… how should we approach context?

I think the best starting point is the human brain. It doesn’t store everything verbatim: it holds a small working memory, compresses the recent past, and retrieves long-term memories on demand.

Now, what tools do we actually have in engineering to approximate that?

We have three primitives:

1) Compaction (summaries / rolling state)

This is why tools like Cursor / Claude Code can keep going: they compress the past into something smaller.

Downside: extreme compaction drifts. It loses details. It loses grounding.
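To make the idea concrete, here’s a minimal sketch of compaction: keep the last few turns verbatim and fold everything older into one rolling summary. The `summarize` helper here is a stand-in (plain truncation); a real system would make a model call.

```python
def summarize(text: str, budget: int = 200) -> str:
    # Stand-in summarizer: truncate. A real one would be an LLM call.
    return text[:budget] + ("…" if len(text) > budget else "")

def compact(history: list[str], keep_recent: int = 4) -> list[str]:
    """Fold everything but the last `keep_recent` turns into one summary entry."""
    if len(history) <= keep_recent:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    summary = summarize(" ".join(old))
    return [f"[summary of {len(old)} earlier turns] {summary}"] + recent

history = [f"turn {i}: some discussion" for i in range(10)]
compacted = compact(history)
print(len(compacted))  # 5 entries: 1 summary + 4 verbatim recent turns
```

This also shows where the drift comes from: every compaction pass re-summarizes a summary, so details that didn’t make the cut are gone for good.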

2) Compression (real compression, not summarization)

This is the exciting part. Recently, beacon-style compression showed you can compress context tokens (think ~8x) and keep inference fast. This opens the door to mid-term context at a scale that feels like “working memory”, not “tiny window”.
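A toy illustration of the shape of the idea (not the actual method): each window of k tokens is distilled into one “beacon” state, so the model attends to n/k condensed states instead of n raw tokens. Real beacon systems learn those states during training; here we just average embeddings to show the bookkeeping.

```python
import numpy as np

def beacon_compress(token_embs: np.ndarray, k: int = 8) -> np.ndarray:
    """Collapse each window of k token embeddings into one beacon vector."""
    n, d = token_embs.shape
    n_trim = (n // k) * k                  # drop a ragged tail for simplicity
    windows = token_embs[:n_trim].reshape(-1, k, d)
    return windows.mean(axis=1)            # (n/k, d): ~k-fold fewer states

embs = np.random.randn(1024, 64)           # 1024 "tokens", 64-dim embeddings
beacons = beacon_compress(embs, k=8)
print(beacons.shape)  # (128, 64): an 8x reduction
```

The point of the sketch: unlike summarization, nothing is rewritten in natural language; the sequence is shortened in representation space, which is why inference stays fast.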

3) RAG (retrieval + citations)

RAG is great for long-term memory. It’s not perfect and not always precise, but it’s the best tool we have to search large archives and bring back evidence.
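The retrieval-plus-citations loop can be sketched in a few lines. This toy version scores documents by term overlap with the query and returns the best match with its ID as the citation; the corpus is made up, and a real system would use embeddings and a vector index instead.

```python
def retrieve(query: str, corpus: dict[str, str]) -> tuple[str, str]:
    """Return (doc_id, text) of the document sharing the most terms with the query."""
    q_terms = set(query.lower().split())

    def score(text: str) -> int:
        return len(q_terms & set(text.lower().split()))

    doc_id = max(corpus, key=lambda d: score(corpus[d]))
    return doc_id, corpus[doc_id]

corpus = {
    "doc1": "beacon compression keeps inference fast at long context",
    "doc2": "retrieval augmented generation searches large archives",
}
doc_id, evidence = retrieve("how do we search large archives", corpus)
print(f"grounded in [{doc_id}]: {evidence}")
```

Carrying the doc ID through is the part that matters: the model’s answer can point back at evidence, which is exactly what compaction and compression can’t give you.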