For the past few months I’ve been working on Context for AI models. I genuinely believe this is one of the biggest opportunities in the industry right now.
Because context is messy.
Most models start to choke around ~100k tokens. And “just do RAG” often feels underwhelming: not because RAG is bad, but because RAG alone is not memory. It’s a search tool.
So… how should we approach context?
I think the best starting point is the human brain: it layers short-term, working, and long-term memory, and each layer trades capacity for fidelity.
Now, what tools do we actually have in engineering to approximate that?
We have three primitives:
1) Compaction (summaries / rolling state)
This is why tools like Cursor / Claude Code can keep going: they compress the past into something smaller.
Downside: extreme compaction drifts. It loses details. It loses grounding.
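To make the idea concrete, here’s a minimal sketch of a rolling-compaction buffer. It is not how Cursor or Claude Code actually implement this; `summarize` is a placeholder for a real LLM summarization call, and the hard length cap is what makes the compaction lossy (the drift described above).

```python
# Minimal sketch of rolling compaction (illustrative, not any tool's real code).
# Recent turns stay verbatim; older turns are folded into a lossy summary.

def summarize(old_summary: str, dropped_turns: list[str]) -> str:
    # Placeholder: a real system would call a model here.
    merged = " ".join(dropped_turns)
    return (old_summary + " | " + merged)[:200]  # hard cap = lossy compaction

class RollingContext:
    def __init__(self, max_recent: int = 4):
        self.max_recent = max_recent
        self.summary = ""
        self.recent: list[str] = []

    def add(self, turn: str) -> None:
        self.recent.append(turn)
        if len(self.recent) > self.max_recent:
            # Fold everything beyond the recent window into the summary.
            dropped = self.recent[: -self.max_recent]
            self.recent = self.recent[-self.max_recent :]
            self.summary = summarize(self.summary, dropped)

    def prompt(self) -> str:
        # What actually goes to the model: compressed past + verbatim present.
        return f"[summary] {self.summary}\n" + "\n".join(self.recent)
```

The failure mode is visible in the cap: once `summarize` truncates, details are gone for good.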
2) Compression (real compression, not summarization)
This is the exciting part. Recently, beacon-style compression showed you can compress tokens by roughly 8x and keep inference fast. This opens the door to mid-term context at a scale that feels like “working memory”, not “tiny window”.
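The footprint math is worth seeing. The toy below is plain mean pooling, not the learned beacon mechanism itself; it only illustrates the shape change an ~8x ratio implies: n token vectors become ceil(n/8) condensed vectors.

```python
# Toy illustration of ~8x token compression (NOT the actual beacon method):
# collapse each group of `ratio` token vectors into one pooled vector.

def compress(tokens: list[list[float]], ratio: int = 8) -> list[list[float]]:
    out = []
    for i in range(0, len(tokens), ratio):
        group = tokens[i : i + ratio]
        dim = len(group[0])
        # Mean-pool the group dimension-wise into a single condensed vector.
        out.append([sum(v[d] for v in group) / len(group) for d in range(dim)])
    return out
```

At an 8x ratio, a 100k-token history shrinks to ~12.5k condensed positions, which is what makes “working memory” scale plausible.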
3) RAG (retrieval + citations)
RAG is great for long-term. It’s not perfect and not always precise, but it’s the best tool we have to search large archives and bring back evidence.
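The retrieve-then-cite shape can be sketched in a few lines. Real systems use embedding models and a vector store; this toy uses bag-of-words overlap instead, and the archive, doc ids, and `retrieve` signature are all made up for illustration.

```python
# Minimal sketch of retrieval with citations over a tiny in-memory archive.
# Bag-of-words cosine similarity stands in for real embeddings.

from collections import Counter
import math

def score(query: str, doc: str) -> float:
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    overlap = sum((q & d).values())  # shared word count
    return overlap / math.sqrt((sum(q.values()) * sum(d.values())) or 1)

def retrieve(query: str, archive: dict[str, str], k: int = 2):
    ranked = sorted(archive, key=lambda doc_id: score(query, archive[doc_id]),
                    reverse=True)
    # Return evidence WITH citations (doc ids), not just text.
    return [(doc_id, archive[doc_id]) for doc_id in ranked[:k]]
```

The key design point is the return shape: evidence paired with a doc id, so the model’s answer can point back at its source.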