Context windows are now 1M+ tokens, but context depth is limited. Often the answer is hidden behind layers of linked information, yet an attention block can only resolve one link at a time. We trained a tiny 5-layer model that beats GPT-4.5 on a variable-evaluation task requiring deep, recursive reasoning. How? It learned a divide-and-conquer mechanism.
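The post doesn't spell out the task, but variable-evaluation benchmarks of this shape typically chain assignments (`v1 = v0`, `v2 = v1`, ...) and ask for the final variable's value. Below is a hypothetical sketch, not the authors' actual setup: `make_chain`, `evaluate`, and `pointer_doubling_rounds` are illustrative names. It contrasts sequential resolution, which needs one step per link, with a divide-and-conquer (pointer-doubling) strategy that halves the remaining chain each round, the kind of mechanism the post alludes to.

```python
import random

def make_chain(depth):
    # Chain of bindings: v0 = <digit>; v1 = v0; ...; v<depth> = v<depth-1>.
    # Evaluating the last variable means following `depth` links.
    value = random.randint(0, 9)
    lines = [f"v0 = {value}"] + [f"v{i} = v{i-1}" for i in range(1, depth + 1)]
    random.shuffle(lines)  # presentation order hides the chain
    return "\n".join(lines), f"v{depth}", value

def evaluate(program, query):
    # Sequential resolution: follow one link per step, O(depth) steps.
    env = dict(line.split(" = ") for line in program.splitlines())
    name, steps = query, 0
    while not env[name].isdigit():
        name = env[name]
        steps += 1
    return int(env[name]), steps

def pointer_doubling_rounds(program, query):
    # Divide-and-conquer resolution: each round, every variable jumps to
    # its pointer's pointer, so the remaining chain length halves and the
    # query resolves in O(log depth) rounds instead of O(depth) steps.
    env = dict(line.split(" = ") for line in program.splitlines())
    rounds = 0
    while not env[query].isdigit():
        env = {k: (v if v.isdigit() else env[v]) for k, v in env.items()}
        rounds += 1
    return int(env[query]), rounds

prog, query, answer = make_chain(depth=16)
val_seq, steps = evaluate(prog, query)          # 16 sequential hops
val_dc, rounds = pointer_doubling_rounds(prog, query)  # ~log2(16) rounds
```

A depth-16 chain takes 16 sequential hops but only about five pointer-doubling rounds, which is why a shallow stack of layers can, in principle, resolve chains far deeper than its layer count.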
10 hours ago by michael_lutz