HN Reader
R-Zero: Self-Evolving Reasoning LLM from Zero Data
121
61
3 days ago
by lawrenceyan
"Starting from a single base LLM"
Ok, zero data, except the data used in the teacher model.
3 days ago
by Iv
Terrible choice of name. DeepSeek developed a historically important model called “R-Zero” (this was the predecessor to R1 that was trained without any cold-start SFT; it was very strong, but its chain of thought was difficult to read because it code-switches into Chinese and has no line breaks).
3 days ago
by clbrmbr
For values of zero quite far above zero.
3 days ago
by thom
I think in a formal domain like Lean it should actually be possible to do it from zero, but it seems like there have been no major successes so far.
3 days ago
by Davidzheng
Conceptually, it's effectively a GAN
3 days ago
by jasonjmcghee
What could go wrong?
3 days ago
by cyberge99
OK but how do you ensure it's improving in a direction that aligns with reality?
3 days ago
by lawlessone
Now gamify it.
3 days ago
by neuroelectron
Perpetual Motion Machines were a thing at some point, too.
3 days ago
by nakamoto_damacy
I still don't understand what a "reasoning" LLM is
3 days ago
by freejazz