
R-Zero: Self-Evolving Reasoning LLM from Zero Data

121 points

61 comments

4 months ago by lawrenceyan


"Starting from a single base LLM"

Ok, zero data, except the data used in the teacher model.

4 months ago by Iv

For values of zero quite far above zero.

4 months ago by thom

Perpetual Motion Machines were a thing at some point, too.

4 months ago by nakamoto_damacy

Terrible choice of name. DeepSeek developed a historically important model called “R-Zero” (the predecessor to R1, trained without any cold-start SFT; it was very strong, but its chain of thought was hard to read because it code-switched into Chinese and had no line breaks).

4 months ago by clbrmbr

I think in a formal domain like Lean it should actually be possible to do it from zero, but there seem to have been no major successes so far.

4 months ago by Davidzheng

OK but how do you ensure it's improving in a direction that aligns with reality?

4 months ago by lawlessone

I still don't understand what a "reasoning" LLM is

4 months ago by freejazz

4 months ago by neuroelectron

What could go wrong?

4 months ago by cyberge99

Conceptually, it's effectively a GAN

4 months ago by jasonjmcghee