Hacker News

It does if the spend drives GPU prices so high that most researchers can't afford to use them. And DeepSeek demonstrated what a small team of researchers can do with a moderate number of GPUs.


The DeepSeek team themselves suggest that large amounts of compute are still required.


https://www.macrotrends.net/stocks/charts/NVDA/nvidia/gross-...

GPU prices could be a lot lower and still give the manufacturer a more "normal" 50% gross margin, and the average researcher could then afford more compute. A 90% gross margin, for example, would imply that the price is 5x the level that would give a 50% margin.


However, look at the figure for R1-Zero. The x-axis is effectively the number of RL steps, measured in the thousands. Each step involves a whole group of inferences, but compared to the gradient updates required to consume 15 trillion tokens during pretraining, it is still a bargain. Direct RL on the smaller models did not become effective as quickly as it did with DeepSeek v3, so although in principle it might work at some level of compute, it was much cheaper to do SFT of these small models using reasoning traces from the big model. That distillation SFT on 800k example traces probably took much less than 0.1% of the smaller models' pretraining compute, and that is the compute budget they compare RL against in the snippet you quote.
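A rough token-count comparison shows where the "much less than 0.1%" figure plausibly comes from. The 15T pretraining tokens and 800k traces are from the comment; the average trace length and the proportionality of fine-tuning cost to tokens processed are my assumptions:

```python
# Back-of-envelope for the distillation-SFT vs. pretraining compute ratio.
# Assumptions (mine, not from the comment): ~5k tokens per reasoning trace,
# and training cost roughly proportional to tokens processed for a fixed model.

pretrain_tokens = 15e12        # 15 trillion pretraining tokens (from the comment)
traces = 800_000               # 800k distillation examples (from the comment)
tokens_per_trace = 5_000       # assumed average reasoning-trace length

sft_tokens = traces * tokens_per_trace   # 4e9 tokens
fraction = sft_tokens / pretrain_tokens
print(f"{fraction:.5f}")                 # ~0.00027, well under the 0.1% bound
```

Even if the traces averaged three times that length, the ratio would still sit comfortably below 0.1%, so the claim is robust to the assumed trace length.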




