Hacker News

It does if the spend drives GPU prices so high that most researchers can't afford to use them. And DeepSeek demonstrated what a small team of researchers can do with a moderate number of GPUs.


The DeepSeek team themselves suggest that large amounts of compute are still required.


https://www.macrotrends.net/stocks/charts/NVDA/nvidia/gross-...

GPU prices could be a lot lower and still give the manufacturer a more "normal" 50% gross margin, and the average researcher could then afford more compute. A 90% gross margin, for example, would imply that the price is 5x the level that would give a 50% margin.


However, look at the figure for R1-Zero. The x-axis is effectively the number of RL steps, measured in the thousands. Each step involves a whole group of inferences, but compared to the gradient updates required to consume 15 trillion tokens during pretraining, it is still a bargain. Direct RL on the smaller models did not become effective as quickly as it did with DeepSeek v3, so although in principle it might work at some level of compute, it was much cheaper to do SFT of these small models using reasoning traces from the big model. That distillation SFT on 800k example traces probably took much less than 0.1% of the smaller models' pretraining compute, and that is the compute budget they compare RL against in the snippet you quote.
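A rough token-count comparison shows where the "much less than 0.1%" figure plausibly comes from. The 15T pretraining tokens and 800k traces are from the comment; the average trace length and the proportionality of fine-tuning cost to tokens processed are my assumptions:

```python
# Back-of-envelope for the distillation-SFT vs. pretraining compute ratio.
# Assumptions (mine, not from the comment): ~5k tokens per reasoning trace,
# and training cost roughly proportional to tokens processed for a fixed model.

pretrain_tokens = 15e12        # 15 trillion pretraining tokens (from the comment)
traces = 800_000               # 800k distillation examples (from the comment)
tokens_per_trace = 5_000       # assumed average reasoning-trace length

sft_tokens = traces * tokens_per_trace   # 4e9 tokens
fraction = sft_tokens / pretrain_tokens
print(f"{fraction:.5f}")                 # ~0.00027, well under the 0.1% bound
```

Even if the traces averaged three times that length, the ratio would still sit comfortably below 0.1%, so the claim is robust to the assumed trace length.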




