Not necessarily, if you are pushing against a data wall. One could ask: after adjusting for DS's efficiency gains, how much more compute has OpenAI spent, and is their model correspondingly better? And DS could easily afford more than $6 million in compute, so why didn't they just push scaling further?
Because they're able to pass a training signal on tons of newly generated tokens based on whether they result in a correct answer, rather than just fitting on existing tokens.
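To make that concrete, here's a toy sketch (not any lab's actual training code) of the idea: instead of maximizing likelihood on a fixed corpus, the model samples its own outputs and the learning signal comes from a correctness check on the final answer. The two "strategies," the target answer, and the REINFORCE-style update are all illustrative stand-ins.

```python
import math
import random

def correct(answer, target):
    # Verifiable reward: 1 if the generated answer checks out, else 0.
    return answer == target

def train(steps=2000, lr=0.1, seed=0):
    rng = random.Random(seed)
    # Toy policy: preference weights over two hypothetical strategies.
    # Strategy 0 always answers wrong; strategy 1 answers right.
    prefs = [0.0, 0.0]
    target = 42
    answer_of = {0: 7, 1: 42}
    for _ in range(steps):
        # Softmax over preferences gives sampling probabilities.
        exps = [math.exp(p) for p in prefs]
        z = sum(exps)
        probs = [e / z for e in exps]
        a = rng.choices([0, 1], weights=probs)[0]
        # Signal comes from the *generated* sample's correctness,
        # not from matching tokens in an existing dataset.
        reward = 1.0 if correct(answer_of[a], target) else 0.0
        baseline = 0.5
        # REINFORCE update: raise the sampled strategy's preference
        # in proportion to (reward - baseline).
        for i in range(2):
            grad = (1.0 if i == a else 0.0) - probs[i]
            prefs[i] += lr * (reward - baseline) * grad
    return probs

probs = train()
```

After training, nearly all probability mass sits on the strategy that produces correct answers, even though no "correct" examples were ever in a training set: the model manufactured its own data and kept what the verifier rewarded.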