whimsicalism on Jan 25, 2025 | on: DeepSeek-R1: Incentivizing Reasoning Capability in...

Because they're able to pass signal on tons of newly generated tokens based on whether they result in a correct answer, rather than just fitting on existing tokens. It's on the path to self-play.
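
A rough sketch of what "passing signal on generated tokens" could look like, not the actual DeepSeek-R1 training code: sample several completions, check each final answer, and give every token in a completion the same outcome-based weight. The function name `per_token_signal`, the 0/1 reward, and the mean-reward baseline are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence


def per_token_signal(
    completions: Sequence[Sequence[str]],
    is_correct: Callable[[Sequence[str]], bool],
) -> List[List[float]]:
    """Assign each token the outcome-based weight of its whole completion.

    Hypothetical helper: assumes a non-empty batch, a 0/1 reward, and a
    mean-reward baseline across the sampled completions.
    """
    rewards = [1.0 if is_correct(c) else 0.0 for c in completions]
    baseline = sum(rewards) / len(rewards)  # mean reward over the samples
    # Every token inherits (reward - baseline) from its completion.
    return [[r - baseline] * len(c) for r, c in zip(rewards, completions)]


# Two sampled completions for the same prompt; only the first ends correctly.
samples = [["2", "+", "2", "=", "4"], ["2", "+", "2", "=", "5"]]
weights = per_token_signal(samples, lambda c: c[-1] == "4")
```

In a policy-gradient setup these weights would scale the log-probability gradient of each sampled token, so the tokens carrying the learning signal are ones the model generated itself rather than tokens from a fixed dataset.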