I've been comparing R1 to O1 and O1-pro, mostly for coding, refactoring, and understanding open-source code.
I can say that R1 is on par with O1, but not as deep and capable as O1-pro. R1 is also a lot more useful than Sonnet. I actually haven't used Sonnet in a while.
R1 is also comparable to the Gemini Flash Thinking 2.0 model, but for coding I feel like R1 gives me code that works without too much tweaking.
I often give an entire open-source project's codebase (or a big part of it) to all of them and ask the same question - like add a plugin, fix xyz, etc.
O1-pro is still a clear and expensive winner. But if I were to choose the second best, I would say R1.
At this point, it's mostly a function of how many thinking tokens a model can generate (when it comes to o1 and r1). o3 is likely going to be superior because it was trained on data generated by o1 (amongst other things). o1-pro has a longer "thinking" token budget, so it comes out ahead. The same goes for o1 via the API, where you can control the thinking length. I haven't seen that option in the r1 API yet, but if they provide it, the output could be even better.
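For anyone curious what "controlling the thinking length" looks like in practice: a rough sketch of an OpenAI-style chat request, assuming the `reasoning_effort` parameter the o-series models expose (the exact field names here are illustrative, not verified against every SDK version):

```python
# Hedged sketch: building a request payload that asks a reasoning
# model for more thinking tokens via `reasoning_effort`.
# Values are typically "low" | "medium" | "high"; check the current
# API docs for which models accept this parameter.
payload = {
    "model": "o1",
    "reasoning_effort": "high",  # bias toward longer internal reasoning
    "messages": [
        {
            "role": "user",
            "content": "Refactor this function to remove the global state.",
        }
    ],
}

# You would then POST this to the chat completions endpoint with your
# API key, e.g. via the official client library.
print(payload["reasoning_effort"])
```

If r1's API ever grows an equivalent knob, the same pattern would apply: one extra field in the request trading latency and cost for deeper reasoning.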