
Spending more time than I should on a Sunday playing with r1/o1/sonnet code generation, my impressions are:

1. Sonnet is still the best model for me. It makes fewer mistakes than o1 and r1, and one can ask it to make a plan and think about the request before writing code. I am not sure the whole "reasoning/thinking" process of o1/r1 is as much of an advantage as it is supposed to be. And even if sonnet makes mistakes too, iterations with sonnet are faster than with o1/r1 at least.

2. r1 is good (better than previous deepseek models imo, and especially better at following instructions, which was my problem with deepseek models so far). The smaller models are very interesting. But its thought process often overcomplicates things, and it thinks more than imo it should. I am not sure all that thinking always helps build a better context for writing the code, which is what the thinking is actually for, if we are honest.

3. My main problem with deepseek is that the thinking blocks are huge and it runs out of context (I think? Or is kagi's provider just unstable?) after a few iterations. Maybe if the thinking blocks from previous answers were not used for computing new answers it would help. I am not sure what o1 does here; I doubt the previous thinking carries over in the context.

4. o1 seems around the same level as r1 imo when r1 does nothing weird, but r1 does more weird things (though I use it through github copilot and it does not give me the thinking blocks). I am pretty sure one can find tasks where o1 performs better and tasks where r1 performs better. That does not mean much to me.

Maybe other use cases give different results than code generation. Maybe web/js code generation would also give different results than mine. But I do not see anything that really impresses me in what I actually need these tools for (beyond the current SOTA baseline, which is sonnet).

I would like to play more with the r1 distillations locally though, and in general I would probably try to handle the thinking-block context differently. Or maybe use aider with the dual-model approach, where an r1/sonnet combo seems to give great results. I think there is potential, but not just as-is.

In general I do not understand the whole "panicking" thing. I do not think anybody panics over r1; it is very good, but nothing more exceptional than what we have already seen, unless they thought that only American companies could produce SOTA-level models, which was wrong already (previous deepseek and qwen models were already at similar levels). If anything, openai's and anthropic's models are more polished. It sounds a bit sensational to me, but then again, who knows; I do not trust the grounding in reality that AI companies have, so they may be panicking indeed.



> Maybe if the thinking blocks from previous answers where not used for computing new answers it would help

Deepseek specifically recommends users ensure their setups do not feed the thinking portion back into the context because it can confuse the AI.
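In practice that cleanup is a small transform on the conversation history. A minimal sketch, assuming an OpenAI-style messages list and R1's `<think>…</think>` delimiters (the exact tags depend on the serving stack):

```python
import re

# R1-style reasoning is wrapped in <think>...</think> in the assistant output.
THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def strip_thinking(messages):
    """Drop reasoning blocks from prior assistant turns before re-sending
    the history, so only the final answers stay in context."""
    cleaned = []
    for m in messages:
        if m["role"] == "assistant":
            m = {**m, "content": THINK_RE.sub("", m["content"]).strip()}
        cleaned.append(m)
    return cleaned
```

Besides following the recommendation, this also frees a lot of context window, since the thinking blocks are often much longer than the answers themselves.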

They also recommend against prompt engineering. Just make your request as simple and specific as possible.

I need to go try Claude now because everyone is raving about it. I’ve been throwing hard, esoteric coding questions at R1 and I’ve been very impressed. The distillations though do not hold a candle to the real R1 given the same prompts.


Does R1 code actually compile and work as expected? Even small local models are great at answering confidently and plausibly. Luckily, coding responses are easily verifiable, unlike fuzzier topics.


The panic is because a lot of beliefs have been challenged by r1, and those who made investments based on these beliefs will now face losses.


Based on my personal testing for coding, I still find Claude Sonnet the best, and it's easy to understand the code written by Claude (I like their code structure, or maybe at this point I am just used to Claude's style).


I also feel the same. I like the way sonnet answers and writes code, and I think I liked qwen 2.5 coder because it reminded me of sonnet (I highly suspect it was trained on sonnet's output). Moreover, having worked with sonnet for several months, I have system prompts for specific languages/uses that help produce the output I want and work well with it; e.g. I can get it to produce functions together with unit tests and examples written in a way very similar to what I would have written, which helps a lot in understanding and debugging the code (because I find manual changes inevitable in general). It is not easy to then switch to o1/r1, when their guidelines are to avoid exactly this kind of thing (system prompts, examples etc). And this matches my limited experience with them; plus, going back and forth to fix details is painful (here I actually like zed's approach, where you are able to edit their outputs directly).

Maybe a way to use them would be to pair them with a second model, like aider does: I could see r1 producing something and a second model starting from its output, or maybe having more control over when it thinks and when it does not.
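The two-model split that aider popularized (it calls the roles "architect" and "editor") can be sketched roughly like this; the `chat()` helper here is hypothetical, standing in for whatever API client you use, and the model names are placeholders:

```python
def architect_editor(task: str, chat) -> str:
    """Two-pass generation: a reasoning model plans, a coding model implements.

    `chat(model=..., prompt=...)` is a hypothetical helper returning the
    model's text response; swap in your actual client."""
    # Pass 1: the reasoning model (e.g. r1) produces a plan, not code.
    plan = chat(
        model="reasoning-model",  # placeholder name
        prompt=f"Plan, step by step, how to implement: {task}. Do not write code.",
    )
    # Pass 2: the coding model (e.g. sonnet) implements the plan.
    return chat(
        model="coding-model",  # placeholder name
        prompt=f"Implement this plan as code:\n{plan}",
    )
```

The appeal is that the reasoning model's thinking never enters the coder's context directly; only the distilled plan does.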

I believe these models must be pretty useful for some kinds of tasks different from what I use sonnet for right now.


Sonnet isn't just better, it actually succeeds where R1 utterly fails after many minutes of "thinking" and back-and-forth prompting on a simple task: writing a go cli that does icmp ping without requiring root or suid, or calling the external ping cmd.

Faster too.
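For reference, the non-root approach that task usually comes down to on Linux is an unprivileged datagram ICMP socket (allowed when `net.ipv4.ping_group_range` includes your group); the commenter wanted Go, but the wire format is the same everywhere. A sketch of the echo-request construction in Python, with the socket part left as a comment since it needs that sysctl and network access:

```python
import struct

def icmp_checksum(data: bytes) -> int:
    # RFC 1071 internet checksum: one's-complement sum of 16-bit words.
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold carry back in
    return ~total & 0xFFFF

def build_echo_request(ident: int, seq: int, payload: bytes = b"ping") -> bytes:
    # ICMP echo request header: type=8, code=0, checksum, identifier, sequence.
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)
    csum = icmp_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, csum, ident, seq) + payload

# Sending without root on Linux (when ping_group_range permits):
#   sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_ICMP)
#   sock.sendto(build_echo_request(0, 1), ("192.0.2.1", 0))
```

A handy property for verifying the output: recomputing the checksum over a complete, valid packet yields zero.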



