Claude is only better in some cherry-picked standard eval benchmarks, which are becoming more useless every month due to the likelihood of these tests leaking into training data. If you look at the Chatbot Arena rankings, where actual users blindly select the best answer from a random choice of models, the top 3 models are all from OpenAI. And the next best ones are from Google and X.
I'm subscribed to all of Claude, Gemini, and ChatGPT. Benchmarks aside, my go-to is always Claude. Subjectively speaking, it consistently gives better results than anything else out there. The only reason I keep the other subscriptions is to check in on them occasionally to see if they've improved.
I don't pay any attention to leaderboards. I pay for both Claude and ChatGPT and use them both daily for anything from Python coding to the most random questions I can think of. In my experience Claude is better (much better) than ChatGPT in almost all use cases. Where ChatGPT shines is the voice assistant - it still feels almost magical having a "human-like" conversation with the AI agent.
Anecdotally, I disagree. Since the release of the "new" 3.5 Sonnet, it has given me consistently better results than Copilot based on GPT-4o.
I've been using LLMs as my rubber duck when I get stuck debugging something and have exhausted my standard avenues. GPT-4o tends to give me very general advice that I have almost always already tried or considered, while Claude is happy to say "this snippet looks potentially incorrect; please verify XYZ" and it has gotten me back on track in maybe 4/5 cases.
Bullshit. Claude 3.5 Sonnet owns the competition according to the most useful benchmark: operating a robot body in the real world. No other model comes close.
This seems incorrect. I don't need Claude 3.5 Sonnet to operate a robot body for me, and I don't know anyone else who does. And general-purpose robotics is never going to be the most efficient way to have robots do many tasks, and certainly not in the short term.
Of course not, but the task requires excellent image understanding, a large context window, a mix of structured and unstructured output, high-level and spatial reasoning, and a conversational layer on top.
I find it’s predictive of relative performance in other tasks I use LLMs for. Claude is the best. The only shortcoming is its peculiar verbosity.
Definitely superior to anything OpenAI has and miles beyond the “open weights” alternatives like Llama.
The problem is that it also fails on fairly simple logic puzzles that ChatGPT can do just fine.
For example, even the new 3.5 Sonnet can't solve this reliably:
> Doom Slayer needs to teleport from Phobos to Deimos. He has his pet bunny, his pet cacodemon, and a UAC scientist who tagged along. The Doom Slayer can only teleport with one of them at a time. But if he leaves the bunny and the cacodemon together alone, the bunny will eat the cacodemon. And if he leaves the cacodemon and the scientist alone, the cacodemon will eat the scientist. How should the Doom Slayer get himself and all his companions safely to Deimos?
In fact, not only is its solution wrong, it can't figure out why it's wrong on its own even if you ask it to self-check.
In contrast, GPT-4o consistently gives the correct response.
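For what it's worth, the puzzle is just a river-crossing problem in disguise (bunny/cacodemon/scientist instead of wolf/goat/cabbage), so it's trivial to brute-force. Here's a quick sketch (my own throwaway code, not anything the models produced) that confirms the classic 7-trip answer, with the cacodemon moved first:

```python
from collections import deque

# State: sides of (slayer, bunny, cacodemon, scientist); 0 = Phobos, 1 = Deimos.
START, GOAL = (0, 0, 0, 0), (1, 1, 1, 1)
NAMES = (None, "bunny", "cacodemon", "scientist")

def safe(state):
    """A side without the Slayer must not hold bunny+cacodemon or cacodemon+scientist."""
    slayer, bunny, caco, sci = state
    if bunny == caco != slayer:   # bunny eats the cacodemon
        return False
    if caco == sci != slayer:     # cacodemon eats the scientist
        return False
    return True

def solve():
    """BFS over teleport moves; returns the shortest list of trips."""
    queue = deque([(START, [])])
    seen = {START}
    while queue:
        state, path = queue.popleft()
        if state == GOAL:
            return path
        slayer = state[0]
        # Teleport alone (p is None) or with one companion on the Slayer's side.
        for p in (None, 1, 2, 3):
            if p is not None and state[p] != slayer:
                continue
            nxt = list(state)
            nxt[0] ^= 1
            if p is not None:
                nxt[p] ^= 1
            nxt = tuple(nxt)
            if nxt not in seen and safe(nxt):
                seen.add(nxt)
                queue.append((nxt, path + [NAMES[p] if p else "alone"]))
    return None

print(solve())
```

The search shows the first move is forced: only taking the cacodemon leaves both sides safe, and the full solution takes 7 teleports (cacodemon over, return alone, take one of the others, bring the cacodemon back, take the remaining one, return alone, cacodemon over).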