Hacker News | redman25's comments

Not to be confused with nanocoder, the agentic coding harness.

https://github.com/Nano-Collective/nanocoder


Exactly, compare MoE with MoE and dense with dense; otherwise it's apples and oranges.

It's coding to coding. I couldn't care less how the model is architected; I only care how it performs in a real-world scenario.

If you don't care about how it's architected, why do you care about size? Compare it to Q3.5 397B-A17B.

Just as smaller models are a speed/cost optimization, so is MoE.

G4 26B-A4B goes 150 t/s on 4090/5090, 80 t/s on M5 Max. Q3.5 35B-A3B is comparably fast. They are flash-lite/nano class models.

G4 31B, despite a small increase in total parameter count, is over 5 times slower. Q3.5 27B is comparably slow. They are approximating flash/mini class models (I believe sizes of proprietary models in this class are closer to Q3.5 122B-A10B or Llama 4 Scout 109B-A17B).


The implication is that there is (should be) a major speed difference - naively you'd expect the MoE to be 10x faster and cheaper, which can be pretty relevant on real world tasks.
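A back-of-the-envelope way to see where that expectation comes from: autoregressive decoding is usually memory-bandwidth bound, so tokens/s scales roughly with bandwidth divided by the bytes of active weights read per token. The bandwidth, quantization, and efficiency figures below are assumptions for illustration, not measurements:

```python
def est_tokens_per_s(active_params_b, bytes_per_param, bandwidth_gb_s, efficiency=0.5):
    # `efficiency` is an assumed fudge factor for kernel overhead,
    # KV-cache reads, etc.; real numbers vary widely.
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 * efficiency / bytes_per_token

BW = 1008  # RTX 4090 spec memory bandwidth, GB/s
moe = est_tokens_per_s(4, 0.5, BW)     # 26B-A4B MoE at ~4-bit quant
dense = est_tokens_per_s(31, 0.5, BW)  # 31B dense at ~4-bit quant
print(f"MoE ~{moe:.0f} t/s, dense ~{dense:.0f} t/s, ratio {moe/dense:.1f}x")
```

The ratio is just the active-parameter ratio (31/4 ≈ 7.8x), which is in the same ballpark as the ~5x slowdown reported for the dense 31B.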

200B-A10B please; 200B-A3B has too few active parameters to have good intelligence IMO, and 10B active is still reasonably fast.

It's why you always have a rollback plan. Every `up` needs a `down`.
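A minimal, framework-agnostic sketch of what paired migrations look like (the registry and SQL statements are illustrative, not any particular tool's API):

```python
# Each migration registers both directions; rolling back replays the
# `down` steps in reverse order of the `up`s that were applied.
MIGRATIONS = [
    {
        "up": "ALTER TABLE users ADD COLUMN last_login TIMESTAMP;",
        "down": "ALTER TABLE users DROP COLUMN last_login;",
    },
    {
        "up": "CREATE INDEX idx_users_email ON users (email);",
        "down": "DROP INDEX idx_users_email;",
    },
]

def rollback_plan():
    # Undo most recent migration first.
    return [m["down"] for m in reversed(MIGRATIONS)]

print(rollback_plan())
```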


If you do that, it expands your test matrix quadratically.

So, it makes sense if you have infinite testing budgets.
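One way to count the blow-up, assuming downgrade support means any version can move to any other (an assumption for illustration): ordered pairs grow with the square of the version count, versus a linear number of forward-only upgrade steps.

```python
def matrix_size(n_versions, support_downgrade):
    if support_downgrade:
        # Every ordered (from, to) pair with from != to must be tested.
        return n_versions * (n_versions - 1)
    # Forward-only: each version only upgrades from its predecessor.
    return n_versions - 1

for n in (5, 10, 20):
    print(n, matrix_size(n, False), matrix_size(n, True))
```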

Personally, I prefer exhaustively testing the upgrade path, and investing in reducing the time it takes to push out a hot fix. Chicken bits are also good.

I haven’t heard of any real-world situations where supporting downgrades of persistent formats led to best-in-class product stability.

Would love to hear of an example.


Aircraft engineer: “That’s why you have parachutes.”

They might be an appropriate safeguard for a prototyping shop, but not for Delta.


So someone is debugging something with git bisect, stumbles on the old commit, and gets pwned. Maybe that's why they force-killed it? To avoid people going back in history and stumbling on it.


CPU/network throttling needs to be set for the product manager and management - that's the only way you might see real change.

We have some egregious slowness in our app that only shows up for our largest customers in production but none of our organizations in development have that much data. I created a load testing organization and keep considering adding management to it so they implicitly get the idea that fixing the slowness is important.
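A sketch of seeding a load-testing org with customer-scale data so the slow paths reproduce outside production (the table shape and row counts are made up for illustration; swap in whatever "largest customer" means for your app):

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INT, total REAL)")

# Dev fixtures rarely match a big customer's volume; 100k rows here
# stands in for production-scale data.
rows = [(i, random.randrange(1000), random.uniform(1, 500)) for i in range(100_000)]
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)

print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])
```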


It’s mainly the benchmarks that have encouraged that. The more tokens they crank out the more likely the answer is to be somewhere in the output.


I feel like the right response for those situations is to start asking questions of the user. It’s what a human would do if they did not understand.


I made the argument multiple times that the right answer to many prompts would be a question, and it was allowed under some rare circumstances, but far too few.

I suspect it's in part because the provider also didn't want to create an easy cop-out for the people working on the fine-tuning part. A lot of my work was auditing and reviewing output, and there was indeed a lot of really sloppy work, up to and including cutting and pasting output from other LLMs - we know, because on more than one occasion I caught people who had managed to include part of Claude's website footer in their answer...


As in, participants would copy output from one LLM as a question to another?


It depends on the harness and/or inference engine whether they keep the reasoning of past messages.
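For example, a harness that drops prior-turn reasoning before resending history might do something like this (the field names are illustrative, not any specific API):

```python
def strip_reasoning(messages):
    # Remove the "reasoning" field from earlier turns; only the
    # remaining fields (role, content, ...) go back to the model.
    return [{k: v for k, v in m.items() if k != "reasoning"} for m in messages]

history = [
    {"role": "user", "content": "What's 2+2?"},
    {"role": "assistant", "content": "4", "reasoning": "Simple arithmetic."},
]
print(strip_reasoning(history))
```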

Not to get all philosophical but maybe justification is post-hoc even for humans.


How about a human coworker who screws up 1% of the time? Doesn’t sound so bad in that light. It’s the nature of being human.

Good code review is the solution, but if it's faster to do it yourself, that's fine too.
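Worth noting that a 1% per-task error rate compounds: assuming independent tasks, the chance of at least one failure across n tasks is 1 - 0.99^n, which is one reason review still matters at scale.

```python
p = 0.01  # assumed per-task error rate
for n in (1, 10, 100):
    # Probability of at least one failure over n independent tasks.
    print(n, round(1 - (1 - p) ** n, 3))
```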

