More

redman25 · 2026-04-05T19:36:17 1775417777

Not to be confused with nanocoder, the agentic coding harness.

https://github.com/Nano-Collective/nanocoder

redman25 · 2026-04-02T21:38:53 1775165933

Exactly, compare MoE with MoE and dense with dense otherwise it's apples and oranges.

swalsh · 2026-04-03T00:00:41 1775174441

Its coding to coding. I could care less how the model is architected, i only care how it performs in a real world scenario.

petu · 2026-04-03T07:05:58 1775199958

If you don't care about how it's architectured, why you care about size? Compare it to Q3.5 397B-A17B.

Just like smaller size models are speed / cost optimization, so is MoE.

G4 26B-A4B goes 150 t/s on 4090/5090, 80 t/s on M5 Max. Q3.5 35B-A3B is comparably fast. They are flash-lite/nano class models.

G4 31B despite small increase in total parameter count is over 5 times slower. Q3.5 27B is comparably slow. They are approximating flash/mini class models (I believe sizes of proprietary models in this class are closer to Q3.5 122B-A10B or Llama 4 Scout 109B-A17B).

daemonologist · 2026-04-03T01:43:37 1775180617

The implication is that there is (should be) a major speed difference - naively you'd expect the MoE to be 10x faster and cheaper, which can be pretty relevant on real world tasks.

redman25 · 2026-04-02T21:31:46 1775165506

200a10b please, 200a3b is too little active to have good intelligence IMO and 10b is still reasonably fast.

redman25 · 2026-03-17T14:13:20 1773756800

It's why you always have a rollback plan. Every `up` needs to a `down`.

hedora · 2026-03-17T16:34:32 1773765272

If you do that, it expands your test matrix quadratically.

So, it makes sense if you have infinite testing budgets.

Personally, I prefer exhaustively testing the upgrade path, and investing in reducing the time it takes to push out a hot fix. Chicken bits are also good.

I haven’t heard of any real world situations where supporting downgrades of persistent formats led to best of class product stability.

Would love to hear of an example.

DrewADesign · 2026-03-18T13:50:02 1773841802

Aircraft engineer: “That’s why you have parachutes.”

They might be an appropriate safeguard for a prototyping shop, but not for Delta.

redman25 · 2026-03-16T13:48:22 1773668902

So someone is debugging something with git bisect and stumbles on the old commit and gets pwned. Maybe that's why they force killed it? To avoid people going back in history and stumbling on it.

redman25 · 2026-03-16T13:21:54 1773667314

CPU/network throttling needs to be set for the product manager and management - that's the only way you might see real change.

We have some egregious slowness in our app that only shows up for our largest customers in production but none of our organizations in development have that much data. I created a load testing organization and keep considering adding management to it so they implicitly get the idea that fixing the slowness is important.

redman25 · 2026-03-13T02:43:51 1773369831

It’s mainly the benchmarks that have encouraged that. The more tokens they crank out the more likely the answer is to be somewhere in the output.

redman25 · 2026-03-13T02:41:54 1773369714

I feel like the right response for those situations is to start asking questions of the user. It’s what a human would do if they did not understand.

vidarh · 2026-03-13T07:38:51 1773387531

I made the argument multiple times that the right answer to many prompts would be a question, and it was allowed under some rare circumstances, but far too few.

I suspect in part because the provider also didn't want to create an easy cop out for the people working on the fine-tuning part (a lot of my work was auditing and reviewing output, and there was indeed a lot of really sloppy work, up to and including cut and pasting output from other LLMs - we know, because on more than one occasion I caught people who had managed to include part of Claudes website footer in their answer...)

winterqt · 2026-03-14T04:23:50 1773462230

As in, participants would copy output from one LLM as a question to another?

redman25 · 2026-03-13T02:39:30 1773369570

It depends on the harness and/or inference engine whether they keep the reasoning of past messages.

Not to get all philosophical but maybe justification is post-hoc even for humans.

redman25 · 2026-03-13T02:34:55 1773369295

How about a human coworker who screws up 1% of the time? Doesn’t sound so bad in that light. It’s the nature of being human.

Good code review is the solution but if it’s faster to do it yourself, that’s fine too.