
yorwba · 2026-03-19T11:32:38 1773919958

After reading both the original post and this submission, what do you think is new here?

regularfry · 2026-03-19T13:18:06 1773926286

> The weird part: different duplication patterns create different cognitive "modes" from the same weights. Double-pass boosts math. Triple-pass boosts emotional reasoning. Interleaved doubling (13,13,14,14,15,15,16) creates a pure math specialist. Same model, same VRAM, different routing.

As far as I can see that's not implied by the original post.

But that's beside the point: quoting the bit where the poster says "here's what I'm building on top of" and using that to imply they haven't done anything new is a bit pointless, no?
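For anyone trying to picture the patterns in that quote, here is a rough sketch of them as layer execution orders. It is only an illustration: the function names and the 20-layer model are made up, and neither post in this thread says how the routing is actually implemented.

```python
def duplicate_block(n_layers, start, end, passes=2):
    """Run layers start..end (inclusive) `passes` times in a row.

    Hypothetical helper: it only builds the order in which existing
    layers would be called; no weights are copied or changed.
    """
    block = list(range(start, end + 1))
    return list(range(start)) + block * passes + list(range(end + 1, n_layers))


def interleave_doubling(n_layers, start, end):
    """Run each layer in start..end (inclusive) twice, back to back."""
    order = []
    for layer in range(n_layers):
        order.append(layer)
        if start <= layer <= end:
            order.append(layer)  # immediate second pass through the same layer
    return order


# Assumed 20-layer model, purely for illustration.
double_pass = duplicate_block(20, 13, 16, passes=2)
interleaved = interleave_doubling(20, 13, 15)
print(interleaved[13:20])
```

With those made-up numbers the printed slice matches the 13,13,14,14,15,15,16 pattern from the quote, and a triple pass is just `passes=3`.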

simgt · 2026-03-19T13:43:32 1773927812

You're right that my quote was misleading. I overlooked "the weird part" in the post because it didn't seem new to me either.

Here's the section in the original post that covers it: https://dnhkng.github.io/posts/rys/#the-brain-scanner All heatmaps are split by task and show an optimal point for each. The routing he chose is a trade-off between both tasks; there isn't much else to do unless you intend to train a router anyway.

> So the ‘math organ’ has boundaries on both sides. Too few layers and you get nothing — you’ve cut into the circuit and it can’t complete its operation. Too many layers and you also get nothing — you’ve included tissue from a neighbouring circuit that doesn’t belong. Pre-training carved these structures out of the layer stack, and they only work whole. It also doesn’t translate to other tasks, as the heatmap for EQ scores doesn’t have this patch.

gavinray · 2026-03-19T17:19:27 1773940767

This is stated in the original post as well, under the "The Beginning of LLM Neuroanatomy?" section:

  > From end-position 43 to 46, we then see solid boosts in math scores (red = good, yay). But include layer 46 or beyond, and the benefits collapse again. The hypothesis: position 47 is where a different circuit begins. Including even one step of the next recipe messes up the current recipe.

  > So the ‘math organ’ has boundaries on both sides. Too few layers and you get nothing — you’ve cut into the circuit and it can’t complete its operation. Too many layers and you also get nothing — you’ve included tissue from a neighbouring circuit that doesn’t belong. Pre-training carved these structures out of the layer stack, and they only work whole. It also doesn’t translate to other tasks, as the heatmap for EQ scores doesn’t have this patch.

  > This is a much more specific claim than “middle layers do reasoning.” It’s saying the reasoning cortex is organised into functional circuits: coherent multi-layer units that perform complete cognitive operations. Each circuit is an indivisible processing unit, and the sweeps seen in the heatmap is essentially discovering the boundaries of these circuits.

jstanley · 2026-03-19T13:05:19 1773925519

It's all new to me.

yorwba · 2026-03-19T08:44:51 1773909891

If you have showdead on, you can see that this account posts generic oneliners: https://news.ycombinator.com/threads?id=balinha_8864

big-chungus4 · 2026-03-19T11:25:35 1773919535

Is that bad?

yorwba · 2026-03-19T13:59:36 1773928776

It's an indication that it's one of the many bot accounts currently doing the same thing https://hn.algolia.com/?query=this%20is%20more%20nuanced%20t...

So the reason the comment appears weirdly disconnected from the content of the article is that it was generated independently from the content of the article.

yorwba · 2026-03-18T13:41:08 1773841268

Previous discussion: https://news.ycombinator.com/item?id=47385935 (439 points 3 days ago, 388 comments)

dang · 2026-03-18T21:31:16 1773869476

Comments moved thither. Thanks!

yorwba · 2026-03-17T13:44:50 1773755090

It's just setting the font-family in the style attribute of a <span>. (As you can see by inspecting the text/html content of your clipboard, e.g. with `xclip -selection clipboard -o -t text/html`)
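If you'd rather pull the value out programmatically, something like the following should work on the text/html you get from that command. It is a sketch using only the standard library; the sample markup and the helper names are invented for illustration, and real clipboard content will look different.

```python
from html.parser import HTMLParser


class FontFamilyFinder(HTMLParser):
    """Collect font-family declarations from the style attribute of <span> tags."""

    def __init__(self):
        super().__init__()
        self.families = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        style = dict(attrs).get("style") or ""
        # Naive split on ';' and ':'; fine for simple inline styles.
        for declaration in style.split(";"):
            name, _, value = declaration.partition(":")
            if name.strip().lower() == "font-family":
                self.families.append(value.strip())


def span_font_families(html):
    finder = FontFamilyFinder()
    finder.feed(html)
    return finder.families


# Made-up sample; replace with the actual clipboard contents.
sample = '<p><span style="font-family: monospace; color: red">code</span></p>'
print(span_font_families(sample))
```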

yorwba · 2026-03-17T12:56:29 1773752189

Well, the weights are accumulated in full precision and are multiplied by a full-precision scale factor after quantization, and the activations and backward pass are computed in full precision as well, so it's not quite true 4-bit precision training. The resulting model can be stored with just slightly more than 4 bits per parameter, though.
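To make the "scale factor" and "slightly more than 4 bits" parts concrete, here is a toy sketch of block quantization. It is a simplification under my own assumptions: it uses signed 4-bit integer codes (the format in question may well use 4-bit floats instead), and the block size and scale width in the example are hypothetical, not taken from the paper.

```python
def quantize_block(weights, bits=4):
    """Map a block of floats to signed integer codes sharing one full-precision scale."""
    qmax = 2 ** (bits - 1) - 1  # 7 for signed 4-bit codes
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the representable range.
    codes = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_block(codes, scale):
    """Recover approximate weights: low-precision code times full-precision scale."""
    return [c * scale for c in codes]


def bits_per_parameter(block_size, scale_bits, bits=4):
    """Storage cost per weight once the shared scale is amortized over the block."""
    return bits + scale_bits / block_size


codes, scale = quantize_block([1.75, -0.5, 0.25, 0.0])
print(codes, scale, dequantize_block(codes, scale))
print(bits_per_parameter(block_size=32, scale_bits=16))  # hypothetical sizes
```

In training, the full-precision master weights would be kept around and only a quantized copy like `codes` used in the matmuls, which is why I wouldn't call it true 4-bit training.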

jcalvinowens · 2026-03-17T13:33:35 1773754415

I really just don't understand how the quantization error doesn't ruin the results. Is there some reading you'd recommend?

I can easily understand how the block formats win.

yorwba · 2026-03-17T09:04:00 1773738240

Iran does have an oil pipeline bypassing the Strait of Hormuz https://www.worldoil.com/news/2021/5/31/iran-opens-new-oil-e... but it doesn't have infinite capacity, so that doesn't mean there are zero problems.

yorwba · 2026-03-17T08:40:39 1773736839

Nice work! The paper feels verbose at times and could use some editing to slim it down (also, equation 6 is just equation 5 in a box) but I enjoyed it a lot nonetheless.

yorwba · 2026-03-16T16:37:42 1773679062

There are a few pictures of truncated icosahedra in the article, alongside several other shapes that are not icosahedra. The point is that they have icosahedral symmetry. The L is important.

JackFr · 2026-03-16T17:26:05 1773681965

I was going to comment pedantically that soccer balls were dodecahedrons not icosahedrons, but in reading the article, I came to realize that truncated icosahedrons are the same as truncated dodecahedrons.

This was such a delightful realization I felt the need to comment anyway.

zem · 2026-03-16T23:07:58 1773702478

that is indeed a delightful realisation! akin to when I noticed that a cube and an octahedron both had a cross section that was a regular hexagon.

meindnoch · 2026-03-17T00:20:58 1773706858

Hmm. I'm sorry, but truncated dodecahedra are different from truncated icosahedra.

Truncated dodecahedra are made from twelve decagonal and twenty triangular faces. Truncated icosahedra are made from twenty hexagonal and twelve pentagonal faces.
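A quick count from those face lists shows why the two are easy to mix up. This is a small sketch with a made-up helper name; it relies on the fact that exactly three faces meet at every vertex of both solids.

```python
def polyhedron_counts(faces, faces_per_vertex=3):
    """Return (V, E, F) for a polyhedron given {sides: number_of_faces}.

    Assumes the same number of faces meets at every vertex, which holds
    for both truncated solids here (three faces per vertex).
    """
    f = sum(faces.values())
    side_total = sum(sides * count for sides, count in faces.items())
    e = side_total // 2                 # every edge is shared by two faces
    v = side_total // faces_per_vertex  # every vertex is shared by three faces
    return v, e, f


truncated_dodecahedron = {10: 12, 3: 20}  # twelve decagons, twenty triangles
truncated_icosahedron = {6: 20, 5: 12}    # twenty hexagons, twelve pentagons

for name, faces in [("truncated dodecahedron", truncated_dodecahedron),
                    ("truncated icosahedron", truncated_icosahedron)]:
    v, e, f = polyhedron_counts(faces)
    print(name, v, e, f, "Euler characteristic:", v - e + f)
```

Both come out with the same vertex, edge, and face totals and the same symmetry group, yet the faces themselves are different polygons, so they are different solids.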

yorwba · 2026-03-16T16:31:30 1773678690

Flowering plants (angiosperms) appeared during the Cretaceous before dinosaurs got wiped out, and there is fossil evidence of insects pollinating non-flowering plants (gymnosperms) like ferns and conifers even earlier than that: https://repository.si.edu/server/api/core/bitstreams/152b12d...

yorwba · 2026-03-15T14:12:35 1773583955

It makes it easier to compare with other papers. If two different papers apply different methods to different models and get different results, how do you know which method is better?

Once you have identified the best method and want to productize it, it would of course make sense to apply it on top of the best model, but if you're just doing research, you can skip that expensive last step.