More

kgeist · 2026-04-08T20:57:51 1775681871

Agree, I recently updated our office's little AI server to use Qwen 3.5 instead of Qwen 3 and the capability has considerably increased, even though the new model has fewer parameters (32b => 27b)

Yesterday I spent some time investigating it:

- Gated DeltaNet (invented in 2024 I think) in Qwen3.5 saves memory for the KV kache so we can afford larger quants

- larger quants => more accurate

- I updated the inference engine to have TurboQuant's KV rotations (2026) => 8-bit KV cache is more accurate

- smaller KV cache requirements => larger contexts

Before, Qwen3 on this humble infra could not properly function in OpenCode at all (wrong tool calls, generally dumb, small context), now Qwen 3.5 can solve 90% problems I throw at it.

All that thanks to algorithmic/architectural innovations while actually decreasing the parameter count.

kgeist · 2026-04-08T06:25:39 1775629539

>I read up to here, but I wasn't convinced that this is the revelation that the author claims

The rest of the arguments is as weak:

1) both released open-source software

2) both don't like spam

3) both like using pseudonyms online

4) both love freedom

5) both are anti-copyright

etc.

Basically, the author found that Adam Back used the same words on X as Satoshi did in some emails (including such rare words as "dang," "backup," and "abandonware") and then decided to find every possible "link" they could to build the case, even if most of the links are along the lines of "Both are humans! Coincidence? I think not."

DeliciousSeaCow · 2026-04-08T17:50:22 1775670622

It's weird they spent so much time on the written word similarities, when the biggest reveal here is that Back disappears off the email lists (on a topic he is VERY interested in and has historically corresponded on) when Nakamoto appears, and then comes back when Nakamoto disappears.

tovej · 2026-04-08T07:29:14 1775633354

I think this misses the point. The point is that interests and writing style matches, which means there's a higher chance they are the same person.

The more similarities you find, the closer the match. It's in no way proof, of course. But it does provide good reason to look closer

rcxdude · 2026-04-08T08:58:00 1775638680

Only if those similarities are indicating more than 'generic internet hacker' for both of them. You only need 23 bits to identify a person but those are 23 uncorrelated bits, and all the 'similarities' presented here are extremely strongly correlated with themselves.

extraduder_ire · 2026-04-08T13:56:41 1775656601

Where are you getting 23 from? That's only 8-ish million values max.

bnjemian · 2026-04-08T18:21:46 1775672506

Suspect it's a typo. 33, not 23, gives ~8.6*10^9.

alwa · 2026-04-08T16:56:35 1775667395

The interests and writing style differentiate Mr. (Dr.?) Back from the general public, sure. But from what I’m reading, they don’t do a great job of distinguishing between 90s hackers.

“Get this, his PhD thesis dealt with a computer language called C++, just like Bitcoin papers used” seems both confused and impossibly lazy to me.

> “Scrap patents and copyright,” Mr. Back wrote in September 1997.

> Satoshi did a similar thing. He released the Bitcoin software under M.I.T.’s open-source license

Really?

Like saying “get this, his college-aged musical interests included the Urban American musical style known as ‘Hip Hop’; therefore Tupac didn’t really die and this is him.” Heavy on insinuation, light on seriousness. Strong “…you’re not from around here, are you?” vibes.

What does this kind of journalism hope to accomplish, anyway? Beyond bothering middle-aged nerds for gossip? And providing a frame for the author’s cute little sleuth jape?

“Good reason to look closer” assumes there’s good reason to pick through ancient rubble in the first place.

defrost · 2026-04-08T09:29:52 1775640592

Similarities in style and word were common enough in small circles such as the cyphyrpunks that spawned those discussions.

Then there's not altogether unlikely chance that Satoshi is a nodding homage to Nicolas Bourbaki, each contributor holding part of a multiparty voting key.

kgeist · 2026-04-06T22:16:42 1775513802

>just report the post, i would have taken it down

Last time someone asked to take down a post, you said "bitch come suck my dick" according to your own blog.

kgeist · 2026-04-06T14:36:57 1775486217

Found his record in Russia's official company registry. This is what he officially does as an entepreneur:

  56.10 — Restaurant activities and food delivery services

  47.23 — Retail sale of fish, crustaceans, and mollusks in specialized stores

  47.25.12 — Retail sale of beer in specialized stores

  47.25.2 — Retail sale of soft drinks in specialized stores

  47.29.39 — Retail sale of other food products in specialized stores, not included in other groups

  68.20 — Lease and management of own or leased real estate

Money is reinvested into selling beer and fish :) Interestingly, he registered all that in 2019, just when the ransoms started.

ivan_gammel · 2026-04-06T14:42:35 1775486555

Classic money laundering.

tokai · 2026-04-06T15:47:11 1775490431

> 56.10 — Restaurant activities and food delivery services

That one is a classic for russian criminals and warlords.

randomNumber7 · 2026-04-06T19:17:18 1775503038

This has nothing to do with russia. All over the world it's an obvious choice to use it for money laundering.

tokai · 2026-04-06T19:45:56 1775504756

Yevgeny Prigozhin was well known for his restaurants and catering businesses.

randomNumber7 · 2026-04-06T20:21:53 1775506913

And you don't understand simple logic.

tokai · 2026-04-06T21:33:23 1775511203

And you should reread the HN guidelines.

ecshafer · 2026-04-06T16:37:10 1775493430

I find it entertaining that even as part of a Russian hacking gang, the real threat is the Russian tax authorities. Regardless of how you got the money, need to pay the taxes.

diath · 2026-04-06T17:45:43 1775497543

That's not specific to Russia, almost every country in the world requires you to report illegal income for tax purposes, including the US.

jojobas · 2026-04-07T01:10:12 1775524212

His first, middle and last names are among the more popular, chances are you found a namesake.

kgeist · 2026-04-07T08:45:19 1775551519

Schukin isn't a very common last name (definitely not Ivanov-tier). The first name, the patronymic (his father is Maksim) and the last name all match, as well as the city (the article says he lives in Krasnodar). In fact, this Krasnodar-based entrepreneur is the only person that shows up in the search at all for "Daniil Maksimovich Schukin". Not to say the business was registered right when the ransoms started (2019). Too many coincidences if it's just a namesake.

kgeist · 2026-04-05T21:51:46 1775425906

Qwen3.5 comes in various sizes (including 27B), and judging by the posts on HN, /LocalLlama etc., it seems to be better at logic/reasoning/coding/tool calling compared to Gemma 4, while Gemma 4 is better at creative writing and world knowledge (basically nothing changed from the Qwen3 vs. Gemma3 era)

Mil0dV · 2026-04-05T22:06:19 1775426779

Does this also apply to gemma's 26B-A4B vs say Qwens 35B-A3B?

I'm not sure if I can make the 35B-A3B work with my 32GB machine

green7ea · 2026-04-06T06:27:13 1775456833

It should be easy with a Q4 (quantization to 4 bits per weight) and a smallish context.

You won't have much RAM left over though :-/.

At Q4, ~20 GiB

https://huggingface.co/unsloth/Qwen3.5-35B-A3B-GGUF

rhdunn · 2026-04-06T10:06:02 1775469962

For llama-server (and possibly other similar applications) you can specify the number of GPU layers (e.g. `--n-gpu-layers`). By default this is set to run the entire model in VRAM, but you can set it to something like 64 or 32 to get it to use less VRAM. This trades speed as it will need to swap layers in and out of VRAM as it runs, but allows you to run a larger model, larger context, or additional models.

kgeist · 2026-04-04T10:27:23 1775298443

I think the headline is misleading. It's some random fork of llama.cpp, I can't find evidence that TurboQuant was actually added to llama.cpp proper.

The only legit PR I can find is this [0] and it's still open.

There's currently a lot of rejected vibe-coded PRs: [1] (violation of AI policy).

The OP's PR says it was generated with Claude Code so it has a very low chance of getting merged upstream.

[0] https://github.com/ggml-org/llama.cpp/pull/21089

[1] https://github.com/ggml-org/llama.cpp/pulls?q=Turboquant+is%...

lastdong · 2026-04-04T14:50:25 1775314225

Indeed, thanks for pointing this out and the links. With the excitement I misread that it was an MR from the fork to the main project. I don’t think I’m able to fix the title though.

I find it quite exciting to read some results in an effort to understand if TurboQuant main ideas can be applied to model weights. There are other similar projects, so we’ll see, but it seems some of this fork results look promising.

kgeist · 2026-04-02T15:35:16 1775144116

They've always had closed-source variants:

- Qwen3.5-Plus

- Qwen3-Max

- Qwen2.5-Max

etc. Nothing really changed so far.

kgeist · 2026-04-01T08:50:10 1775033410

>Second, what's even more crazy is that roughly 98% of that DNA is actually non-coding.. just junk.

I think it's a myth that non-coding DNA is junk. Say:

https://www.nature.com/articles/444130a

>'Non-coding' DNA may organize brain cell connections.

kgeist · 2026-04-01T05:34:48 1775021688

>One theory is that the knowledge required to solve the task is already stored in the parameters of the model, and only the style has to change for task success

>In particular, learning to generate longer outputs may be possible in few parameters

Reminded me of: https://arxiv.org/abs/2501.19393

>we develop budget forcing to control test-time compute by forcefully terminating the model’s thinking process or lengthening it by appending “Wait” multiple times to the model’s generation when it tries to end. This can lead the model to double-check its answer, often fixing incorrect reasoning steps

Maybe, indeed, the model simply learns to insert the EOS token (or similar) later, and the capability is already in the base model

kgeist · 2026-03-31T05:50:18 1774936218

Prior art: https://news.ycombinator.com/item?id=46590280

>TimeCapsuleLLM: LLM trained only on data from 1800-1875