Why, though? Just because some people would find it odd? Who cares?
Trying to limit or disallow something seems to hurt the overall accuracy of models. And it makes sense if you think about it. Most of our long-horizon content comes in the form of novels and longer works. If you try to clamp the machine to machine-speak, you'll lose all those learnings. Hero starts with a problem, hero works the problem, hero reaches an impasse, hero makes a choice, hero gets the princess. That structure can be (and probably is) useful.
Is it? I don't think most of the content LLMs are trained on is written in the first person. Wikipedia, news articles, and other informational articles aren't written in the first person. Most novels, or at least a substantial portion of them, aren't either.
LLMs write in the first person because they have been specifically finetuned for a chat task; it's not a fundamental feature of language models that would have to be specifically disallowed.
While I'm skeptical of any "beats Opus" claims (many have been made, none turned out to be true), I still think it's insane that we can now run close-to-SotA models locally on ~100k worth of hardware for a small team and be 100% sure that the data stays local. Should be a no-brainer for teams that work in areas where privacy matters.
Even the smaller quantized models which can run on consumer hardware pack in an almost unfathomable amount of knowledge. I don't think I expected to be able to run a 'local Google' in my lifetime before the LLM boom.
I'm extremely curious how these models learn to pack a lossily-compressed representation of the entire Internet (more or less) into a few hundred billion parameters. like, what's the ontology?
I think this one is only about 600GB of VRAM usage, so it could fit on two Mac Studios with 512GB of unified memory each. That would have cost (albeit no longer available) something like less than 20k.
Yeah, but that's personal use at best, not much agentic anything happening on that hardware. Macs are great for small models at small-to-medium context lengths, but at > 64k context (very common with agentic usage) they struggle and slow down a lot.
The ~100k hardware is suitable for multi-user, small-team usage. That's what you'd use for actual work in reasonable timeframes. For personal use, sure, Macs could work.
You could run it with SSD offload; earlier experiments with Kimi 2.5 on M5 hardware had it running at 2 tok/s. K2.6 has a similar number of total and active parameters.
Yeah... I would definitely call 2t/s unusable. For simple chats, I'd want at least 15 t/s. For agentic coding (which this model is advertised for), I'd want good prefill performance as well.
That's just throwing money away. The performance with large context would be unusable, especially if you need to serve more than a single person.
> couldn’t be different than how Claude Code was received by software devs.

It’s simply useless for designers; their workflow is very different from software devs'. You can’t “oh, let Claude Design come up with a quick logo for this” in the same way that Claude Code was able to quickly solve small annoyances for devs.
Haha, that's exactly how cc was received initially. It's just autocomplete. It's useless. It can't even x. I tried to y and it gave me z. Over and over all over the internet this was the reaction. Then the bargaining began. Oh, it will maybe speed up some simple things. Like autocomplete on steroids. Maaaybe do some junior tasks once in a while. And so on...
Agreed - for the last 20 years or so, designers at basecamp.com have done all of their frontend design directly in Rails/HTML/CSS and then had the developers "re-implement it". The upside of this approach is designs that really work in the browser, and they found it to be faster. The downside is that it's harder to find designers who have both of those skills, but that was an acceptable tradeoff for them because they're a smaller company.
To me, it seems obvious that AI will attack this from both directions - upskilling developers to make more design changes AND upskilling designers to make more design iterations and more changes to the codebase - so the design artifact becomes "new react components" (which can be re-implemented or not) instead of a figma design.
Most web design is already crap to begin with, so AI web-design will fit right in.
Plus, compared to totally open-ended video generation, web design is mostly samey (it follows a few trends and conventions), way more constrained, and doesn't include humans, who are difficult to recreate due to the uncanny valley effect.
> Haha, that's exactly how cc was received initially.
Haha, maybe by you. By many on HN, but HN is a bubble of its own. By plenty of others it was received very differently. Many of us had been doing agentic coding for more than a year already when Claude Code was released, because we found it valuable.
We will see if such groups of professional designers also form for Claude Design or other such tools.
It's still an autocomplete on steroids (that's what LLMs are).
It still produces subpar code, with horrendous data access patterns, endless duplication of functionality, etc. You still need a human in the loop to fix all the mistakes (unless you're Garry Tan or Steve Yegge, who assume that quality is when you push hundreds of thousands of LoC per day).
Same here.
Oh, and Claude Code is significantly worse at generating design code than almost any other type of code.
Just because you don't look at the code, doesn't mean it doesn't produce subpar code constantly.
Opus 4.7, high effort. Literally 30 minutes ago. There's a `const UNMATCHED_MARKER = "<hardcoded value>"` that we want to remove from a file. Behold (the first version was a full-on AST walk for absolutely no reason whatsoever, by the way):
Don't get me started on all the code duplication, utility functions written from scratch in every module that needs them, reading the full database just to count the number of records...
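That last one is worth spelling out; here's a minimal sqlite sketch of the anti-pattern (table and column names are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO records (id) VALUES (?)",
                 [(i,) for i in range(1000)])

# The anti-pattern: pull every row into the application just to count them.
n_slow = len(conn.execute("SELECT * FROM records").fetchall())

# The fix: let the database do the counting; only one integer comes back.
n_fast = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]

print(n_slow, n_fast)  # 1000 1000
```

Same answer either way, but the first version transfers and materializes the entire table.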
Unless I'm parsing your reply very badly, I see no world in which anything dealing with HTTP would be more expensive than dealing with the KV cache (loading it from "cold" storage, deciding which compute unit to load it into, doing the actual computation for the next call, etc.).
No, that’s not the issue. What people fail to understand is that every request - e.g. every message you send, but also every tool call response - requires the entire conversation history to be sent, and the LLM provider needs to reprocess it.
The attention part of LLMs (that is, for every token, how strongly it attends to all the other tokens) is what gets saved: the keys and values computed for earlier tokens are stored in a KV cache so they don't have to be recomputed.
You can imagine that with large context windows the overhead becomes enormous (attention has quadratic complexity in the context length).
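To put rough numbers on why reprocessing hurts, here's a toy cost model (made-up op counts, not any provider's real figures):

```python
# Toy model: without a KV cache, replaying a conversation of n tokens
# means token i re-attends to tokens 0..i, so total work grows ~n^2.
def attention_ops_no_cache(n_tokens: int) -> int:
    return sum(i + 1 for i in range(n_tokens))

# With the cache, only the new tokens are processed; the stored
# keys/values of the old tokens are reused.
def attention_ops_with_cache(old_tokens: int, new_tokens: int) -> int:
    return sum(old_tokens + i + 1 for i in range(new_tokens))

# A 64k-token history plus a 100-token tool response:
full = attention_ops_no_cache(64_000 + 100)
incremental = attention_ops_with_cache(64_000, 100)
print(full // incremental)  # prints 320: ~320x less work with a warm cache
```

Which is exactly why providers go to such lengths to keep that cache warm rather than recomputing from scratch.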
I remember seeing a YouTube video about this tech already being trialed (with regular lasers) for geothermal. They use lasers to "vaporise" rock, in the hopes of digging much more efficiently.
I copy-pasted a section of Asimov's "The Last Question", since it was readily on the front page. It detected 14 patterns (2 reds, one yellow, a bunch of greens and blues) in 583 words. Welp, I guess it's back to school for Mr. Asimov...
Update: 13 patterns in 800 words for Samuel Clemens. Apparently he's an em-dash abuser, but also likes "filler adverbs", "triple constructions" and "anaphora abuse". Damn!
And for Mr. Hemingway we have 43 patterns in 1600 words. 16 filler adverbs, 5 triple constructions, 5 staccato bursts, and 14 question then answer. My my...