Hacker News | new | past | comments | ask | show | jobs | submit | flux3125's comments | login

To be fair, llama.cpp has gotten much easier to use lately with llama-server -hf <model name>. That said, the need to compile it yourself is still a pretty big barrier for most people.

I started with ollama and now I'm using llama.cpp/llama-server's Router Mode that allows you to manage multiple models through a single server instance.
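For anyone who hasn't tried it, it really is a one-liner these days (the repo name below is just an example; any GGUF repo on Hugging Face should work):

```shell
# Download the model from Hugging Face if it isn't cached yet, then
# serve it with an OpenAI-compatible API (localhost:8080 by default)
llama-server -hf ggml-org/gemma-3-1b-it-GGUF
```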

One thing I haven't figured out: subjectively, ollama's model loading felt nearly instant, while with llama.cpp I feel like I'm always waiting for models to load. That doesn't make sense, because it's ultimately the same software underneath. Maybe I should try ollama again to convince myself that I'm not crazy and that ollama's model loading wasn't actually instant.


You don't need to compile it yourself though? Unless you want CUDA support on Linux I guess, dunno why you'd need such a silly thing though:

https://github.com/ggml-org/llama.cpp/releases


> That said, the need to compile it yourself is still a pretty big barrier for most people.

My distro (NixOS) has binary packages though...

And there are packages in the AUR (Arch), GURU (Gentoo), and even Debian Unstable. Now, these might be a little behind, but if you care that much you can download binaries from GitHub directly.


That’s not what it means. "-it" just indicates the model is instruction-tuned, i.e. trained to follow prompts and behave like an assistant. It doesn’t imply anything about whether thinking tokens like <think>...</think> were included or excluded during training. That's a separate design choice and varies by model.

What does that mean for a user of the model? Is the "-it" version more direct with solutions or something?

It means that model was tuned to act as a chat bot. So it writes a reply on behalf of the assistant and stops generating (by inserting a special "end of turn" token that signals the inference engine to stop generation).

A base model (without instruction/chat tuning) just generates text non-stop ("autocomplete on steroids"), and the text isn't necessarily even formatted as a chat -- most text in the training data isn't dialogue, after all.
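If it helps, here's a minimal sketch of what that tuning format looks like in practice. The turn markers are Gemma-style and purely illustrative; every model family uses its own tokens:

```python
# Sketch of an instruction-tuned chat format: each turn is wrapped in
# turn markers, and the model's own "end of turn" marker is what lets
# the inference engine know when to stop generating.

def apply_chat_template(messages: list[dict]) -> str:
    parts = []
    for m in messages:
        parts.append(f"<start_of_turn>{m['role']}\n{m['content']}<end_of_turn>")
    # Open a model turn so the model knows it's its turn to reply;
    # generation stops when the model itself emits <end_of_turn>.
    parts.append("<start_of_turn>model\n")
    return "\n".join(parts)

prompt = apply_chat_template([
    {"role": "user", "content": "Write a haiku about autumn."},
])
print(prompt)
```

A base model sees none of this structure during pre-training, which is why it just keeps autocompleting instead of "answering and stopping".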


good old illustration: https://www.ml6.eu/en/blog/large-language-models-to-fine-tun...

The -it one is the yellow smiling dot; the -pt one is the rightmost monster head.


Use the -it versions. The other versions are base models without post-training. E.g. base models are trained to regurgitate raw Wikipedia, books, etc. These base models are then post-trained into instruction-tuned models, where they learn to act as a chat assistant.

You're not doing anything wrong, that's expected

I bet the paper was vibe written too

An error occurred. Try again.

But seriously, OP should somehow change this message to something like "Too many people are chatting right now, please try again in a moment."

(that would be even more appealing to recruiters)


I completely removed nanobot after I found that. Luckily, I only used it a few times, and inside a Docker container. litellm 1.82.6 was the version I found installed; not sure if it was affected.

100 bucks? I'll pass, thanks.

> probably less will be needed and the exact work will be transformed a bit

My guess is the opposite: they'll throw 5–10x more work at developers and expect 10x more output, while the marginal cost is basically just a Claude subscription per dev.


> You can’t just tell an agent, Build me the code for a successful start-up. The agents work best when they’re being asked to perform one step at a time

That's also true for humans. If you sit down with an LLM and take the time to understand the problem you're trying to solve, it can perfectly guide you through it step by step. Even a non-technical person could build surprisingly solid software if, instead of immediately asking for new shiny features, they first ask questions, explore trade-offs, and get the model's opinion on design decisions.

LLMs are powerful tools in the hands of people who know they don't know everything. But in the hands of people who think they always know the best way, they can be much less useful (I'd say even dangerous).


I appreciate this sober take. If you hired a remote developer and the only thing you said to that person was “build a program that does this. Make no mistakes” would you expect that to be successful? Are you certain you would get what you wanted?


Any competent developer there is going to push back and get the needed information out of you.

LLMs don't know when you're under-specifying the problem.


That’s interesting, because that's one feature of Claude Code that I like. Given an overly broad problem statement, it goes into a planning loop where it asks clarifying questions. I think this probably has more to do with the harness than the model, but you see what I mean; from a user's perspective that distinction doesn’t really matter.


According to science video thumbnails on YT, nothing should be possible


And even if it was, you wouldn't believe it anyway

