
IceWreck · 2026-04-08T19:52:32 1775677952

> He then open sourced llama and wanted to be the android of llms.

Well, the original Llama did kick off the era of open-source LLMs. Most of the early open-source LLMs were based on the Llama architecture. And look where we are now: OSS models are very close to the frontier.

It may not have benefited Meta, but it commoditized LLMs.

solarkraft · 2026-04-08T20:43:40 1775681020

Hell, most of us are still using llama.cpp for inference in some form.

IceWreck · 2026-04-07T19:35:19 1775590519

Didn't OpenAI say something similar about GPT-3? Too dangerous to open source, and then a few years later they were open-sourcing gpt-oss because a bunch of OSS labs were competing with their top models.

FeepingCreature · 2026-04-07T20:03:10 1775592190

OpenAI didn't release GPT-2 initially because they were worried it would make it too easy to generate spam. Which it kinda did.

abroszka33 · 2026-04-07T20:23:04 1775593384

OpenAI said that GPT-5 was too dangerous to release... And look where we are now. It's mostly hype.

IceWreck · 2026-04-05T20:03:28 1775419408

It does if you use an inference engine where you can offload some of the experts from VRAM to CPU RAM. That means I can fit a 35-billion-parameter MoE in, say, a 12 GB VRAM GPU plus 16 GB of system RAM.
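A rough way to check whether a quantized MoE fits this way: weight size is about parameters × bits per weight / 8. The sketch below assumes roughly 4.85 bits per weight as an average for a Q4_K_M GGUF (an approximation; real files vary because some tensors stay at higher precision) and a hypothetical 2 GB of VRAM held back for the KV cache and activations. The function names are made up for illustration.

```python
def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def split_gb(model_gb: float, vram_gb: float, reserve_gb: float = 2.0) -> tuple[float, float]:
    """Return (GB kept in VRAM, GB offloaded to system RAM).

    reserve_gb is a hypothetical allowance for KV cache and activations.
    """
    usable = max(vram_gb - reserve_gb, 0.0)
    on_gpu = min(model_gb, usable)
    return on_gpu, model_gb - on_gpu


size = model_size_gb(35, 4.85)  # ~4.85 bits/weight: rough Q4_K_M average (assumption)
gpu, ram = split_gb(size, vram_gb=12)
print(f"weights ~{size:.1f} GB: {gpu:.1f} GB in VRAM, {ram:.1f} GB in RAM")
```

Under these assumptions the weights come to about 21 GB, with roughly 10 GB on the GPU and 11 GB spilling into system RAM, which fits inside 16 GB. Which tensors are offloaded matters as much as how many bytes: keeping attention and shared layers on the GPU while pushing expert weights to RAM is the usual approach for MoE models.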

Yukonv · 2026-04-05T21:38:57 1775425137

With that, you're taking a significant performance penalty and becoming severely I/O-bottlenecked. I've been able to stream Qwen3.5-397B-A17B from my M5 Max (12 GB/s SSD read) using the Flash MoE technique at the brisk pace of 10 tokens per second. As tokens are generated, different experts need to be consulted, resulting in a lot of I/O churn. So while feasible it's only great for batch jobs not interactive usage.
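The I/O churn described here can be illustrated with a toy LRU cache of experts, where every routing decision that misses the cache stands in for a read from SSD. This is a hypothetical sketch (the class, function, and hand-written routing trace are all made up), not how Flash MoE or any particular engine actually manages experts.

```python
from collections import OrderedDict


class ExpertCache:
    """Toy LRU cache of MoE experts held in RAM; a miss models an SSD read."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.resident = OrderedDict()  # (layer, expert) -> None, oldest first
        self.hits = 0
        self.misses = 0

    def fetch(self, layer: int, expert: int) -> None:
        key = (layer, expert)
        if key in self.resident:
            self.resident.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return
        self.misses += 1  # in a streaming setup this would hit the SSD
        self.resident[key] = None
        if len(self.resident) > self.capacity:
            self.resident.popitem(last=False)  # evict least recently used


def replay(trace, capacity: int) -> tuple[int, int]:
    """trace: per token, a list of (layer, [expert ids chosen by the router])."""
    cache = ExpertCache(capacity)
    for token_routes in trace:
        for layer, experts in token_routes:
            for expert in experts:
                cache.fetch(layer, expert)
    return cache.hits, cache.misses


# Hypothetical routing for three tokens through one layer, two experts each.
TRACE = [[(0, [0, 1])], [(0, [1, 2])], [(0, [0, 3])]]
print(replay(TRACE, capacity=3))  # (2, 4)
print(replay(TRACE, capacity=1))  # (1, 5)
```

Shrinking the cache turns more routing decisions into reads. In a real model the router picks experts per layer for every token, so the miss rate, and with it the SSD traffic, depends heavily on how much of the model stays resident in RAM.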

IceWreck · 2026-04-05T22:45:05 1775429105

> So while feasible it's only great for batch jobs not interactive usage.

I mean, yeah, true, but it depends on how big the model is. The example I gave (Qwen3.5-35B-A3B) was fitting a 35B Q4_K_M model (say 20 GB in size) in 12 GB of VRAM. With a 4070 Ti + 32 GB of high-speed DDR5 RAM, you can easily get 700 tokens/sec prompt processing and 55-60 tokens/sec generation, which is quite fast.

On the other hand, if I try to fit a 120B model in 96 GB of DDR5 + the same 12 GB of VRAM, I get 2-5 tokens/sec generation.

zozbot234 · 2026-04-05T23:03:13 1775430193

Your 120B model likely has way more active parameters, so only a few shared layers probably fit in your dGPU's VRAM. You might be better off running that model on a unified memory platform: slower VRAM, but a lot more of it.
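The active-parameter point can be made concrete with a common back-of-the-envelope bound: each generated token has to read every active weight once, so decode speed is capped at roughly memory bandwidth divided by the bytes of active weights. The 64 GB/s bandwidth and 4-bit weights below are illustrative assumptions, and the bound ignores the KV cache, compute, and whatever portion already sits in faster VRAM.

```python
def decode_ceiling_tok_s(active_params_billion: float, bits_per_weight: float,
                         bandwidth_gb_s: float) -> float:
    """Rough upper bound on generation speed when weights stream from memory.

    Assumes every active weight is read once per generated token and nothing
    else (KV cache, compute, PCIe transfers) limits throughput.
    """
    gb_per_token = active_params_billion * bits_per_weight / 8
    return bandwidth_gb_s / gb_per_token


RAM_BANDWIDTH_GB_S = 64.0  # hypothetical system-RAM bandwidth
for active in (3, 12):     # a 3B-active MoE vs. a hypothetical 12B-active one
    print(active, round(decode_ceiling_tok_s(active, 4.0, RAM_BANDWIDTH_GB_S), 1))
```

Quadrupling the active parameters cuts the ceiling by the same factor (about 42.7 vs 10.7 tokens/s here), regardless of the total parameter count, which is why a small-active MoE tolerates offloading so much better.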

IceWreck · 2026-04-06T21:10:34 1775509834

Yep, I understand. I was just giving an example to the person I was replying to.

zozbot234 · 2026-04-05T22:39:55 1775428795

10 tok/s is quite fine for chatting, though less so for interaction with agentic workloads. So the technique itself is still worthwhile for running a huge model locally.

IceWreck · 2026-04-04T21:22:44 1775337764

> This is speculative, but I suspect that if we dropped one of the latest, most capable open-weight LLMs, such as GLM-5, into a similar harness, it could likely perform on par with GPT-5.4 in Codex or Claude Opus 4.6 in Claude Code.

People have been doing that for over a year already? GLM officially recommends plugging into Claude Code https://docs.z.ai/devpack/tool/claude and any model can be plugged into Codex CLI (it's open source, and the model can be set via a config file).

girvo · 2026-04-04T21:47:08 1775339228

And while it’s not Opus level, it is incredibly good. I use it basically exclusively (and qwen3.5-plus) on my personal projects.

IceWreck · 2026-03-31T18:08:23 1774980503

> What Google and OpenAI have open sourced is their Agents SDK, a toolkit, not the secret sauce of how their flagship agents are wired under the hood

And how is that any different? Claude Code is a harness, similar to open-source ones like Codex, Gemini CLI, OpenCode, etc. Their prompts were already public because you could connect it to your own LLM gateway and see everything. The code was transpiled JavaScript, which is trivial to read with LLMs anyway.

IceWreck · 2026-03-21T16:57:50 1774112270

basedpyright has existed for years, and now we have pyrefly from Meta too. I think ty is also working on one.

IceWreck · 2026-02-26T15:18:02 1772119082

At this point, why not make the agents use a restricted subset of Python, TypeScript, Lua, or something?

Bash has been unchanged for decades, but it's not a very nice language.

I know Pydantic has been experimenting with https://github.com/pydantic/monty (restricted Python), and I think Cloudflare and co. were experimenting with giving TypeScript to agents.
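One cheap way to enforce a "restricted subset of Python" is to parse the script and reject any syntax outside a whitelist before running it. This is a hypothetical sketch (the allowed node types and builtins are arbitrary choices), not how Monty works, and it is only a static check, not a sandbox: anything that passes would still need a real isolation boundary when executed.

```python
import ast

# Hypothetical whitelist: assignments, arithmetic, comparisons, if/for, lists,
# and calls to a handful of builtins. Anything else (imports, attribute access,
# function definitions, ...) is reported.
ALLOWED_NODES = (
    ast.Module, ast.Expr, ast.Assign, ast.AugAssign, ast.Name, ast.Load,
    ast.Store, ast.Constant, ast.BinOp, ast.Add, ast.Sub, ast.Mult, ast.Div,
    ast.Compare, ast.Lt, ast.Gt, ast.Eq, ast.If, ast.For, ast.Call, ast.List,
)
ALLOWED_CALLS = {"len", "print", "range", "sum"}


def violations(source: str) -> list[str]:
    """Describe every disallowed construct; an empty list means the script passes."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        line = getattr(node, "lineno", "?")
        if not isinstance(node, ALLOWED_NODES):
            problems.append(f"line {line}: {type(node).__name__} not allowed")
        elif isinstance(node, ast.Call) and not (
            isinstance(node.func, ast.Name) and node.func.id in ALLOWED_CALLS
        ):
            problems.append(f"line {line}: call not allowed")
    return problems


print(violations("total = sum(range(10))\nprint(total)"))  # []
print(violations("open('secrets.txt')"))  # ['line 1: call not allowed']
print(violations("import os"))
```

A whitelist is easier to reason about than a blacklist: new syntax or builtins are rejected by default until someone deliberately allows them.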

kkukshtel · 2026-02-26T16:18:18 1772122698

This is a really interesting idea. I wonder if something like Luau would be a good solution here - it's a typed version of Lua meant for sandboxing (built for Roblox scripting) that has a lot of guardrails on it.

https://luau.org/

simonw · 2026-02-26T16:02:25 1772121745

Being unchanged for decades means that the training data should provide great results even for the smaller models.

fragmede · 2026-02-26T22:47:36 1772146056

It means there's also plenty of bad examples in the training data to learn the wrong lessons from though.

JohnMakin · 2026-02-26T16:56:09 1772124969

They use bash in ways a human never would, and it seems very intuitive for them.

Spivak · 2026-02-26T19:14:25 1772133265

If you present most LLMs with a run_python tool, they won't realize that they can access a standard Linux userspace with it, even if that's explicitly detailed. But give them a spiritually identical tool called run_shell and they'll use it correctly.

Gotta work with what's in the training data I suppose.
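Part of why this is odd is that the two tools are equally powerful: a Python interpreter can shell out just as easily. Below is a minimal, hypothetical pair of tool implementations (the names and the output format are made up, and neither provides any isolation) that reach the same command both ways.

```python
import subprocess
import sys


def _format(proc: subprocess.CompletedProcess) -> str:
    """Render exit code plus combined output the way a tool result might."""
    return f"exit={proc.returncode}\n{proc.stdout}{proc.stderr}"


def run_shell(command: str, timeout: float = 30.0) -> str:
    """Hypothetical tool: run a command string with `sh -c`."""
    proc = subprocess.run(["sh", "-c", command],
                          capture_output=True, text=True, timeout=timeout)
    return _format(proc)


def run_python(code: str, timeout: float = 30.0) -> str:
    """Hypothetical tool: run code in a fresh Python interpreter."""
    proc = subprocess.run([sys.executable, "-c", code],
                          capture_output=True, text=True, timeout=timeout)
    return _format(proc)


# Both calls end up running the same `echo` binary in the same userspace.
print(run_shell("echo hello"))
print(run_python("import subprocess; subprocess.run(['echo', 'hello'])"))
```

Only the framing differs, which fits the point that models lean on whatever their training data associates with each tool name.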

0x457 · 2026-02-26T19:54:20 1772135660

There are a lot of shellscripts holding this world together out there.

fragmede · 2026-02-27T02:18:53 1772158733

don't forget Perl!

wild_egg · 2026-02-26T15:50:08 1772121008

Agents really do not care at all how "nice" a language is. You only need to be picky with language if a human is going to be working with the code. I get the impression that is not the use case here though

bigbadfeline · 2026-02-26T20:04:56 1772136296

> Agents really do not care at all how "nice" a language is.

People do care.

> You only need to be picky with language if a human is going to be working with the code.

Sooner or later humans will have to work with the code - if only for their own self-preservation.

> I get the impression that is not the use case here though

If that's not the use case, there's no legitimate use case at all.

wild_egg · 2026-02-27T01:16:22 1772154982

I might have misinterpreted what the submitted link is offering then. I thought this was some "secure" sandboxing thing where agents can write ephemeral scripts as tool calls to take arbitrary actions. No human will be looking at that. It's not checked into a repo and is not maintained.

One-off throwaway scripts can be written in literally anything. It does not matter.

fragmede · 2026-02-26T23:52:02 1772149922

> Sooner or later humans will have to work with the code

We want that to be true, but it's starting to look like it might not be.

bandrami · 2026-02-27T04:50:33 1772167833

Humans will eventually have to work with the code, generally at 3am with alarms going off and three tiers of bosses yelling at them on the phone

westurner · 2026-02-26T20:54:23 1772139263

TIL about Monty. A number of people have tried to sandbox [python,] using Python in user space, but ultimately they've all concluded that you can't sandbox Python with Python.

Virtual machines are a better workload isolation boundary than containers, which in turn are a better workload isolation boundary than bubblewrap and a WASM runtime.

eWASM has costed opcodes; https://news.ycombinator.com/item?id=46825763

From "Show HN: CSL-Core – Formally Verified Neuro-Symbolic Safety Engine for AI" (2026) https://news.ycombinator.com/item?id=46963924 :

> Should a (formally verified) policy engine run within the same WASM runtime, or should it be enforced by the WASM runtime, or by the VM or Container that the WASM runtime runs within?

> "Show HN: Amla Sandbox – WASM bash shell sandbox for AI agents" (2026) https://news.ycombinator.com/item?id=46825026 re: eWASM and costed opcodes for agent efficiency

> How do these userspace policies compare to MAC and DAC implementations like SELinux AVC, AppArmor, Systemd SyscallFilter, and seccomp with containers for example?

> [ containers/bubblewrap#sandboxing , cloudflare/workerd, wasmtime-mte, ]

"Microsandbox: Virtual Machines that feel and perform like containers" https://news.ycombinator.com/item?id=44137501

microsandbox/microsandbox: https://github.com/microsandbox/microsandbox :

> opensource self-hosted sandboxes for ai agents
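The "costed opcodes" idea mentioned above can be illustrated with a toy stack machine that charges gas per instruction and halts once the budget runs out, which bounds even an infinite loop. The opcodes and cost table are invented for illustration; they are not eWASM's actual schedule.

```python
class OutOfGas(Exception):
    pass


# Invented per-opcode costs; real metered runtimes define their own schedules.
COSTS = {"push": 1, "add": 3, "mul": 5, "jump": 2}


def run(program, gas):
    """Execute (opcode, arg) pairs; return (stack, gas_left) or raise OutOfGas."""
    stack, pc = [], 0
    while pc < len(program):
        op, arg = program[pc]
        cost = COSTS[op]
        if cost > gas:
            raise OutOfGas(f"{op} at pc={pc} costs {cost}, only {gas} left")
        gas -= cost  # charge before executing the instruction
        if op == "push":
            stack.append(arg)
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "jump":  # unconditional jump to instruction index `arg`
            pc = arg
            continue
        pc += 1
    return stack, gas


print(run([("push", 2), ("push", 3), ("add", None)], gas=10))  # ([5], 5)

try:
    run([("push", 1), ("jump", 0)], gas=100)  # an infinite loop
except OutOfGas as exc:
    print("halted:", exc)
```

Because the check happens before every instruction, the host gets a hard upper bound on work without needing to analyze the program, which is the property that makes metering attractive for agent-generated code.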

resonious · 2026-02-26T21:06:22 1772139982

I'll add that agents (CC/Codex) very often screw up escaping/quoting in their bash scripts and waste tokens figuring out what happened. It's worse when it's a script they save and reuse, because it's often a code injection vulnerability.

fragmede · 2026-02-26T22:46:21 1772145981

I want them to be better at it, but given how hard it is for me as a human to get it right (which is to say, I get it wrong a lot, especially when handling newlines in filenames, or filenames that start with --), I find it hard to fault them too much.
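Python's standard `shlex.quote` shows both halves of this problem. It reliably turns a string with spaces, quotes, or newlines into a single shell word, but quoting cannot help with filenames that start with `--`: the shell strips the quotes before the command sees the argument (and `shlex.quote` doesn't even add them for `--help`, since every character is "safe"). Option injection needs a separate fix, such as a `--` end-of-options marker or skipping the shell entirely. A small sketch, assuming a POSIX `sh` is available:

```python
import shlex
import subprocess

tricky = "it's a\nfile.txt"   # a quote and a newline in one filename
optiony = "--help"            # a filename that looks like an option

print(shlex.quote("it's a file.txt"))  # 'it'"'"'s a file.txt'
print(shlex.quote(optiony))            # --help  (all "safe" characters, left as-is)


def echo_via_shell(arg: str) -> str:
    """Round-trip arg through `sh -c`, quoted as a single word."""
    proc = subprocess.run(["sh", "-c", "printf '%s' " + shlex.quote(arg)],
                          capture_output=True, text=True, check=True)
    return proc.stdout


assert echo_via_shell(tricky) == tricky  # survives the shell intact

# Option injection needs "--" (or a "./" prefix), not more quoting.
safe_cmd = "rm -- " + shlex.quote(optiony)  # rm treats --help as a filename
safe_argv = ["rm", "--", optiony]           # no shell layer to get wrong at all
print(safe_cmd)
```

Passing an argument list to `subprocess.run` without a shell avoids the quoting layer entirely, which is usually the safer default for scripts that get saved and reused.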

Bolwin · 2026-02-26T17:29:07 1772126947

I've had LLMs write some pretty complex PowerShell on the fly. Still a shell language, but a lot nicer.

Ideally something like Nushell, but they don't know it that well.

inetknght · 2026-02-26T16:12:52 1772122372

Bash is ubiquitous and is not going away any time soon. Nothing is stopping you from doing the same thing with your favorite language.

andrewingram · 2026-02-26T17:10:14 1772125814

just-bash comes with Python installed, so in a way that's what this has done. I've used this for some prototypes with AI tools (via bash-tool), can't really productionise it in our current setup, but it worked very well and was undeniably pretty cool.

sheept · 2026-02-26T16:10:32 1772122232

I feel like Deno would be perfect for this, because it already has a permissions model enforced by the runtime.

Leynos · 2026-02-26T20:50:14 1772139014

Codex has a JS REPL built in now. And pydantic have a minimal version of Python called Monty.

tosh · 2026-02-26T16:42:00 1772124120

At least for me, Codex seems to write way more Python than Bash for general-purpose stuff.

jauntywundrkind · 2026-02-26T20:09:58 1772136598

Agreed! Very notable Codex behavior to prefer Python for scripting purposes.

I keep telling myself to make a good zx skill or agents.md. I really like zx's ergonomics, and its output when it shells out is friendly.

Top comments are Lua. I respect it, and those look like neat tools. But please, that's not what I want to look at. It would be interesting to see how Lua fares for scripting purposes, though; I haven't done enough I/O in it to know what that would look like. Does it assume some uv wrapper too?

pbowyer · 2026-02-26T20:04:47 1772136287

I came across a coding harness using Lua as its control plane yesterday: https://github.com/hsaliak/std_slop/blob/main/docs/lua_integ...

> std::slop is a persistent, SQLite-driven C++ CLI agent. It remembers your work through per-session ledgers, providing long-term recall, structured state management. std::slop features built-in Git integration. Its goal is to be an agent for which the context and its use are fully transparent and configurable.
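For readers curious what a "per-session ledger" in SQLite might look like, here is a minimal sketch using Python's built-in `sqlite3` module. The table layout and function names are hypothetical and are not taken from std::slop.

```python
import sqlite3

# Hypothetical schema: one append-only row per message, tagged with a session id.
SCHEMA = """
CREATE TABLE IF NOT EXISTS ledger (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    session    TEXT NOT NULL,
    role       TEXT NOT NULL,
    content    TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def open_ledger(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def append(conn: sqlite3.Connection, session: str, role: str, content: str) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO ledger (session, role, content) VALUES (?, ?, ?)",
            (session, role, content),
        )


def history(conn: sqlite3.Connection, session: str, limit: int = 50) -> list:
    """Return the last `limit` (role, content) rows for a session, oldest first."""
    rows = conn.execute(
        "SELECT role, content FROM ledger WHERE session = ? ORDER BY id DESC LIMIT ?",
        (session, limit),
    ).fetchall()
    return list(reversed(rows))


conn = open_ledger()
append(conn, "s1", "user", "hi")
append(conn, "s1", "assistant", "hello")
append(conn, "s2", "user", "unrelated session")
print(history(conn, "s1"))  # [('user', 'hi'), ('assistant', 'hello')]
```

Keeping the ledger append-only and filtering by session is what makes the context inspectable: anything the agent "remembers" is a row you can query directly.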

IceWreck · 2026-02-22T18:48:09 1771786089

I've been using https://github.com/sipeed/picoclaw

IceWreck · 2025-12-27T10:33:23 1766831603

BlackBerry OS 10 was also running QNX under the hood, AFAIK.

blumenkraft · 2025-12-27T16:29:47 1766852987

And it was awesome! Very responsive.

IceWreck · 2025-12-26T01:02:44 1766710964

This is exactly what Google did with Windsurf, and similar to what Meta did with Scale AI. Seems like a rising trend.

njuhhktlrl · 2025-12-26T10:00:16 1766743216

Remember ex3dfx.com, set up by former employees?

This is exactly what Nvidia tried to do with 3dfx 25 years ago. They have experience screwing people over!