Hacker News | simonw's comments

That's not at all incompatible with Bluesky having a funded company with a CEO.

The term they use for this is "credible exit" - designing the entire protocol such that if the company itself misbehaves the affected users can leave to a separate instance without losing their relationships or data.


The thing I most want to use this (or some other WASM Linux engine) for is running a coding agent against a virtual operating system directly in my browser.

Claude Code / Codex CLI / etc are all great because they know how to drive Bash and other Linux tools.

The browser is probably the best sandbox we have. Being able to run an agent loop against a WebAssembly Linux would be a very cool trick.

I had a play with v86 a few months ago but didn't quite get to the point where I hooked up the agent to it - here's my WIP: https://tools.simonwillison.net/v86 - it has a text input you can use to send commands to the Linux machine, which is pretty much what you'd need to wire in an agent too.

In that demo try running "cat test.lua" and then "lua test.lua".
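The agent wiring described above can be sketched as a plain loop. This is a minimal sketch, not the actual demo code: the `next_command` function is a stub standing in for the LLM call, and a local `subprocess` shell stands in for the v86 VM's serial console (in the browser you'd swap in something like v86's serial send/receive hooks instead):

```python
import subprocess


def next_command(transcript: str) -> "str | None":
    """Stub for the LLM call; a real agent would send the transcript
    to a model and parse the next shell command out of its reply."""
    plan = ["echo hello from the vm", "ls /"]
    issued = transcript.count("$ ")  # commands already executed
    return plan[issued] if issued < len(plan) else None


def run_in_vm(cmd: str) -> str:
    """Stand-in for the VM's console; runs the command locally here."""
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return result.stdout + result.stderr


transcript = ""
while (cmd := next_command(transcript)) is not None:
    transcript += f"$ {cmd}\n{run_in_vm(cmd)}"
print(transcript)
```

The interesting part is that the loop only needs two primitives - "send a command" and "read the output" - which is exactly what a text input wired to the emulated machine already gives you.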


> The thing I most want to use this (or some other WASM Linux engine) for is running a coding agent against a virtual operating system directly in my browser.

That exists: https://github.com/container2wasm/container2wasm

Unfortunately I found the performance to be enough of an issue that I did not look much further into it.


Did anyone expect anything different though, when running a full-blown OS in JavaScript?

Simon, this HN post didn't need to be about Gen AI.

This thing is really inescapable these days.


Parallel thread: https://news.ycombinator.com/item?id=47311484#47312829 - "I've always been fascinated by this, but I have never known what it would be useful for."

I should have replied there instead, my mistake.


I don't know man, I didn't see anyone say "this post didn't need to be about <random topic>", HN has just become allergic to LLMs lately.

I'm excited about them, and I think discussions on how to combine two exciting technologies are exactly what I'd like to see here.


You haven't been around here in the Blockchain/NFT/Smart Contract dark ages, have you?

Naw man I just signed up.

I chuckled. Everything on earth is recent if you look at it from a cosmic timeframe I guess

To be fair, it really was annoying when everything was blockchain.

Has there ever been any other topic that was not only the subject of the majority of submissions, but also had a subset of users repeatedly butting into completely unrelated discussions to go "b-but what about <thing>? we need to talk about <thing> here too! how can I relate this to <thing>? look at my <thing> product!"?

You can't just roll in to a random post to tell people about your revolutionary new AI agent for the 50th time this week and expect them not to be at least mildly annoyed.


I'm with you, but he wasn't telling us about his agent, he was saying "this is a cool technology and I've been wanting to use it to make a thing". The thing just happened to be LLM-adjacent.

Almost all of his comments "just happen" to be LLM-adjacent. At some point it stops "just happening" and it becomes clear that certain people (or their AI bots) are frequenting discussion spaces for the sole purpose of seeking out opportunities to bring up AI and self-promote.

You are not reading his material, I suppose? It’s really one of the better sources for informed takes on LLMs.

I just went and read one of his recent posts at: https://simonwillison.net/2026/Mar/5/chardet/

The entire thing is just quotes and a retelling of events. The closest thing to a "take" I could find is this:

> I have no idea how this one is going to play out. I’m personally leaning towards the idea that the rewrite is legitimate, but the arguments on both sides of this are entirely credible.

Which effectively says nothing. It doesn't add anything to the discussion around the topic, informed or not, and the post doesn't seem to serve any purpose beyond existing as an excuse to be linked to and siphon attention away from the original discussion (I wonder if the sponsor banner at the top of the blog could have something to do with that...?)

This seems to be a pattern, at least in recent times. Here's another egregious example: https://simonwillison.net/2026/Feb/21/claws/

Literally just a quote from his fellow member of the "never stops talking about AI" club, Karpathy. No substance, no elaboration, just something someone else said or did pasted on his blog followed by a short agreement. Again, doesn't add anything or serve any real purpose, but was for some reason submitted to HN[1], and I may be misremembering but I believe it had more upvotes/comments than the original[2] at one point.

[1] https://news.ycombinator.com/item?id=47099160

[2] https://news.ycombinator.com/item?id=47096253


I think my coverage of the Mark Pilgrim situation added value in that most people probably aren't aware that Mark Pilgrim removed himself from internet life in 2011, which is relevant to the chardet story.

That second Karpathy example is from my link blog. Here's my post describing how I try to add something new when I write about things on my link blog: https://simonwillison.net/2024/Dec/22/link-blog/

In the case of that Karpathy post I was amplifying the idea that "Claw" is now the generic name for that class of software, which is notable.


Simon has been here since way before LLMs were a thing, and it's fairly obvious (to me, at least) that he's genuinely excited about LLMs, he's not just spamming sales or anything.

Why not let upvotes do their thing? I enjoyed this comment.

What topics are allowed in your opinion? I very much enjoyed Simon’s comment as it is a use case I also was thinking of.

a bit cute that you interacted with the 1 AI thread. there are other threads!

Check out Jeff Lindsay's Apptron (https://github.com/tractordev/apptron), comes very close to this, and is some great tech all on its own.

It's getting there. Among other things, it's probably the quickest way to author a Linux environment to embed on the web: https://www.youtube.com/watch?v=aGOHvWArOOE

Apptron uses v86 because it's fast. I'd love for somebody to add 64-bit support to v86. However, Apptron is not tied to v86. We could add Bochs like c2w, or even JSLinux for 64-bit, I just don't think it would be fast enough to be useful for most.

Apptron is built on Wanix, which is sort of like a Plan9-inspired ... micro hypervisor? Looking forward to a future where it ties different environments/OS's together. https://www.youtube.com/watch?v=kGBeT8lwbo0


We are working on exactly this: https://browserpod.io

For a full-stack demo see: https://vitedemo.browserpod.io/

To get an idea of our previous work: https://webvm.io


How’s performance relative to bare metal or hardware virtualization?

I run agents as a separate Linux user. So they can blow up their own home directory, but not mine. I think that's what most people are actually trying to solve with sandboxing.

(I assume this works on Macs too, both being Unixes, roughly speaking :)


Are you describing bolt.new? (Unfortunately, it looks like their open source project is lagging behind https://github.com/stackblitz-labs/bolt.diy)

It's relatively easy to spin up a busybox WASM v86 solution

While this may be a better sandbox, actually having a separate computer dedicated to the task seems like a better solution still, and you will get better performance.

Besides, prompt injection and simpler exploits should be addressed before building a virtual computer in a browser, and if you are simulating a whole computer you take a huge performance hit as another trade-off.

On the other hand using the browser sandbox that also offers a UI / UX that the foundation models have in their apps would ease their own development time and be an easy win for them.


This is not the technical solution you want, but I think it provides the result that you want: https://github.com/devcontainers

tl;dr: devcontainers let you completely containerize your development environment. You can run them natively on Linux, on rented machines (there are providers such as GitHub Codespaces), or in a VM (which is what you will be stuck with on a Mac anyway - but reportedly performance is still great).

All CLI dev tools (including things like Neovim) work out of the box, but also many/most GUI IDEs support working with devcontainers (in this case, the GUI is usually not containerized, or at least does not live in the same container. Although on Linux you can do that also with Flatpak. And for instance GitHub Codespaces runs a VsCode fully in the browser for you which is another way to sandbox it on both ends).
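As a sketch of what this looks like in practice, a minimal `.devcontainer/devcontainer.json` is something like the following - the base image is a real Microsoft-published one, but the extension ID and post-create command are illustrative placeholders for whatever your project needs:

```json
{
  "name": "sandboxed-dev",
  "image": "mcr.microsoft.com/devcontainers/base:ubuntu",
  "customizations": {
    "vscode": {
      "extensions": ["ms-python.python"]
    }
  },
  "postCreateCommand": "pip install -r requirements.txt"
}
```

Supporting editors pick this file up automatically and offer to reopen the project inside the container.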


This is interesting (and I've seen it mentioned in some editors), but how do I use it? It would be great if it had bubblewrap support, so I don't have to use Docker.

Do you know if there's a cli or something that would make this easier? The GitHub org seems to be more focused on the spec.


Can I PLEASE click on ONE post on the front page of HN without immediately being met by some grifter trying to derail it to promote their AI product?

Please? I'm begging here.


Nobody is promoting a product. Simon is just sharing an experiment he attempted. No products being sold here.

Maybe not, but in the past some here have seen the blog itself as the product being promoted.

Even in this thread alone https://news.ycombinator.com/item?id=47314929 some commenters are clearly annoyed with the way AI is being shoved into every place where they do not want it.

I don't care, but I can see why many here are getting tired of it.


> CBP’s Automated Commercial Environment (ACE) system can apparently only batch-process 10,000 entry summary lines at a time, and there are over 1.6 billion entry summary lines that need updating. Importers frequently lumped their IEEPA duties together with other duties on the same line, meaning CBP personnel would have to manually untangle the amounts. Processing each individual refund takes about 5 minutes, which across 53 million entries works out to over 4.4 million hours.

Unemployment numbers about to drop like a rock.

4,400,000 hours / 2,000 hours/year = 2,200 jobs for 1 year. × $50k/year = $110,000,000

So you’re telling me we could do this all for just a few million dollars more than the price of the three fighter jets recently shot down over Kuwait, and provide good American jobs while doing so? Sounds like a deal.

(In reality it would be more expensive, because you would have to source, train, and administer those people, plus audit the results afterwards. But the government had previously told the courts it would be doable!)
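For anyone who wants to check the back-of-envelope numbers from the quoted figures (5 minutes per entry, 53 million entries, a roughly 2,000-hour work year, $50k salaries):

```python
entries = 53_000_000
minutes_per_entry = 5

hours = entries * minutes_per_entry / 60  # ~4.4 million person-hours
jobs = hours / 2_000                      # one work year ~= 2,000 hours
payroll = jobs * 50_000                   # at $50k/year each

print(f"{hours:,.0f} hours -> {jobs:,.0f} jobs -> ${payroll:,.0f}")
```

Which lands on roughly 4.4 million hours, ~2,200 job-years, and a bit over $110M in salaries - consistent with the figures above.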


You should add in time for training.

Right. The alternative is that we reward Dan for his 14 years of volunteer maintenance of a project... by banning him from working on anything similar under a different license for the rest of his life.

The challenge I'm finding with sandboxes like this is evaluating them in comparison to each other.

This looks like a competent wrapper around sandbox-exec. I've seen a whole lot of similar wrappers emerging over the past few months.

What I really need is help figuring out which ones are trustworthy.

I think this needs to take the form of documentation combined with clearly explained and readable automated tests.

Most sandboxes - including sandbox-exec itself - are massively under-documented.

If I am going to trust them I need both detailed documentation and proof that they work as advertised.


Thank you for your work - I have sent many of your links to my people.

Your point is totally fair for evaluating security tooling. A few notes -

1. I implemented this in Bash to avoid having an opaque binary in the way.

2. All sandbox-exec profiles are split up into individual files by specific agent/integration, and are easily auditable (https://github.com/eugene1g/agent-safehouse/tree/main/profil...)

3. There are E2E tests validating sandboxing behavior under real agents

4. You don't even need the Safehouse Bash wrapper, and can use the Policy Builder to generate a static policy file with minimal permissions that you can feed to sandbox-exec directly (https://agent-safehouse.dev/policy-builder). Or feed the repo to your LLMs and have them write your own policy from the many examples.

5. This whole repo should be a StrongDM-style readme to copy&paste to your clanker. I might just do that "refactor", but for now added LLM instructions to create your own sandbox-exec profiles https://agent-safehouse.dev/llm-instructions.txt


I love this implementation. Do you find the SBPL deficient in any ways?

Would xcodebuild work in this context? Presumably I'd watch a log (or have an agent) and add permissions until it works?


SBPL is great for filesystem controls and I haven’t hit roadblocks yet. I wish it offered more controls of outbound network requests (ie filtering by domain), but I understand why not.

Yes, Safehouse should work for xcodebuild workloads in the way you described - try to run it, watch for failures, extend the profile, try again. Your agent can do this in a loop by itself - just feed it the repo as there are many integrations that are not enabled by default that will help it.
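For readers who haven't seen SBPL before, a minimal profile for `sandbox-exec` looks roughly like this - this is a sketch, not one of Safehouse's actual profiles, and the paths are illustrative; real agent workloads need many more allowances:

```scheme
(version 1)
(deny default)
; allow spawning and running binaries, and reading system files
(allow process-fork)
(allow process-exec)
(allow file-read* (subpath "/usr") (subpath "/bin") (subpath "/System"))
; confine reads and writes to the project directory
(allow file-read* file-write* (subpath "/Users/me/project"))
; block all network access
(deny network*)
```

Run it with `sandbox-exec -f profile.sb -- some-command`. Note that `sandbox-exec` is formally deprecated by Apple but still ships with macOS, which is part of why it is so under-documented.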


For anyone reading this later.

I read a little from sandvault and they suggest sandbox-exec doesn't allow recursive sandboxing, so you need to set flags on xcodebuild and swift to not sandbox in addition to the correct SBPL policy.

(I don't think sandvault has a swift/xcode specific policy because they're dumping everything into a sandvault userspace. And it doesn't really concern itself with networking afaict either.)


If you're looking for one better documented and tested, you might like https://github.com/kstenerud/yoloai

I'm having trouble understanding what makes this "better documented and tested". Care to elaborate on how the testing was done? What are the differences?

So create a 'destroy my computer' test harness and run it whenever you test another wrapper. If it works you'll be fine. If it doesn't you buy a new computer.

I mean yeah, it's Grok. They had to work really hard to get their preferred levels of political bias in there.

Any examples of political bias?

Apart from calling itself "Mecha Hitler" and that time it made every conversation be about "white genocide" in South Africa?

There's also Grokipedia, supposedly made by Grok, where I'd point to section 3.6 ("Controversial topics"): https://arxiv.org/html/2511.09685v1


Both programs have been announced as granting six months, but neither of them has explicitly said that there won't be an option to renew for another six months.

I expect they haven't decided that themselves yet and don't want to commit publicly until they've seen how well the program goes.


Even if you’re right, no one should be making the decision to enroll in those programs based on a maybe, with zero indication they’ll be renewed again in six months.

You know what they could also do? Stop the programs for new enrollments next month. Or, if they renew them like you said, it could be with new conditions which exclude people currently on them.

There are too many unknowns, and giving these companies the benefit of the doubt that they’ll give more instead of taking more goes counter to everything they showed so far.


Is your argument here that you shouldn't accept the free trial because you might find it useful and then be trapped into paying for more of it later?

No, my argument is that your “but neither of them have explicitly said that there won't be options to renew for another six months” point is not something anyone should realistically be counting on, and is not a valid counter argument to your parent post of “Isn't the Claude one only for a few months?”.

We should be discussing what is factual now, not be making up scenarios which could maybe happen but have zero indication that they will.


I didn't say that I thought they would likely extend it, but I stand by my statement that it's a possibility.

Neither company has expressed that the six-month thing is a hard limit.

The fact that OpenAI shipped their version within two weeks of Anthropic's announcement suggests to me that they're competing with each other for credibility with the open source community.

(Obviously if you make decisions based on the assumption that the program will be expanded later you're not acting rationally.)


If I understand correctly, they are literally giving things away for free for a six-month period and we are complaining that they don't promise it stays free forever?

No, you did not understand correctly. They are not “literally giving things away for free”, they are providing a very conditional free trial, which is a business decision and not anything new. Then a commenter speculated they might extend that program because they didn’t say they won’t, and I pointed out it doesn’t make sense to assume they will. No one on this immediate thread made any complaint; we’re discussing the facts of the offering.

I know dozens of people who are in a similar state right now, following the November 2025 moment when Claude Code (and Codex) got really good.

I wouldn't worry about it just yet - this is all very novel, and there's a lot of excitement involved in figuring out what it can do and trying different things.

If you're still addicted to it in three months time I'd start to be concerned.

For the moment though you're building a valuable mental model of how to use it and what it can do. That's not wasted time.


I'm seeing the limits when Claude makes some statements that are extremely wrong but incredibly hard to spot unless you're in the field - recently it told me that "some people say" that Rydberg atoms and neutral atoms are different enough to be in different quantum computing categories (they're the same). The stakes are lowering somehow, because I know I can't trust it for anything but fun side projects. For serious research it's still me and reading papers.

I'm not trying to convert you, just want to share process tips that I see working for me and others. We're using agents, not a chat, because they can do complex work in pursuit of a goal.

1. Make artifacts. If you're doing research into a tech, or a hypothesis, then fire off subagents to explore different parts of the problem space, each reporting back into a doc. Then another agent synthesizes the docs into a conclusion/report.

2. Require citations. "Use these trusted sources. Cite trusted sources for each claim. Cite with enough context that it's clear your citations supports the claim, and refuse to cite if the citation doesn't support the claim."

3. Review. This lets you then fire off a subagent to review the synthesis. It can have its own prompt: look for confirming and disconfirming evidence, don't trust uncited claims. If you find it making conflation mistakes, figure out at what stage and why, and adjust your process to get in front of them.

4. Manage your context. LLM only has a fixed context size ("chat length") and facts & instructions at the front of that tend to be better hewn to than things at the end. Subagents are a way of managing that context to get more from a single run. Artifacts like notebooks or records of subagent output move content outside the context so you can pick up in a new session ("chat") and continue the work.

It's less fun than just having a chat with ChatGPT, but I find that I get much better quality results using these techniques. Hope this helps! If you're not interested in doing this (too much like work, and you already have something that works), it's no skin off my nose. All the best!
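The fan-out/synthesize/review flow in steps 1-3 above can be sketched as a small pipeline. This is only an illustration of the shape of the process: `ask_model` is a stub standing in for a real agent or LLM API call, and in practice each call would run in its own context with artifacts written to disk:

```python
def ask_model(prompt: str) -> str:
    """Stub for a real LLM/agent call; returns canned text here."""
    return f"[report for: {prompt[:40]}]"


def research(question: str, subtopics: "list[str]") -> str:
    # 1. fan out: one subagent per subtopic, each producing an artifact
    artifacts = {
        t: ask_model(f"Research {t}. Cite trusted sources for each claim.")
        for t in subtopics
    }
    # 2. synthesize: a fresh call turns the artifacts into one report
    notes = "\n".join(f"## {t}\n{a}" for t, a in artifacts.items())
    report = ask_model(f"Synthesize a report on {question}:\n{notes}")
    # 3. review: another fresh context checks the synthesis,
    #    distrusting uncited or conflated claims
    review = ask_model(f"Review for uncited or conflated claims:\n{report}")
    return report + "\n\nReview:\n" + review


result = research("WASM Linux sandboxes", ["v86", "container2wasm"])
```

The point of the structure is that each stage gets a fresh, small context with only the artifacts it needs, which is what keeps long research tasks from degrading as the context fills up.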


Thanks for the thoughtful reply! I definitely want to try a more complex setup when I have more time on my hands

Your mental model point is very true. We all had to learn how to google at some point — explaining how to use these tools to someone outside the bubble feels like explaining how googling works to someone. Much of it is intuitive understanding from experience.

My question would be how much we think the processes will change as the models do. Much advice from two years ago is no longer relevant or realistic. Where do we think it will go next?

Does anyone have a really good way to explain to their relatives and friends how using an agent is different from simply using Google? Just saying ‘fundamentally different’ doesn’t go very far; the best I’ve found is sitting down and giving a demonstration.

It’s also difficult to explain the enormous gap between frontier models and the free ones many people are accustomed to using. Is there a tangible comparison to a normie real-life ‘thing’ that anyone has used successfully?


I have no idea how to explain agents to a non-technical audience - there is SO much that can go wrong with them, it's still very much a technical power-user technology in my opinion.

ChatGPT and Claude can both execute code now, so a safe subset is to show people how to upload files there and have them do useful things with the data.


> Much of it is intuitive understanding from experience.

You’re mistaking domain expertise for tool expertise. You can’t teach a non-dev how to use an LLM effectively for dev work without teaching them to be an experienced dev. Once you have that knowledge, LLMs aren’t that hard to use.


Took me a moment to understand that "Magic Containers" here are a product offered by bunny.net https://bunny.net/magic-containers/

