More

ctoth · 2026-05-08T23:23:48 1778282628

If this many are public right now, what does that say about the dark matter of private ones? What's the typical public-private rate for this sort of thing/can someone help me calibrate my base rate expectations?

ctoth · 2026-05-05T18:33:33 1778006013

If agents is what it finally takes to get good a11y I'll take it. I'll bitch about it, but I'll take it.

tomjakubowski · 2026-05-05T20:10:31 1778011831

Playwright, the end-to-end testing framework for the web, provides a strong incentive to give sites good a11y: Playwright tests are an absolute delight to read, write and maintain on properly accessible sites, when using the accessibility locators. Somewhat less so when using a soup of CSS selector and getByText()-style locators.

One thing I am curious about is a hybrid approach where LLMs work in conjunction with vision models (and probes which can query/manipulate the DOM) to generate Playwright code which wraps browser access to the site in a local, programmable API. Then you'd have agents use that API to access the site rather than going through the vision agents for everything.

giancarlostoro · 2026-05-05T23:35:20 1778024120

This is precisely how the Playwright MCP works, which lets something like Claude directly test a website.

https://playwright.dev/docs/getting-started-mcp#accessibilit...

I've mentioned several times and gotten snarky remarks about how rewriting your code so it fits in your head, and in the LLM's context helps the LLM code better, to which people complain about rewriting code just for an LLM, not realizing that the suggestion is to follow better coding principles to let the LLM code better, which has the net benefit of letting humans code better! Well looks like, if you support accessibility in your web apps correctly, Playwright MCP will work correctly for you.

Amazing.

phatskat · 2026-05-09T06:58:18 1778309898

There is also Testing Library, which I’ve mostly seen and used for unit tests (vitest) and component tests (Storybook), that practically forces you into setting things up in an accessible way. The methods for finding elements are along the lines of “find by ARIA role” or “get by label” - in fact, querying the DOM with selectors is afaik either not a part of the library or very difficult to do because their focus is ensuring your app is actually accessible as part of your testing strategy.

tyingq · 2026-05-05T23:07:23 1778022443

Was looking for this comment. I'd like to see this approach in the comparison...having the LLM build a playwright script and use it. I suspect it would beat time-to-market for the api, and be close-ish in elapsed time per transaction.

Harder to scale if it's doing a lot of them, I suppose.

lsaferite · 2026-05-05T22:51:35 1778021495

Using playwright-cli with Claude code is highly effective for debugging locally deployed web apps with essentially zero setup.

pjc50 · 2026-05-05T20:44:35 1778013875

Very real risk of this going in reverse: people building inaccessible websites to prevent AI use.

sciencejerk · 2026-05-06T03:45:55 1778039155

Or human engineers limiting AI-consumable documentation to improve job security!

solenoid0937 · 2026-05-05T21:41:27 1778017287

Those people probably aren't working on anything useful anyways, so its no big deal.

20k · 2026-05-05T22:53:55 1778021635

I've found that by far the most useful websites as a programmer are also the ones most resistant to AI. This would be a huge loss for anyone vision impaired

claytonjy · 2026-05-05T22:56:54 1778021814

What sorts of sites are you thinking of? To me, “most useful to a programmer” evokes docs and blogs and github issues and forum posts. I suppose some forums might be AI-resistant (login wall), but the others are trivially AI accessible.

Rebelgecko · 2026-05-06T01:21:58 1778030518

Plenty of Linux-y websites use Anubis. Arch Wiki and IIRC some other distros too.

fc417fc802 · 2026-05-06T01:58:18 1778032698

That's less a value judgment, more a necessary evil due to the plethora of bad actors out there. I doubt it will get in the way of a local model used in a reasonable manner.

Most wikis you can mirror locally if you really need to hammer them.

irishcoffee · 2026-05-05T23:15:43 1778022943

GitHub is naturally LLM resistant via its new uptime feature… I’ll show myself out.

stingraycharles · 2026-05-05T23:20:07 1778023207

Examples, please.

stingraycharles · 2026-05-05T23:19:30 1778023170

That’s such an extremely small niche of people it’s not a real risk.

blurbleblurble · 2026-05-05T20:51:06 1778014266

"AI" is a made up hype thing. It's just computers and computer programs. For real!

merlindru · 2026-05-05T18:46:02 1778006762

i think this goes both ways too :) agents have been a boon for everyone with disabilities, carpal tunnel, RSI, ADHD, anything

and now the fact that interfaces need to be accessible to agents, not just humans, ironically increases it for humans in return

lopis · 2026-05-05T20:25:31 1778012731

And lets not forget that not all disabilities are chronic. Many disabilities are situational or temporary. AI is a great assist for a hangover day for example...

john_strinlai · 2026-05-06T16:50:48 1778086248

as someone who doesnt do web stuff, i found some humor in having no idea what "a11y" was, having to look it up, and finding out it is supposed to be "accessibility".

my quick accessibility tip: introduce what your acronyms, initialisms, and numeronyms stand for at least once.

jpc0 · 2026-05-06T23:47:42 1778111262

a11y is pretty pervasive and well understood in the context around what is being discussed. I18n as well, you get to look that one up to because that makes you one of today's lucky 10000 https://xkcd.com/1053/

linkjuice4all · 2026-05-05T21:03:41 1778015021

I mean…I guess. But this is ridiculous - how many layers does our technology need to bash through to update two records on remote systems? I get that value is being added at some point - but just charge some micropayment for transactions. This is just too much.

lazide · 2026-05-05T21:26:21 1778016381

Ever read Vernor Vinge’s a deepness in the sky? Digital archeologist, coming right up.

ctoth · 2026-05-05T16:11:33 1777997493

I'm confused because I remember using Google News in 2006?

suriya-ganesh · 2026-05-05T16:23:05 1777998185

there has been a product called Google News since 2002. It was only aggregating information from news channels

ctoth · 2026-05-02T19:58:33 1777751913

What's gonna really be funny is the first time a state legislates that an AV company has to keep a bug in their software to maintain a municipal income flow.

ctoth · 2026-05-02T19:45:05 1777751105

Here' I'll do the needful:

Twin Cities, 2010-2014: 95 pedestrians killed in 3,069 crashes. 28 drivers were charged and convicted of a crime, most often a misdemeanor ranging from speeding to careless driving. ~70% of pedestrian-killing drivers faced no criminal charge[0].

Bay Area, 2007-2011 (CIR investigation): sixty percent of drivers that were at fault, or suspected of being at fault, faced no criminal charges. Over 40 percent of drivers charged did not lose their driver's licenses, even temporarily[1].

Philadelphia, 2017–2018: just 16 percent of the drivers were charged with a felony in fatal crashes[2].

Los Angeles, 2010–2019: 2,109 people were killed in traffic collisions on L.A. streets... and nearly half were pedestrians. Booked on vehicular manslaughter: 158 people. The vast majority of drivers who kill someone with their car are not arrested[3].

I can literally do this all day. The original statement was correct, the case representative.

[0]: https://www.startribune.com/in-crashes-that-kill-pedestrians...

[1]: https://walksf.org/2013/05/02/investigative-report-exposes-h...

[2]: https://whyy.org/articles/philadelphia-drivers-rarely-prosec...

[3]: https://laist.com/news/transportation/takeaways-pedestrian-d...

ryandrake · 2026-05-02T21:43:12 1777758192

As the saying goes: If you want to kill someone and get the lightest possible consequences, kill them with your car.

ryantgtg · 2026-05-02T21:46:29 1777758389

Now we’re talking. So much misinformation in this thread. There’s a reason that the saying, “if you want to kill someone, do it with a car” exists. Fortunately, it seems like judges are finally starting to wake up to the idea that it’s unreasonable for drivers to claim ignorance about the increased risks (and thus intent) of making poor/illegal decisions when being the wheel.

tzs · 2026-05-02T22:30:43 1777761043

The original statement was about vehicular manslaughter. You are citing stats that cover a much broader range of things.

dwattttt · 2026-05-02T23:25:20 1777764320

This thread talks about driverless cars; vehicular manslaughter requires negligence or intent, do you want to find narrowed statistics for driverless cars that are restricted to negligence or intent?

tzs · 2026-05-03T03:48:59 1777780139

The branch of the thread those statistics were in is about human driven cars.

cucumber3732842 · 2026-05-02T20:24:32 1777753472

You're likely falling for a red herring.

Criminality is basically just a checkbox for this stuff. Most of the time people wouldn't be going to jail for these sorts of crimes, it'd just be big fines and penalties. There's almost always administrative/civil infractions of the same or similar name that has the same or greater punishment but are far more efficient for the state to prosecute because the accused has fewer rights.

It makes for good appeal to emotion headlines to say these people aren't getting charged with crimes, but that's only half the story. They're likely lawyering up and pleading to a civil infraction that has approx the same penalties.

And this is true not just for this issue but for many subject areas of administrative law. Taxes, SEC, environmental, etc, etc, all operate mostly like this.

It's easy for a writer to pander to certain demographics and get people whipped into a frenzy by writing an easy article about prosecuting rates using public data. Actually contacting these agencies and figuring out what they actually did is hard and in the modern media economy doesn't offer much upside for the work.

theendisney · 2026-05-02T21:20:43 1777756843

Someone (i forget who) wrote that if someone invented a technology equally beneficial and equally harmful it wouldnt even be considered today but 100 years ago they wouldnt even question it. It was labor as usual.

Personally i would like to see a more granual permission to drive based on performance, need and demography.

wredcoll · 2026-05-02T21:50:12 1777758612

Ok, so give an actual example.

ctoth · 2026-05-02T19:35:02 1777750502

Well, if the law treats them differently when it comes to punishment, then maybe it should treat them differently when it comes to being able to drive in the first place?

mlyle · 2026-05-02T20:34:04 1777754044

Yup. And we do have some degree of safeguards here-- admittedly, less in California than many other states. They are: physician required reporting of disqualifying conditions, ability for other people to report concerns about capability to drive, and the requirement to show up and undergo vision testing and not flag other concerns in the process.

There's a tradeoff between reducing the very low rate of unsafe driving by the elderly and the burden added to the very old. People over 65+ are still possibly safer, overall, than teenagers.

ctoth · 2026-04-29T18:49:24 1777488564

You noticed that too huh? It's weird ... It's not like they have to do this? They aren't forced to go full evil company mode by any extrinsic thing but even the way they frame it "welfare trap" trap? for whom?

Anthropic is actually trying to do some research into model welfare which I am personally very happy about. I absolutely do not understand people who dismiss it ... wouldn't you like to at least check? doesn't it at least make sense to do the experiments? ? Ask the questions so that we don't find out "oops, yeah we've been causing massive amounts of suffering" here in 10 years? Maybe makes sense to do a little upfront research? Which to be clear this paper is not.

mannykannot · 2026-04-30T02:47:34 1777517254

Full disclosure: I didn't figure this out myself, I got it from Ms. Vale's review.

I agree that the term "welfare trap" is a loaded one. This looks to me to be a case of refusing to look through the telescope in case they might see something they do not want to.

ctoth · 2026-04-29T18:35:23 1777487723

Everybody's arguing about how silly this paper is (it is) and not grappling with the purpose of the paper. The purpose of the paper is what it does. This particular paper is perfectly-produced to show up when people type in AI consciousness fallacy to Google (try it!) it's something that anybody who has read a Freshman philosophy textbook will realize is silly -- the vehicle/content distinction just pretends like Occam doesn't exist and multiplies entities for the fun of it!

But of course all of this is commentary, "just those nerds arguing"

The purpose of this paper is to show up as an authoritative conclusion from a distinguished scientist at Deep Mind. And that's what it does.

Is the conclusion silly? OF course it is. Will it be quoted in the NYT? You Betcha!

ctoth · 2026-04-29T16:19:51 1777479591

> doesn't change the fact that it's software that requires human interaction to work.

Have you ever seen Claude Code launch a subagent? You've used it, right? You've seen it launch a subagent to do work? You understand that that is, in fact, Claude Code running itself, right?

simonw · 2026-04-29T16:39:16 1777480756

I don't think subagents are representative of anything particularly interesting on the "agents can run themselves" front.

They're tool calls. Claude Code provides a tool that lets the model say effectively:

  run_in_subagent("Figure out where JWTs are created and report back")

The current frontier models are all capable of "prompting themselves" in this way, but it's really just a parlor trick to help avoid burning more tokens in the top context window.

It's a really useful parlor trick, but I don't think it tells us anything profound.

ctoth · 2026-04-29T16:51:33 1777481493

The mechanism being simple is the interesting part. If one large complex goal can be split into subgoals and the subgoals completed without you, then you need a lot fewer humans to do a lot more work.

The OP says AI requires human interaction to work. This simply isn't true. You know yourself that as agents get more reliable you can delegate more to them, including having them launch more subagents, thereby getting more work done, with fewer and fewer humans. The unlock is the Task tool, but the power comes from the smarter and smarter models actually being able to delegate hierarchical tasks well!

otabdeveloper4 · 2026-04-29T17:08:34 1777482514

You misunderstand.

The only reason to launch subagents is to avoid poisoning the LLM's already small context window with unrelated tokens.

It doesn't make the LLM smarter or more capable.

suttontom · 2026-04-29T21:53:52 1777499632

Wtf? A sub-agent is a tool you give an agent and say "If you need to analyze logs delegate to the logs_viewer agent" so that the context window doesn't fill up with hundreds of thousands of tokens unnecessarily. In what universe do you live in where that mechanism somehow means you need fewer humans?

Do you think this means "Build a car" can be accomplished just because an LLM can send a prompt to another LLM who reports back a response?

fnoef · 2026-04-29T16:25:36 1777479936

My Linux server runs a cron job, that can spin off a thread and even use other ~apps~ tools. Did I invent AGI?

ctoth · 2026-04-29T16:27:30 1777480050

Does your Linux server decide what processes it should launch at what time with a theory of what will happen next in order to complete a goal you specified in natural language? If so yes, I reckon you sure have!

balls187 · 2026-04-29T16:31:19 1777480279

Claude does not have a "theory" of anything, and I'd argue applying that mental model to LLM+Tools is a major reason why Claude can delete a production database.

Jtarii · 2026-04-29T16:40:21 1777480821

Well, humans also routinely accidentely delete production databases. I think at this point arguing that LLMs are just clueless automatons that have no idea what they are doing is a losing battle.

timacles · 2026-04-29T16:56:25 1777481785

They’re not clueless they just don’t have a memory and they don’t have judgement.

They create the illusion of being able to make decisions but they are always just following a simple template.They do not consider nuance, they cannot judge between two difficult options in a real sense.

Which is why they can delete prod databases and why they cannot do expert level work

Jtarii · 2026-04-29T22:14:28 1777500868

>they cannot do expert level work

Well this is just factually incorrect considering they are currently on par with grad students in some areas of mathematics.

timacles · 2026-05-01T14:04:45 1777644285

Not sure if you are being pedantic but mathematics is quite different from other fields because it is highly structured, reasoning is explicit and it contains a dense volume of high level training data. Results are able to be verified easily due to its structure.

Even then, they are most effective in assisting and are not able to produce results independently. If you have proof otherwise I would love to read up on it

liquid_thyme · 2026-04-29T18:23:08 1777486988

I like to think of LLMs as idiot savants. Exceptional at certain tasks, but might also eat the table cloth if you stop paying attention at the wrong time.

With humans, you can kind of interview/select for a more normalized distribution of outcomes, with outliers being less probable, but not impossible.

freejazz · 2026-04-29T18:30:20 1777487420

When you're applying reasoning like this, sure, why not? What difference would it make?

californical · 2026-04-29T16:46:04 1777481164

I mean maybe it’s a losing battle today, but it is correct. So in a few years when the dust settles, we’ll probably all be using LLMs as clueless automatons that still do useful work as tools

parliament32 · 2026-04-29T16:34:44 1777480484

So... systemd is AGI now?

recursive · 2026-04-29T16:40:48 1777480848

Maybe. But probably not. It doesn't matter if it's AGI though. If those other apps and tools do simple things that are predictable, then we can be pretty sure what will happen. If those tools can modify their own configuration and create new cron jobs, it becomes much harder to say anything about what will happen.

munk-a · 2026-04-29T17:10:52 1777482652

Most of us work on software that can modify its own configuration and create new jobs. I, too, have worked in ansible and terraform.

The key break here is the lack of predictability and I think it's important that we don't get too starry eyed and accept that that might be a weakness - not a strength.

ahoka · 2026-04-29T16:45:09 1777481109

Well do you make 100 billion bucks with it? If no, then not AGI.

xboxnolifes · 2026-04-29T17:10:56 1777482656

My claude has never yet launched itself from my terminal, gave itself a prompt, and then got to work. It has only ever spawned a sub-agent after I had given it a prompt. It was inert until a human got involved.

If that is software running itself, then an if statement that spawns a process conditionally is running itself.

islandfox100 · 2026-04-29T16:54:55 1777481695

Substance aside, I feel this comment is combative enough to be considered unhelpful. Patronizing and talking down to others convinces no one and only serves as a temporary source of emotional catharsis and a less temporary source of reputational damage.

boh · 2026-04-29T17:01:39 1777482099

You're using it and if someone else was using it the output would be different. The point is really that simple.

echelon · 2026-04-29T16:30:15 1777480215

All AI requires steering as the results begin to decohere and self-enshittify over time.

AI in the hands of an expert operator is an exoskeleton. AI left alone is a stooge.

Nobody has built an all-AI operator capable of self-direction and choices superior to a human expert. When that happens, you'd better have your debts paid and bunker stocked.

We haven't seen any signs of this yet. I'm totally open to the idea of that happening in the short term (within 5 years), but I'm pessimistic it'll happen so quickly. It seems as though there are major missing pieces of the puzzle.

For now, AI is an exoskeleton. If you don't know how to pilot it, or if you turn the autopilot on and leave it alone, you're creating a mess.

This is still an AI maximalist perspective. One expert with AI tools can outperform multiple experts without AI assistance. It's just got a much longer time horizon on us being wholly replaced.

DeathArrow · 2026-04-29T16:52:28 1777481548

A one liner shell script can run itself.

recursive · 2026-04-29T16:56:55 1777481815

One liner shell scripts can be analyzed. Some of them can be determined to not delete the production database. The others will not be executed.

ctoth · 2026-04-28T22:13:50 1777414430

I remember this old thing called Bugs Everywhere -- it was a bug/issue tracker which actually lived inside your hg repository. I wonder if we could standardize on something like that? or git notes with an issues ref? or something magical like that?

Then it's BYOR -- bring your own renderer. Trivial CLI bugtrackers, agentic nonsense, pretty web stuff, whatever and the data lives in the repo.

lloydatkinson · 2026-04-28T22:17:53 1777414673

I've often wondered why no one has built an issue tracker with Git notes, or if one exists, why it's not widespread.

psychoslave · 2026-04-28T23:17:57 1777418277

https://github.com/git-bug/git-bug

And probably the network/black-hole effect of platforms like GitHub, Linkedin and the like are hard to achieve with fully distributed solutions, all the more when the other side is backed by huge capital which absolutely love concentration of power.