sarchertech's comments

Who is the other frontier lab, besides Anthropic, OpenAI, and Google? I thought those three were ahead of everyone else.

Folks who make Deepseek, Qwen, GLM, MiniMax, Kimi and MiMo.

They're at the frontier of last year. They compete with Opus 4.5. They don't yet compete with current frontier models.

They'll presumably catch up; the US holds no monopoly on talent. And that's more true than ever now that the US is actively hostile to immigrants. Scientists who might have come to the US three years ago have little reason to do so now.


> Scientists who might have come to the US three years ago have little reason to do so now.

Been saying that about EU and China for decades now.

Yet the top European and Chinese scientists still come to the US, even in April 2026.


Nit: scientists have the same reasons to come now as they ever did. They just have additional reasons not to.

But even that distinction is only temporary, since we're determined to piss away any remaining research lead that draws people in.

Hopefully the next administration will work at actively reversing the damage, with incentives beyond just "we pinky-promise not to haul you at gunpoint to a concrete detention center and then deport you to Yemen".


> Hopefully the next administration will work at actively reversing the damage, with incentives beyond just "we pinky-promise not to haul you at gunpoint to a concrete detention center and then deport you to Yemen".

Won't be enough to undo the damage. The US would have to do a full about-face, prosecute the crimes of the current administration, and enact serious core reforms to make it impossible for things to change this drastically again in 4 years. In other words, it's never going to happen, because even the current opposition party doesn't actually want structural change. The world has seen how bad the US can get from a single election, and that isn't changing any time soon.


It's kind of hard to say this unless you go out of your way to test it: the scaffolding for interacting with the raw model is a lot better now for many tasks. Is 4.7 really that much better than 4.5, or is claude 1.119 just much better tuned to squeeze utility out of the LLM despite the hallucinations, lack of self-awareness, etc.? Certainly the current products are great, but I think it's hard to separate the two things: the raw model, and the agent workflow that constrains the model toward utility.

You can use Claude Code with other models, so one could test that theory. https://openrouter.ai/docs/guides/coding-agents/claude-code-...

I am using Claude Code with GLM, MiniMax, Kimi and MiMo.

Since Gemini 3.1 Pro is considered to be at the frontier, and GLM 5.1 does better than it on coding benchmarks, it would be fair to say GLM 5.1 is a frontier model.

Yeah I thought all of those were generally acknowledged to be a little behind the big 3.

S&P 500 average return over the last 5, 10, 50, and 100 years was higher than that.

So what have you released?

10x means you could have built something that would have taken 4 or 5 years in the time you've had since Opus 4.5 came out.
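Spelling out the arithmetic, assuming Opus 4.5 came out in late November 2025 and this comment was written around April 2026 (the date mentioned upthread), so roughly 5 months have elapsed:

$$10 \times 5\ \text{months} = 50\ \text{months} \approx 4.2\ \text{years}$$

At 6 months elapsed it is 60 months, or 5 years, hence the 4 to 5 year range.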

Where's your operating system, game engine, new programming language, or complex SaaS app?


> I apply the Herbie Hancock philosophy when defining good code. When once asked what is Jazz music, Herbie responded with, "I can't describe it in words, but I know it when I hear it."

That’s the problem. If we had an objective measure of good code, we could just use that instead of code reviews, style guides, and all the other things we do to maintain code quality.

> I truly believe that most competent developers (however one defines competent) would be utterly appalled at the quality of the human-written code on some of the services they frequently use.

Not if you have more than a few years of experience.

But what your point misses is the reason that software keeps working in the first place, or stays in a good enough state that development doesn’t grind to a halt.

There are people working on those code bases who are constantly at war with the crappy code. At every place I’ve worked over my career, there have been people quietly and not so quietly chipping away at the horrors. My concern is that with AI those people will be overwhelmed.

They can use AI too, but in my experience, the tactical tornadoes get more of a speed boost than the people who care about maintainability.


I had a long reply to your comment, then decided it was not truly worth reading. However, I do have one question remaining:

> the tactical tornadoes get more of a speed boost than the people who care about maintainability.

Why are these not the same people? In my job, I am handed a shovel. Whatever grave I dig, I must lie in. Is that not common? Seriously, I am not being facetious. I've had the same job for almost a decade.


That’s because you’ve been there a decade. It’s very common for people to switch jobs every 2 years, so they never end up seeing the long-term consequences of their actions.

The other common pattern I’ve seen goes something like this.

Product asks Tactical Tornado (TT) if they can build something. TT says sure, it will take 6 weeks. TT doesn’t push back or ask questions; he builds exactly what product asks for in an enormous feature branch.

At the end of 6 weeks he tries to merge it and he gets pushback from one or more of the maintainability people.

Then he tells management that he’s being blocked. The feature is already done and it works. Also the concerns other engineers have can’t be addressed because “those are product requirements”. He’ll revisit it later to improve on it. He never does because he’s onto the next feature.

Here’s the thing. A good engineer would have worked with product to tweak the feature up front so that it’s maintainable, performant etc…

This guy uses product requirements (many that aren’t actually requirements) and deadlines to shove his slop through.

At some companies management will catch on and he’ll get pushed out. At other companies he’ll be praised as a high performer for years.


One person is rigorously checking to see if Claude is actually following the spec and one person isn’t?

One is getting paid by a marketing department program and the other isn't. Remember how much has been spent making LLMs and they have now decided that coding is its money maker. I expect any negative comment on LLM coding to be replied to by at least 2 different puppets or bots.

Then you should expect any positive comment to be replied to negatively by a competitor's puppet or bot too.

Not necessarily; rising tide and all that. When a new scam like this emerges, it behooves all of the grifters to cooperate and not muddy the waters with distrust.

I’m normally very skeptical of conspiracy theories. But I saw an AI booster bot responding to a negative AI post I made here.

Someone pointed out to me in the comments that the username had posted long replies to 3 completely different threads in the same minute. That and looking back at its post history confirmed it was a bot.


... or one person has a very strong mental model of what he expects to do, but the LLM has other ideas. FWIW I'm very happy with CC and Opus, but I don't treat it as a subordinate but as a peer; I leave it enough room to express what it thinks is best and guide later as needed. This may not work for all cases.

If you don’t have a very strong mental model of what you are working on, Claude can very easily guide you into building the wrong thing.

For example I’m working on a huge data migration right now. The data has to be migrated correctly. If there are any issues I want to fail fast and loud.

Claude hates that philosophy. No matter how many different ways I add my reasons and instructions to stop it to the context, it will constantly push me towards removing crashes and replacing them with “graceful error handling”.
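A minimal sketch of the fail-fast approach, in Go since the consumer is described as Go further down the thread. It leaves Kafka out entirely: `Record`, `transform`, and `migrate` are hypothetical names, and an in-memory slice stands in for messages off a topic. The point is that every invariant violation returns an error that halts the run, instead of being logged and skipped:

```go
package main

import (
	"errors"
	"fmt"
)

// Record is a hypothetical source row; in a real consumer it would be
// decoded from a Kafka message.
type Record struct {
	Offset int64
	ID     string
	Amount int64 // must be non-negative in this sketch
}

// transform converts one record and reports any invariant violation as an
// error instead of silently "fixing" or dropping it.
func transform(r Record) (string, error) {
	if r.ID == "" {
		return "", errors.New("missing ID")
	}
	if r.Amount < 0 {
		return "", fmt.Errorf("negative amount %d", r.Amount)
	}
	return fmt.Sprintf("%s:%d", r.ID, r.Amount), nil
}

// migrate processes records in order and stops at the first failure. It
// returns how many records were written, so you know exactly where it halted.
func migrate(records []Record, write func(string) error) (int, error) {
	for i, r := range records {
		out, err := transform(r)
		if err != nil {
			return i, fmt.Errorf("offset %d: transform: %w", r.Offset, err)
		}
		if err := write(out); err != nil {
			return i, fmt.Errorf("offset %d: write: %w", r.Offset, err)
		}
	}
	return len(records), nil
}

func main() {
	var written []string
	write := func(s string) error {
		written = append(written, s)
		return nil
	}

	records := []Record{
		{Offset: 0, ID: "a", Amount: 100},
		{Offset: 1, ID: "", Amount: 50}, // bad record: no ID
		{Offset: 2, ID: "c", Amount: 75},
	}
	n, err := migrate(records, write)
	if err != nil {
		// A real consumer would stop here without committing the offset
		// (e.g. log.Fatalf), so the bad message stays put and gets noticed.
		fmt.Printf("migration halted after %d record(s): %v\n", n, err)
		return
	}
	fmt.Printf("migrated %d records\n", len(written))
}
```

The "graceful" alternative would `continue` past the bad record, which keeps the pipeline green while quietly producing an incomplete migration.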

If I didn’t have a strong idea about what I wanted, I would have let it talk me into building the wrong thing.

Claude has no taste and its opinions are mostly those of the most prolific bloggers. Treating Claude like a peer is a terrible idea unless you are very inexperienced. And even then I don’t know if that’s a good idea.


> Claude has no taste and its opinions are mostly those of the most prolific bloggers.

I often think that LLMs are like a reddit that can talk. The more I use them, the more I find this impression to be true - they have encyclopedic knowledge at a superficial level, the approximate judgement and maturity of a teenager, and the short-term memory of a parakeet. If I ask for something, I get the statistical average opinion of a bunch of goons, unconstrained by context or common sense or taste.

That’s amazing and incredible, and probably more knowledgeable than the median person, but would you outsource your thinking to reddit? If not, then why would you do it with an LLM?


> they have encyclopedic knowledge at a superficial level, the approximate judgement and maturity of a teenager, and the short-term memory of a parakeet. If I ask for something, I get the statistical average opinion of a bunch of goons, unconstrained by context or common sense or taste.

Love this paragraph; it's exactly how I feel about the LLMs. Unless you really know what you are doing, they will produce very sub-optimal code, architecturally speaking. I feel like a strong acumen for proper software architecture is one of the main things that defines the most competent engineers, along with naming things properly. LLMs are a long, long way from having architectural taste.


Try asking it to review your code as if it were Linus Torvalds. No, really.

I’ve tried that. I’ve experimented with a whole council of 13 personas, including many famous developers. It’s definitely different, but it hasn’t performed significantly better in my tests.

Holding it wrong.

That’s interesting to hear as for me Claude has been quite good about writing code that fails fast and loud and has specifically called it out more than once. It has also called out code that does not fail early in reviews.

If you add a single space to a prompt, you’ll get a completely different output, so it’s no surprise that feeding entirely different programs into the prompt produces radically different output.

My guess is that there must be something about the language (Go) or the domain (a data migration tool that uses Kafka) that triggers this.


You're right, data migration is a specific case where you have a very strong set of constraints.

I, on the other hand, am doing a new UI for an existing system, which is exactly where you want more freedom and experimentation. It's great for that!


Have you created a plan where the requirement is not to bother you with x and y, and to use some predetermined approach? What you describe sometimes happens to me, but it happens less when it's part of the spec.

Yes. That’s one of the things included in this.

> No matter how many different ways I add my reasons and instructions to stop it to the context


> it will constantly push me towards removing crashes and replacing them with “graceful error handling”.

Is it generating JS code for that?


No, this is a Kafka consumer written in Go.

Depends on how it’s enforced.

The data we have on bans on underage drinking and smoking show that they work. Some kids will still smoke and drink, but the number is reduced, drunk driving accidents go down, and eventually fewer adults abuse alcohol and smoke cigarettes.

The idea that age limits make something forbidden and therefore attract more kids to it is just that: a myth. Spend some time looking at the studies. They almost universally show that age limits on drinking and smoking reduce harm.


There are a few differences. For one, it's much easier to regulate the sale of alcohol and tobacco, the level of friction is much higher and usually involves an in-person interaction with an adult. Visiting some dodgy website or downloading a VPN is much easier.

Second, the peer pressure to drink/smoke has never been as strong as the network effect of social media. Almost all 15-year-olds are on some form of social media, I don't think you can reasonably expect they will suddenly stop wanting to socialise outside school. Their entire identities are built around their online presence; that was never the case with smoking or drinking, at least not on this scale.

I'm sure it will have some effect, but kids are clever, and they have lots of time, they will find ways to bypass these fairly weak bans. Imo, the only way to do this is to provide an alternative along with the ban, like what the Russians are doing with Max as a replacement for Telegram/WhatsApp, though that's not entirely successful either.


You can’t conjure up a bottle of vodka or a pack of cigarettes out of thin air in your bedroom with a cheap Wi-Fi only Android phone, but you can use that cheap Android phone to access social media.

That’s why I said it depends on the enforcement mechanism. If they require an ID or a credit card, then it’s roughly analogous to getting someone to buy beer for you.

In a way it's nice, because young people will find ways to circumvent the limits and they'll learn "hacking", just like we used to in the very different internet we grew up with.

A C compiler with an existing C compiler as oracle, existing C compilers in the training set, and a formal spec, is already the easiest possible non-trivial product an agent could build without human review.

You could have it build something that takes fewer lines of code, but you aren’t going to find much with that level of specification and guardrails.


That’s part of the issue. But packing a tractor (or car) with electronics and computers does make it inherently harder to work on—even if it’s not locked down.

You need electronics and computers for cost-effective compliance with emissions requirements. Emissions limits have been one of the most positive government policies in my lifetime, saving millions of QALYs.

There's lots of other electronics in most modern vehicles, but the public manufacturer rationales for electronic lockdowns almost always point back to emissions concerns because they're so defensible. How do you separate them?


Perhaps this is naive, but I would imagine that farm equipment is a rounding error in terms of global emissions. Compare the number of tractors to the number of trucks...

I would have expected policy to be pragmatic here, with (relatively) relaxed emissions requirements, since an affordable and reliable food supply is in the national interest? Sounds like that's not the case.


Emissions regimes are complicated, but US tractors fall into the much less restrictive off-road category. As a result, they're a disproportionately significant contributor to things like NOx. A long time ago the off-road category was >20%, and I'm sure that percentage has only grown as regulations have forced emissions reductions in onroad vehicles.

> but US tractors fall into the much less restrictive off-road category.

Sometimes. Tractors above 26 HP do have to have emissions controls like diesel particulate filters now. Below that they don't.


The vast majority of offroad equipment is not farm equipment but operates in urban environments. As NOx is an air pollution concern, there should be different regimes for rural areas versus urban areas. Construction equipment operating in urban areas is different from a tractor on a farm.

Compare the number of tractors to the number of gas-powered lawnmowers. Which do you think gets better emissions?

I'd imagine it depends what kind of emissions you're measuring? Are we talking air quality or climate change?

Two stroke engines are pretty terrible in terms of unburned hydrocarbons and are disgusting for local air quality, which is why I'm glad they're being phased out in many areas.

I'd expect these tractors with I6 diesel engines to run pretty efficiently. I'd bet that the CO2 emissions from tractors are tiny in comparison to the emissions from trucks, fertiliser, and transporting the food.


Lawnmowers are usually four-stroke, with two-stroke engines reserved for lighter tools like string trimmers and chainsaws.

I would still guess that lawnmowers produce more emissions overall, given that there are so many more mowers than tractors. But they get used less often than tractors, so who knows? Either way, I agree with your thinking process, that the most economical way to reduce overall emissions is to focus on what are actually producing the bulk of emissions.

I don't know how much better cars and trucks can get, and for mowers maybe electric is the answer. Mine is gas-powered, and I know it runs rich. I would love to come inside after mowing and not smell like fuel, so I'm in favor of better emissions controls on mowers.


For tools, electric is the answer. Take a chainsaw: the battery needs to be swapped about as often as the fuel tank needs refilling. And with newer batteries you might recharge a depleted one as fast as you drain a fresh one. Not sure, just an assumption.

The future for tools is electric 100%.


my brother in Christ, electric chainsaws are garbage, have you ever used one? I tried one out to clear a huge 3 foot wide tree that fell on my property and yeah those things cannot hang with gas powered chainsaws in any way, shape, or form. No one is using electric chainsaws for cutting anything significant.

they may have a place in the distant future but in 2026, aint no way.


Which electric chainsaw did you use?

I haven't used one, but I saw a youtube review from Project Farm. You can check it yourself. https://www.youtube.com/watch?v=u6FM_08066I

The DeWalt chainsaw was similar to or better than the Stihl in a different series of tests, including cutting through 10-inch logs.

There were other brands which would stall or be worse, so it depends on the brand.


I haven't used a chainsaw in a few years, but the last time I did, electric ones with a cord were great. I switched from a proper Stihl chainsaw to a budget electric one with a cord, and despite it being smaller and sort of flimsy, it did cut like crazy, comparable to the gas chainsaw. And it didn't require ear protection, didn't annoy the neighbors and didn't make you smell like a chainsaw for two days.

I like the electric saw for limbing and felling small stuff because it's light and quiet but yeah for anything bigger than like 9" or extended work it's not the tool for the job.

These are regulations, not laws, and can be changed fairly easily. E.g., the EPA recently changed the rules requiring NOx sensors and power-downs, which were the most failure-prone components of the system, while still mandating the actual equipment that scrubs NOx.

There's no particular reason why a mechanical device needs computers for emissions, as the emissions-removing components can still be attached and managed via simpler means. All emissions-removing components are effectively physical devices, whether you are talking about carbon filters or PCV valves or particulate filters or the urea-based diesel exhaust fluid (DEF) that is injected into the exhaust stream. None of them requires complex software in order to function. There is no reason why you need to buy an official John Deere branded emissions component that is software-locked to the tractor and costs 10x the price of third-party components that do the same thing.

Also, there is a large room to maneuver between "I want a sensor with some circuitry in it" and "the entire tractor is a proprietary computer with locked down parts". The right to repair movement is not about removing tech, but removing unnecessary proprietary tech that is designed to prevent owners of devices from repairing those devices themselves or with third party components.


Defeat devices aren't even complicated (they just feed fake sensor data to the ECU to get what the owner needs). Locking down is pointless. Most people are not tuning their cars.

If we wanted to do it properly, I'd imagine we'd have zero mandatory locks on the ECU, just a sealed little black box with a sensor installed in a relatively tamper-proof way (of course there will always be someone who gets around it; the target is for 90% of people to not bother), logging away and maybe triggering a check engine light if it detects the wrong AFR for too long.

Then you just check that at the yearly MOT, along with any signs of tampering. The owner is then free to tune the engine as they want, provided the exhaust stays within the norms most of the time.


What would you be accomplishing by trying to control end user behavior like that? As a manufacturer, there are certain standards your machine must meet when it leaves your factory. After that, a whole separate set of standards applies to users--e.g. EPA rules about emissions equipment tampering. As a manufacturer, though, you don't need to attempt enforcement. Leave that to the government, it's their job. Locked down, proprietary hardware and software doesn't ultimately achieve enforcement, it just makes tampering more difficult at the cost of serviceability. This is a dumb trade.

It's to contain the regulation in a little box that controls the emissions, rather than spreading it across the entire system and making that harder to repair. Then the EPA can have its "proof" that the vehicle's emissions are fine without compromising the entire system for repairs.

I think you're asking for something magical, like when politicians go on TV and demand safe cryptosystems with government backdoors. Any time you try to do engineering work to hinder users from using devices they own it's a really bad time. That's the purview of law enforcement, not engineering.

> How do you separate them?

Mandate common interfaces and open hardware. I shouldn't have to buy a $10k dongle to sniff codes. I certainly shouldn't have to buy a different one for each manufacturer.


The legislation has to be robust. It's no dice if the dongle is generic and $20 like OBD2 in cars, but on top of that there's a per-manufacturer set of codes that only licensed dealers have the software to read.
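For a sense of what the generic layer covers, here is a sketch in Go of decoding an OBD2 trouble code from the two raw bytes a scanner gets back (service 03), following my understanding of the SAE J2012 layout. `DecodeDTC` and `IsManufacturerSpecific` are hypothetical helpers. The format itself is public; what a manufacturer-range code like P1123 actually *means* is the part that sits in the dealer software:

```go
package main

import "fmt"

// DecodeDTC turns the two raw bytes of an OBD2 trouble code into its
// five-character form, e.g. 0x03 0x01 -> "P0301".
func DecodeDTC(a, b byte) string {
	system := "PCBU"[a>>6]   // top two bits: Powertrain, Chassis, Body, network (U)
	first := (a >> 4) & 0x03 // next two bits: digit 0-3
	second := a & 0x0F       // low nibble: hex digit 0-F
	return fmt.Sprintf("%c%d%X%02X", system, first, second, b)
}

// IsManufacturerSpecific flags only the "x1xxx" ranges, which are
// manufacturer-controlled in every system. The 2 and 3 ranges differ by
// system, so this sketch deliberately leaves them out; check J2012 for those.
func IsManufacturerSpecific(code string) bool {
	return len(code) == 5 && code[1] == '1'
}

func main() {
	for _, raw := range [][2]byte{{0x03, 0x01}, {0x11, 0x23}, {0xC1, 0x00}} {
		code := DecodeDTC(raw[0], raw[1])
		fmt.Println(code, "manufacturer-specific:", IsManufacturerSpecific(code))
	}
}
```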

The situation today is at least better than it used to be before OBDII. I much prefer using a scanner to get codes than having to count flashing lights. And back then you'd still have to pay a lot for the manufacturer's code reader. The only advantage was that the ROM was small enough to disassemble and reflash with new features. I would not want to do that on a car made in 2026.

Most of the codes on a large tractor are J1939. You still want the manufacturer's database, because it often says 'x sensor voltage out of range - check the wiring harness in some non-obvious location'.
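A sketch in Go of what a generic J1939 reader can pull out without the manufacturer database: the fields of the 29-bit CAN identifier, and the SPN/FMI pair from one 4-byte trouble code in a DM1 (active DTCs, PGN 65226) message. The bit layout follows my understanding of SAE J1939-21 and J1939-73 (SPN conversion method 4), so verify against the spec; the type and function names are hypothetical. The SPN and FMI tell you *which* parameter failed *how*; the "check the harness behind the cab" advice is what the manufacturer's database adds:

```go
package main

import "fmt"

// ID holds the fields packed into a 29-bit J1939 CAN identifier.
type ID struct {
	Priority uint8
	PGN      uint32
	Dest     uint8 // destination for PDU1 messages; 0xFF (global) for PDU2
	Source   uint8
}

// ParseID splits a 29-bit extended CAN identifier into J1939 fields.
func ParseID(id uint32) ID {
	priority := uint8((id >> 26) & 0x7)
	edpDp := (id >> 24) & 0x3 // extended data page + data page bits
	pf := (id >> 16) & 0xFF   // PDU format
	ps := (id >> 8) & 0xFF    // PDU specific
	sa := uint8(id & 0xFF)    // source address

	if pf < 240 {
		// PDU1 (peer-to-peer): PS is a destination address, not part of the PGN.
		return ID{Priority: priority, PGN: edpDp<<16 | pf<<8, Dest: uint8(ps), Source: sa}
	}
	// PDU2 (broadcast): PS is a group extension and is part of the PGN.
	return ID{Priority: priority, PGN: edpDp<<16 | pf<<8 | ps, Dest: 0xFF, Source: sa}
}

// DTC is one diagnostic trouble code from a DM1 payload.
type DTC struct {
	SPN         uint32 // suspect parameter number (19 bits)
	FMI         uint8  // failure mode identifier (5 bits)
	Occurrences uint8  // 7-bit occurrence count
}

// ParseDTC decodes one 4-byte DTC, assuming conversion method 4 (CM bit = 0).
func ParseDTC(b [4]byte) DTC {
	spn := uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2]>>5)<<16
	return DTC{SPN: spn, FMI: b[2] & 0x1F, Occurrences: b[3] & 0x7F}
}

func main() {
	id := ParseID(0x18FECA00)
	fmt.Printf("priority=%d pgn=%d src=0x%02X\n", id.Priority, id.PGN, id.Source)
	d := ParseDTC([4]byte{0x64, 0x00, 0x01, 0x01})
	fmt.Printf("SPN=%d FMI=%d count=%d\n", d.SPN, d.FMI, d.Occurrences)
}
```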

How do you define "electronics" and "computers"? Is a general-purpose computer running Java in the same category as a microcontroller running a tight loop with lookup tables for fuel and spark?

The problem: Once you have a microcontroller running a tight loop with lookup tables for fuel and spark, it's very tempting to make it run a tight loop with lookup tables for fuel, spark, and time since license renewal - and there's no outward difference between the two microcontrollers until one of them stops working. This is where regulations can help: if a manufacturer is afraid of a zillion dollar fine, they won't do that, even if the chance of getting caught is low.
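For a concrete picture of the "tight loop with lookup tables" part, here is a sketch in Go of a fuel map indexed by RPM and load with bilinear interpolation between cells, which is one common way such maps are read. `Table2D`, `bracket`, `Lookup`, and every number in the table are made up for illustration and do not come from any real ECU:

```go
package main

import (
	"fmt"
	"sort"
)

// Table2D is a hypothetical fuel (or spark) map indexed by RPM and load.
type Table2D struct {
	RPM    []float64   // ascending breakpoints, at least two
	Load   []float64   // ascending breakpoints, at least two
	Values [][]float64 // Values[i][j] is the entry at RPM[i], Load[j]
}

// bracket returns i such that xs[i] <= x <= xs[i+1], clamping at the ends,
// plus the fractional position of x between those two breakpoints.
func bracket(xs []float64, x float64) (int, float64) {
	if x <= xs[0] {
		return 0, 0
	}
	n := len(xs)
	if x >= xs[n-1] {
		return n - 2, 1
	}
	i := sort.SearchFloat64s(xs, x) - 1 // first index with xs[k] >= x, minus one
	return i, (x - xs[i]) / (xs[i+1] - xs[i])
}

// Lookup interpolates bilinearly between the four surrounding cells.
func (t Table2D) Lookup(rpm, load float64) float64 {
	i, fr := bracket(t.RPM, rpm)
	j, fl := bracket(t.Load, load)
	v00, v01 := t.Values[i][j], t.Values[i][j+1]
	v10, v11 := t.Values[i+1][j], t.Values[i+1][j+1]
	low := v00 + (v01-v00)*fl  // along load at the lower RPM row
	high := v10 + (v11-v10)*fl // along load at the upper RPM row
	return low + (high-low)*fr // then along RPM
}

func main() {
	// Made-up injector pulse widths in milliseconds.
	fuel := Table2D{
		RPM:  []float64{1000, 2000, 3000},
		Load: []float64{30, 60, 90},
		Values: [][]float64{
			{2.0, 3.0, 4.0},
			{2.5, 3.5, 5.0},
			{3.0, 4.5, 6.0},
		},
	}
	fmt.Printf("%.2f ms\n", fuel.Lookup(1500, 45))
}
```

Nothing about this loop needs to be secret; the whole map can be published and edited by the owner, which is the distinction the comment is drawing.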

While I agree in principle, we went two or more decades with cars powered by microcontrollers, and I don't recall any manufacturers trying to charge for licenses until more recently. There is something fundamentally different about the economy we are now in, I suspect.

I think the difference is that in the past, companies expected to be punished for obviously evil behavior, but now, they know they can go very far. Toyota got punished for stuck accelerators. Would they get punished for the same thing today? Tesla had stuck accelerators and we all forgot about it.

They're still pushing the boundary today. The Ring Superbowl ad where they announced they're watching you (but they said "your dog") 24/7 apparently got a lot of people to quit Ring, and you know they're crunching the numbers to see if the retention rate is worth the extra surveillance collection.


They charge for the diagnostic systems. Bigly. For example, Mercedes-Benz's Star Diagnostic System (SDS) is necessary for a variety of repairs and diagnostic procedures. There are varying degrees of workarounds and alternatives but none of them work quite right, or for every model/year/variant. It's not just the embedded system, it's also the interface to it. That's where the really ugly rent seeking crops up. And that's precisely why a tractor with no computers is attractive--not because the embedded software might try to ransom itself (although that's a reasonable fear) but because some horrible rent seeking corporate functionary will do their utmost to cheat you (or your mechanic) out of as much money as possible when it comes time to do any maintenance or diagnostic testing. No computers means that little bastard can fuck right off.

I still don't understand what was downvotable about this comment.

Exactly. Electronically controlled unit injectors are expensive--like 10x the price of mechanical ones. They're super cool, they can produce like 10 separate metered injection events per cycle. This is great for efficiency, noise, emissions, etc. But I can rebuild mechanical injectors with a bottle jack pop tester I made from $100 worth of parts and a bench vise. There's no wiring harness, no computer. If the injector is getting fuel, has a decent spray pattern, and is popping at the right pressure, I know for certain the fuel system is good. With an electronic common rail system I need some expensive proprietary computer equipment to diagnose it, and there's no way I can build a test bench to rebuild those injectors.

You can't build a test bench to rebuild current OEM's electronic common rail injector systems that rely on expensive proprietary computer equipment, but there's no reason that has to be the case.

With a $20 CAN transceiver, documentation and/or config files from the manufacturer, and a bit of Python or something, you could absolutely bench test those electronic injectors. You might even be able to pick your injection events and adjust the metering, supporting the equipment as it ages. I'd love to see Ursa Ag put in a Megasquirt engine controller [1] or Proteus [2] or similar. You can run TunerStudio on a Raspberry Pi and show it on a touchscreen on the dash.

It's possible to build user-friendly, inexpensive and open engine and vehicle controls. You don't need to have zero electronics to not have locked-down proprietary electronics, you just need to build the electronics in the right way.

[1] https://diyautotune.com/products/ms3357-c?_pos=2&_fid=69f494...

[2] https://rusefi.com/index.html#proteus


Controls are one thing, but there's also the problem of generating 20k psi of oil pressure and some thousands of pounds of continuous common rail fuel pressure to actuate the injector. Compared with older MW, M, P, etc. styles it's a whole different beast. Also, we're talking past each other a little--I'm talking about diesel injectors, you're talking about otto cycle equipment ;)

Surely there’s room for a middle ground. There are plenty of 1990s-era engines that were excellent designs, had no meaningful connectivity to anything except their own ECUs, and could be produced new for not very much money. Some of them were quite modular, too — I know someone who took the drivetrain out of a salvaged Honda Civic and built an entire car (with no resemblance whatsoever to a Civic) around it.

If a tractor with a clean-burning, efficient $7,500 engine could be purchased, and were designed around the theory that, in 20 years or so, the owner could reasonably quickly replace the entire engine (with a first-party or aftermarket solution), would that be a good solution?

The common tech that has solved these problems nicely (IMO) is network transceivers: SFP and similar modules are built according to multi-source agreements. They contain all kinds of exotic tech, and they are not intended to be serviced at all, but (unless your switch or NIC has an utterly stupid lockout) you can pull one out and replace it with an equivalent part from a different vendor in seconds, and those parts can be unbelievably inexpensive considering what’s in them. (Single-mode bidirectional 1Gbps transceivers are $11 or less, retail, in qty 2. This is INSANE compared to the first time I lit up a 1Gbps SMF link. To be fair, this particular tech may require one to replace both ends if one fails, but if you can spare a second fiber, the fully IEEE-spec-compliant interoperable ones are even less expensive.)


It's not the craziest idea. A tractor is basically just a big hydraulic pump driving a bunch of linear and rotary actuators (commonly called "motors" and "cylinders"). Especially if it's got a hydrostatic transmission. If you design it in such a way that it's relatively easy to adapt different clutches and bell housings, maybe with a little driveshaft and u-joint between the clutch and the pump, you could theoretically accomplish something like this.

However one major sticking point is that (often.. maybe always?) the engine block casting is actually a structural component of the tractor "frame". Unlike e.g. a truck that has its driveline mounted between frame rails, a tractor's "frame" is its driveline . So this might add quite a bit of complexity and cost.


Eh, generating a decent nozzle takes precision laser drilling (e.g. Trumpf) or EDM drilling (e.g. Posalux), plus some grinding and a quality test bench. It's not that easy to have good low-tech solutions either.

Yeah you're definitely gonna want to purchase nozzles. They're extremely precise and manufactured to very high tolerances. I've rebuilt plenty of 30+yr old injectors and haven't yet been unable to find newly manufactured or new old stock nozzles though.

EDIT: I did have some nozzles bored out a little bit once by a shop with EDM equipment. Terrible results, not worth it.


I hear complaints about 401k balances dropping. But I don’t think I’ve heard complaints about it not going fast enough because we haven’t had a long period of slow stock market growth in a long time.

Based on the numbers from the article, this person says they are writing a prompt every 3 minutes, all day long, every day.

This is just nonsense. The whole thing looks like the fever dream of someone in a severe manic episode. Even the formatting and writing style of the blog has a manic feeling. Hard to tell if that’s coming from the user or the AI.

I’d like to know how many users all this “shipped code” has.

>35 years building event-driven distributed systems.

Also this guy was not building event-driven distributed systems in 1991.


Hi, I'm the original author and I can clarify a few things.

The 543 hours are the agent compute hours, not me at the keyboard. The pipeline runs autonomously, the agents execute in parallel, and the gates verify the output. Most of the prompts are agent-to-agent, not human-to-agent.

On the timeline: I have a BSCS (1995) and MSCS (1997) with a specialty in distributed systems. I actually worked my way through school doing this work so I didn't need loans. Let's call it almost 35 years.

The terminology has evolved but the architecture hasn't changed as much as people think.


> Most of the prompts are agent-to-agent, not human-to-agent.

I can’t even begin to parse any of this if that’s the case.

> Let's call it almost 35 years.

I was hooking up TI-83s to each other when I was 12, so I guess I’ll tell people I’ve been building distributed systems for 30 years.

I’m going to bet that you didn’t have “building event driven distributed systems for 15 years” on your resume in 2006.

