Forks don't have to be hostile. A perfectly reasonable way to react to an overwhelmed maintainer is just to do a friendly fork. Keep the original name, attribution, git history, etc., update the README, and start acting as a trustworthy lieutenant. You can review stuck PRs and merge them into your own branch whilst also merging in upstream master. After a while, if you seem to be making good calls, the original maintainer can do a bulk merge from your branch to bring in many PRs at once, and maybe add you to the repository.
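For what it's worth, the mechanics are plain git. A runnable sketch using throwaway local repos (the repo names, branch name, and PR number are all placeholders, not a real project):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"

# stand-in for the original project
git init -q -b main upstream
git -C upstream -c user.email=o@x -c user.name=orig commit -q --allow-empty -m "history"

# the friendly fork keeps the name, history and attribution
git clone -q upstream fork
cd fork
git config user.email you@x
git config user.name you

# a contributor's stuck PR, reviewed and merged into your branch
git checkout -q -b pr-123
echo fix > fix.txt
git add fix.txt
git commit -q -m "contributor fix"
git checkout -q main
git merge -q --no-ff pr-123 -m "merge PR 123 after review"

# keep tracking upstream so a later bulk merge back stays clean
git fetch -q origin
git merge -q origin/main -m "sync with upstream"
git log --oneline | head -n 3
```

The `--no-ff` merges keep each reviewed PR visible as a unit, which makes the eventual bulk merge back into the original repository easy to audit.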
I'm curious why you think the answer would be no. I've had some success with resolving complex merges with GPT 5.4, and it seems obvious enough that AI is a good solution for maintainers who don't have anyone they can trust to take over the project whilst also needing to boost throughput.
Python has a JIT-compiled implementation in GraalPy. If you have pure Python code it works well. The problem is that a lot of Python code these days is just callouts to C++ ML libs, and the Python/C interop boundary assumes you're using CPython, forcing other runtimes to emulate it.
This seems to be the root of the problem. Nothing stops a reviewer from merging some commits of a PR, except a desire to avoid the git CLI tooling (or your IDE's support, or....). The central model used in a lot of companies requires the reviewee to do the final merge, but this was never how git was meant to be used, and it doesn't have to be used that way. The reviewer can also do merges. Merge (of whichever commits) = approval, in that model.
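Concretely, a reviewer can take just the commits they approve of. A runnable sketch with a throwaway repo (branch and file names are made up):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q -b main repo
cd repo
git config user.email r@x
git config user.name reviewer
git commit -q --allow-empty -m "base"

# a contributor branch with two commits; the reviewer only wants the first
git checkout -q -b feature
echo good > good.txt && git add good.txt && git commit -q -m "good change"
echo risky > risky.txt && git add risky.txt && git commit -q -m "risky change"

# reviewer merges only the approved commit onto main
git checkout -q main
git cherry-pick feature~1        # merge (of this commit) = approval
git log --oneline
```

`feature~1` is the first of the two commits; the risky one simply stays on the contributor's branch for further discussion.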
Yes, the root of the problem is the workflow of the company being centered around GitHub instead of Git itself.
This feature improves GitHub, so it's useful for companies that work this way.
At our company, only admin users can directly git push to main/master. Everything else HAS to be merged via GitHub and pass through the merge queue.
So this stacked PRs feature will be very helpful for us.
Probably a lot of Googlers don't know. It's ancient history; it was called google3 even in 2006 when I first joined.
google1 = code written by Larry, Sergey and employee number 1 (Craig). A hacky pile of Python scripts, dumped fairly quickly.
google2 = the first properly engineered C++ codebase. Protobufs etc were in google2. But the build system was some jungle of custom Makefiles, or something like that. I never saw it directly.
google3 = the same code as google2, but with a new custom build system that used Python scripts to generate Makefiles. I suppose it required a new repository so everything could be ported over in parallel with ongoing work in google2. P4 was apparently not that great at branches, and google3 didn't use them. Later the same syntax for the build files was kept, but it was turned into a new language called Starlark, and the Makefile generator went away in favor of Blaze, which interpreted the files directly.
Probably not. "This model is too powerful for the public" can also be interpreted another way, which they've also strongly hinted at - the cost/benefit ratio of the upgrade is negative for the vast majority of all users. Finding vulnerabilities is one of the few cases where it makes sense to use it.
Their writing about the model so far acknowledges this issue: for instance, you can't really use Mythos for interactive coding because it's so slow. You have to give it some work, go home, sleep, come in the next day, and then maybe it'll have something for you.
All the AI labs and startups are still losing money hand over fist. Launching Mythos would require it to be priced well above current models, for a much slower product. Would the majority of customers notice the difference in intelligence given the tasks they're setting? If the answer is no, it's not economic to launch.
Really, I'm surprised they've done Mythos. Maybe they just wanted to exploit access to larger contiguous training datacenters than OpenAI, but what these labs need isn't smarter models, it's smaller and cheaper models that users will accept as good enough substitutes (or more advanced model routing, dynamic thinking, etc).
We've had such models before. GPT Pro, Gemini DeepThink. Mostly targeting science advancements as opposed to security research, but still, in a way Mythos is just more of the same.
Bug bounties don't reflect the market impact of the vulnerability though, just the amount needed to incentivize white hats to do research they wouldn't otherwise (or that they would target to other platforms that pay higher bounties). You need to look at market prices for zero days on the black market to get closer.
Bug bounties reflect what companies are willing to pay to find bugs. Mythos would have to be more expensive than that (probably considerably so) to not be worth its cost. If you are saying that finding bugs has significantly more value than reflected by bug bounties, then that strengthens my point.
Well, and worse, Windows was itself a hive of inconsistency. The most obvious sign that UI consistency was failing as an idea was that Microsoft's own teams didn't care about it at all. People my age always have rose-tinted glasses about this. Even the screenshot of Word the author chose is telling, because Office rolled its own widget toolkit: no other Windows apps had menus that looked like that, with the stripe down the left-hand side, or that kind of redundant menu-duplicating sidebar. Microsoft made many other apps that ignored or duplicated core UI paradigms too: Visual Studio, Encarta, Windows Media Player... the list went on and on.
The Windows I remember was in some ways actually less consistent than what we have now. It was common for apps to be themeable, to use weirdly shaped windows, to have very different icon themes or button colors, etc. Every app developer wanted to have a strong brand, which meant not using the default UI choices. And Microsoft's UI guidelines weren't strong enough to generate consistency - even basic things like where the settings window could be found weren't consistent. Sometimes it was Edit > Preferences. Sometimes File > Settings. Sometimes zooming was under View, sometimes under Window.
The big problem with the web and the newer web-derived mobile paradigms is the conflation of theme and widget library under the name "design system". The native desktop era was relatively good at keeping these concepts separated, but the web isn't, and the result is a morass of very low-effort, crappy widgets that often fail at the subtle details MS/Apple got right. And browsers can't help, because every other year designers decide that the basic behavior of e.g. text fields needs to change in ways the browser's own widgets wouldn't support.
“Brand” and “branding” are arguably the most important things -not- mentioned in the article. The commercial incentives to differentiate are powerful enough to kick a lot of UX out of the way.
Now that all we do is “experience” a “journey,” it’s more about the user doing what the app wants instead of the other way around.
Very cool. I've been putting together something very similar, although mine only does email, not Slack. Also it uses Codex, not Claude Code, and relies on ordinary UNIX user isolation rather than containers created and destroyed for every request. I just issue it restricted API keys and rely on the fact that most products already allow humans to be 'sandboxed' via ordinary permissions.
I've also (separately) got a tool for local dev that sets up containers and does SSL interception on traffic from the agent, so it could also swap creds and similar.
The reason they're separate is that in a corp environment the expectation is very strongly that an email account = a human. You can't easily provision full employee accounts for AIs, HR doesn't know anything about that :) In my own company I am HR, so that's not a problem.
This is interesting. I haven't used OpenClaw but I set up my own autonomous agent using Codex + ChatGPT Plus + systemd + normal UNIX email and user account infrastructure. And it's been working great! I'm very happy with it. It's been doing all kinds of tasks for me, effectively as an employee of my company.
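For anyone curious what the systemd side of such a setup might look like, it's roughly this shape (unit names, the user, and all paths are placeholders, not my actual config):

```ini
# /etc/systemd/system/agent-wakeup.service  (placeholder names/paths)
[Unit]
Description=Wake the Codex agent to check mail and do chores

[Service]
Type=oneshot
User=agentbot
WorkingDirectory=/home/agentbot
ExecStart=/home/agentbot/bin/wakeup.sh

# /etc/systemd/system/agent-wakeup.timer
[Unit]
Description=Nightly agent wakeup

[Timer]
OnCalendar=*-*-* 02:00:00
Persistent=true

[Install]
WantedBy=timers.target
```

Enable it with `systemctl enable --now agent-wakeup.timer`; the service itself is pulled in by the timer, and `Persistent=true` catches up on missed runs if the machine was off overnight.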
I haven't seen any issues with memory so far. Using one long rolling context window, a diary and a markdown wiki folder seems sufficient to have it do stuff well. It's early days still and I might still encounter issues as I demand more, but I might just create a second or third bot and treat them as 'specialists' as I would with employees.
I did (using Claude Code) something that sounds very similar to this. It’s a bunch of bootstrapped Unix tools, systemd units, and some markdown files. Two comments:
- I suspect that in this moment, cobbling together your own simple version of a “claw-alike” is far more likely to be productive than a “real” claw. These are still pretty complex systems! And if you don’t have good mental models of what they’re doing under the hood and why, they’re very likely to fail in surprising, infuriating, or downright dangerous ways.
For example, I have implemented my own “sleep” context compaction process, and while I’m certain there are objectively better implementations of it than mine… mine is legible to me, and therefore I can predict with some accuracy how my productivity tamagotchi will behave day-to-day, in a way that I could not if I hadn't been involved in creating it.
(NB: I expect this is a temporary state of affairs while the quality gap between homemade and “professional” just isn’t that big.)
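To illustrate, the dumbest possible version of a "sleep" compaction step is just rolling the oldest part of a context file into an archive; a runnable sketch where the file names, the line budget, and the `seq` demo data are all made up, and the actual LLM summarization pass over the archive is elided:

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
seq 1 500 > context.log        # demo data standing in for old context

context=context.log
archive=diary.log              # a summarizer would later condense this
keep=200                       # recent lines kept verbatim

total=$(wc -l < "$context")
if [ "$total" -gt "$keep" ]; then
  # oldest lines move to the archive...
  head -n $((total - keep)) "$context" >> "$archive"
  # ...and only the most recent lines stay in the live context
  tail -n "$keep" "$context" > "$context.tmp"
  mv "$context.tmp" "$context"
fi
```

The point isn't that this is a good compaction strategy; it's that a scheme this small is fully predictable, which is exactly the legibility property described above.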
- I do use mine as a personal assistant, and I think there is a lot of potential value in this category for people like me with ADD-style brains. For whatever reason, explaining in some detail how a task should be done is often much easier for me than just doing the task (even if, objectively, the former takes equal or greater effort). It therefore doesn’t do anything I _couldn’t_ do myself. But it does do stuff I _wouldn’t_ do on my own.
Right - I think email is a much better UI than Slack or WhatsApp or Discord for that reason. It forces you to write properly and explain what you want, instead of firing off a quick chat. Writing things down helps you think. And because coding harnesses like Codex are very good at interacting with their UNIX environments but are also kinda slow, email's higher latency expectations are a better fit for the underlying technology.
Two categories: actual useful work for the company, and improving the bot's own infrastructure.
Useful work includes: bug triage, matching up external user bug reports on GitHub to the internal YouTrack, fixing easy-looking bugs, and working on a redesign of the website. I also want to extend it to handling the quarterly accounting (which is already largely automated with AI, but I still need to run the scripts myself), preparing answers to support queries, and more work on bug fixes and features. It has access to the bug tracker, internal git and CI system as if it were an employee, and uses all of those quite successfully.
Meta-work has so far included: making a console so I can watch what it's doing when it wakes up, regularly organizing its own notes and home directory, improving the wakeup rhythm, and packaging up its infrastructure into a repeatable install script so I can create more of them. I work with a charity in the UK whose owner has expressed interest in an OpenClaw, but I warned him off because of all the horror stories. If this experiment continues to work out I might create some more agents for people like him.
I'm not sure it's super useful for individuals. I haven't felt any great need to treat it as a personal assistant yet. ChatGPT web UI works fine for most day to day stuff in my personal life. It's very much acting like an extra employee would at a software company, not a personal secretary or anything like that.
It sounds like our experience differs because you wanted something more controlled with access to your own personal information like email, etc, whereas I gave "Axiom" (it chose its own name) its own accounts and keep it strictly separated from mine. Also, so far I haven't given it many regular repeating tasks beyond a nightly wakeup to maintain its own home directory. I can imagine that for e.g. the accounting work we'd need to do some meta-work first on a calendar integration so it doesn't forget.
I’m doing this exact same thing in my solo SaaS company, except with Cursor’s Cloud Agents. I can kick them off from the web, Slack, Linear, or on a schedule, so I’m doing a lot of the same things as you. It’s just prompts on a cron with access to some tools and skills, but super useful.