Building my dev workspace into an operating system. Not metaphorically — structurally.
10 MCP servers as device drivers (exchange APIs, browser automation, Apple docs, issue tracking).
200+ skills as prose runbooks that compose system calls. Agent-mail for IPC between parallel
agents. A drift detector called "wobble" that scores skill stability using bias/variance analysis.
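For anyone wondering what "bias/variance analysis" can mean for a skill, here is one possible reading. It is assumed here and is not necessarily how wobble works. You run the same skill several times and score each run against an expected outcome. Then you split the error into a systematic offset (bias) and run-to-run spread (variance). The names `score_skill` and `StabilityReport`, the 1.0 target, and the drift thresholds are all hypothetical.

```python
from dataclasses import dataclass
from statistics import fmean, pvariance


@dataclass
class StabilityReport:
    bias: float      # mean score minus target: systematic offset
    variance: float  # population variance of scores: run-to-run spread
    mse: float       # mean squared error vs target; equals bias**2 + variance

    @property
    def drifting(self) -> bool:
        # Hypothetical thresholds; a real tool would tune these per skill.
        return abs(self.bias) > 0.1 or self.variance > 0.05


def score_skill(run_scores: list[float], target: float = 1.0) -> StabilityReport:
    """Decompose repeated-run error for one skill into bias and variance.

    run_scores: hypothetical per-run scores, e.g. fraction of checks passed.
    """
    if not run_scores:
        raise ValueError("need at least one run")
    mean = fmean(run_scores)
    bias = mean - target
    variance = pvariance(run_scores, mu=mean)
    mse = fmean((s - target) ** 2 for s in run_scores)
    return StabilityReport(bias=bias, variance=variance, mse=mse)
```

The useful property is that mean squared error equals bias² plus variance. A skill that is consistently wrong and one that is erratically wrong therefore show up as different kinds of drift.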
> The interesting failure mode isn’t just “one bad actor slips through”; it’s provenance: if you want to
> “denounce the tree rooted at a bad actor”, you need to record where a vouch came from (maintainer X,
> imported list Y, date, reason), otherwise revocation turns into manual whack-a-mole.
>
> Keeping the file format minimal is good, but I’d want at least optional provenance in the details field
> (or a sidecar) so you can do bulk revocations and audits.
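Here is a minimal sketch of what that provenance buys you. It assumes a vouch record with hypothetical `voucher`, `source`, `date` and `reason` fields. Trust is recomputed by walking vouches outward from trusted roots. Denouncing one account therefore drops everyone who was only reachable through it, while someone with an independent vouch survives. Bulk revocation of an imported list becomes a filter on `source`.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Vouch:
    subject: str     # who is vouched for
    voucher: str     # who vouched (maintainer X, or an importer identity)
    source: str      # hypothetical labels, e.g. "manual" or "imported-list-Y"
    date: str        # ISO date string
    reason: str = ""


def trusted_set(roots: set[str], vouches: list[Vouch], denounced: set[str]) -> set[str]:
    """Everyone reachable from a root through vouches, never passing through denounced users."""
    by_voucher: dict[str, list[str]] = {}
    for v in vouches:
        by_voucher.setdefault(v.voucher, []).append(v.subject)
    trusted = {r for r in roots if r not in denounced}
    queue = deque(trusted)
    while queue:
        current = queue.popleft()
        for subject in by_voucher.get(current, []):
            if subject not in trusted and subject not in denounced:
                trusted.add(subject)
                queue.append(subject)
    return trusted


def denounce(roots: set[str], vouches: list[Vouch], bad: str) -> set[str]:
    """Return everyone who loses trust when `bad` and the tree under it are denounced."""
    before = trusted_set(roots, vouches, set())
    after = trusted_set(roots, vouches, {bad})
    return before - after


def revoke_source(vouches: list[Vouch], source: str) -> list[Vouch]:
    """Bulk revocation: drop every vouch that came from one imported list."""
    return [v for v in vouches if v.source != source]
```

Recomputing reachability, rather than deleting a subtree by hand, is what avoids the whack-a-mole. An account vouched for by both the bad actor and a good maintainer keeps its trust.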
Pinning exists, but the interesting part is signal quality: macOS gets consistent “urgency” signals (QoS) from a lot of frameworks/apps, so scheduling on heterogeneous cores is less guessy than inferring urgency from runtime behavior.
This is the real story buried under the simulation angle. If you can generate
reliable 3D LiDAR from 2D video, every dashcam on earth becomes training data.
Every YouTube driving video, every GoPro clip, every security camera feed.
Waymo's fleet is ~700 cars. The internet has millions of hours of driving
footage. This technique turns the entire internet into a sensor suite. That's a bigger deal than the simulation itself.
"Stack Overflow that reads your codebase" — perfect. But Stack Overflow is
stateless. Agent sessions aren't.
One session's scaffold assumes one pattern. The second session's scaffold contradicts it. You reviewed both in isolation. Both looked fine. Neither knows about the other.
Reviewing AI code per-session is like proofreading individual chapters of a novel nobody's reading front to back. Each chapter is fine. The plot makes no sense.
"This is a very early research prototype with no other inter-agent communication methods or high-level goal management processes."
The lock file approach (current_tasks/parse_if_statement.txt) prevents two agents from claiming the same task, but it can't prevent convergent wasted work. When all 16 agents hit the same Linux kernel bug, the lock files didn't help — the problem wasn't task collision, it was that the agents couldn't see they were all solving the same downstream failure. The GCC oracle workaround was clever, but it was a human inventing a new harness mid-flight because the coordination primitive wasn't enough.
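For reference, a lock-file claim is usually an atomic exclusive create. The sketch below assumes one file per task under `current_tasks/`. The `report_failure` half is a hypothetical addition and not part of the original harness. It shows one way agents could notice they are converging on the same failure: hash the error text into a shared signature and count who has hit it.

```python
import hashlib
import os
from pathlib import Path


def claim_task(lock_dir: Path, task: str, agent: str) -> bool:
    """Atomically claim a task by creating its lock file; False if already claimed."""
    lock_dir.mkdir(parents=True, exist_ok=True)
    path = lock_dir / f"{task}.txt"
    try:
        # O_CREAT | O_EXCL makes os.open raise FileExistsError if the file
        # already exists, so exactly one agent wins the claim.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(agent)
    return True


def report_failure(failure_dir: Path, error_text: str, agent: str) -> int:
    """Hypothetical: record a failure signature and return how many distinct agents hit it."""
    failure_dir.mkdir(parents=True, exist_ok=True)
    sig = hashlib.sha256(error_text.strip().encode()).hexdigest()[:16]
    path = failure_dir / f"{sig}.log"
    with path.open("a") as f:  # append-only; fine for a sketch, not a real concurrency story
        f.write(agent + "\n")
    return len(set(path.read_text().split()))
```

The lock answers "is anyone on this task?" The signature count answers a different question: "are five of us stuck on the same thing?"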
Similarly, "Claude frequently broke existing functionality implementing new features" isn't a model capability problem; it's an input stability problem. Agent N builds against an interface that agent M just changed. Without gating on whether your inputs have changed since you started, you get phantom regressions.
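Here is a sketch of that gate. It assumes inputs are files and that "changed" means "content hash differs". Snapshot the interfaces an agent depends on when it starts, then refuse to merge if any of them moved. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def snapshot(paths: list[Path]) -> dict[str, str]:
    """Hash each input file the agent depends on at the moment it starts work."""
    return {str(p): hashlib.sha256(p.read_bytes()).hexdigest() for p in paths}


def changed_inputs(start: dict[str, str]) -> list[str]:
    """List inputs whose content differs from (or vanished since) the start snapshot."""
    changed = []
    for name, digest in start.items():
        p = Path(name)
        current = hashlib.sha256(p.read_bytes()).hexdigest() if p.exists() else None
        if current != digest:
            changed.append(name)
    return changed


def can_merge(start: dict[str, str]) -> bool:
    # Gate: if any input moved underneath the agent, it should rebase and
    # re-run its tests before landing, instead of shipping a phantom regression.
    return not changed_inputs(start)
```

When the gate fails, the agent re-validates against the new interface instead of landing work that was built against a stale one.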
Agent teams in this release is mcp-agent-mail [1] built into
the runtime. Mailbox, task list, file locking — zero config,
just works. I forked agent-mail [2], added heartbeat/presence
tracking, and had a PR open upstream [3] when agent teams dropped. For
coordinating Claude Code instances within a session, the
built-in version wins on friction alone.
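Here is heartbeat/presence tracking in its simplest form. Agents ping periodically and count as online while their last ping is within a TTL. This is an in-memory sketch with a hypothetical `Presence` class and a 30-second default. It is not the code in the fork, and the injectable clock is only there to make it testable.

```python
import time
from typing import Callable


class Presence:
    """Hypothetical in-memory presence tracker driven by periodic heartbeats."""

    def __init__(self, ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.last_seen: dict[str, float] = {}

    def heartbeat(self, agent: str) -> None:
        # Record the time of the agent's most recent ping.
        self.last_seen[agent] = self.clock()

    def online(self) -> set[str]:
        # An agent is online if it pinged within the last `ttl` seconds.
        now = self.clock()
        return {a for a, t in self.last_seen.items() if now - t <= self.ttl}
```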
Where it stops: agent teams is session-scoped. I run Claude
Code during the day, hand off to Codex overnight, pick up in
the morning. Different runtimes, async, persistent. Agent
teams dies when you close the terminal — no cross-tool
messaging, no file leases, no audit trail that outlives the
session.
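Roughly, the missing piece is state that lives on disk rather than in the session. Here is a minimal sketch. It assumes a single writer at a time: there is no locking around the JSON file, so concurrent runtimes would need an OS-level lock or a database. Leases carry a holder and a wall-clock expiry (`time.time()`, since monotonic clocks aren't comparable across processes). That way an overnight Codex run can see what a daytime Claude Code session claimed, and every decision is appended to an audit log. `LeaseStore` and its file layout are hypothetical.

```python
import json
import time
from pathlib import Path
from typing import Optional


class LeaseStore:
    """Hypothetical file leases persisted to disk so they outlive the session that took them."""

    def __init__(self, root: Path):
        root.mkdir(parents=True, exist_ok=True)
        self.leases_path = root / "leases.json"
        self.audit_path = root / "audit.jsonl"

    def _load(self) -> dict:
        if self.leases_path.exists():
            return json.loads(self.leases_path.read_text())
        return {}

    def _audit(self, event: str, path: str, holder: str) -> None:
        # Append-only log: one JSON object per line, kept across sessions.
        record = {"ts": time.time(), "event": event, "path": path, "holder": holder}
        with self.audit_path.open("a") as f:
            f.write(json.dumps(record) + "\n")

    def acquire(self, path: str, holder: str, ttl: float,
                now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        leases = self._load()
        current = leases.get(path)
        # Deny only if someone else holds an unexpired lease; renewals are allowed.
        if current and current["holder"] != holder and current["expires"] > now:
            self._audit("denied", path, holder)
            return False
        leases[path] = {"holder": holder, "expires": now + ttl}
        self.leases_path.write_text(json.dumps(leases))
        self._audit("acquired", path, holder)
        return True

    def release(self, path: str, holder: str) -> None:
        leases = self._load()
        if leases.get(path, {}).get("holder") == holder:
            del leases[path]
            self.leases_path.write_text(json.dumps(leases))
            self._audit("released", path, holder)
```

Expiry matters more here than inside one session. If a runtime dies without releasing, its lease lapses on its own instead of blocking the next tool forever.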
What survives sherlocking is whatever crosses the runtime
boundary. The built-in version will always win inside its own
walls — less friction, zero setup. The cross-tool layer is
where community tooling still has room. Until that gets
absorbed too.
[1] https://github.com/Dicklesworthstone/mcp_agent_mail
[2] https://github.com/anupamchugh/mcp_agent_mail
[3] https://github.com/Dicklesworthstone/mcp_agent_mail/pull/77
I would like to see a mobile app for this, to vibe code on the fly. Currently the DIY options are good but clumsy UX-wise, since they’re workarounds. If I could open worktrees from an iOS app, that would be great.