A counter-argument here: if a private company knows that its technology may be used for human-not-in-loop targeting/surveillance, and knows that its technology is not yet ready to fulfill that use case without meaningful unintended casualties... does that company have an ethical obligation to contractually delineate its inability to offer that service?
In a version of the trolley problem where the trolley is headed down a track that will kill innocent people, and you have the opportunity to set up a contract that effectively throws the switch to an empty track, is it not imperative to flip that switch?
(One might argue that faster reaction times might save service members' lives - but the whole point is that if the autonomous targeting is incorrect, it may just as well lead to increased violence and more service member casualties in the aggregate.)
And we're not talking about the ethics board subtly manipulating individual token outputs, which would indeed be a supply chain risk. We're talking about a contractual relationship in which, if a supplier detects use outside the scope of an agreed contract, it has the contractual right to decline to provide the service for that novel use while maintaining support for prior use cases.
The fact that the government would use the threat of supply chain risk to enforce a better contract is unprecedented, and it erodes the government's standing as a reliable counterparty in general.
It's an interesting question, but it's mostly irrelevant.
This problem is really difficult to discuss because we are all wrapping the capabilities of these tools into our response framing. These are tools, or weapons. Your hypothetical could just as easily be applied to the GBU-39, a small precision-guided bomb that's meant to take out, say, a single vehicle in a convoy rather than the entire set of vehicles. If you're not confident in what the product is supposed to do, and you've already sold it to the government, you have lied, and they are going to come back to you asking some direct questions.
I don't know what sales numbers look like, but from my perspective as a casual reader who's been trying to keep up with Hugo Award nominees of recent years... science fiction may not be trendy on BookTok, but it's far from dead.
Sure, fantasy has "corrupted" the Hugos, but there's plenty of hard science fiction, and science fiction that grapples with societal questions at large. Arkady Martine's A Memory Called Empire and A Desolation Called Peace, both Hugo winners, are incredibly thoughtful depictions of societies on the verge of disruption by new technology. Ryka Aoki's Light from Uncommon Stars, a 2022 Hugo nominee, while somewhat of a sci-fi-fantasy genre crossover, is both harrowing and exhilarating in its discussion of gender through a speculative lens (content warnings apply).
If you're looking for whether innovative science fiction is being adapted into popular media - The Three-Body Problem (2015 Hugo winner) and the Murderbot series (most recently a Hugo winner in 2021) are both being adapted. Andy Weir's post-Martian works continue to be hyped, if not quite adapted yet. The fanbase for Tamsyn Muir's Locked Tomb series is rabid in the best possible way - and while ostensibly centered on necromancy, it's remarkably high in sci-fi hardness.
And outside of traditional publishing - democratized writing challenges like https://www.reddit.com/r/HFY/, genre-crossing serial sci-fi like the works of Wildbow, and fanfiction in general (I continue to follow and adore To the Stars, a Madoka fanfic that juxtaposes magic with a spacefaring future humanity, with masterful worldbuilding) continue to thrive.
Traditional publishing houses may indeed be in crisis, but contrary to the original article's assertion, there is no shortage of ideas.
Also, on the TV/movie front, science fiction is doing quite well right now.
Apple TV seems to have made producing good scifi series one of their main selling points. Lots of famous scifi series are getting TV adaptations, and Apple TV is producing wholly new scifi series as well.
Foundation, Murderbot, Silo, For All Mankind (and the upcoming Star City), Severance, Dark Matter, Monarch, Pluribus, and Neuromancer, to name some of the current and upcoming series.
And of course, if my theory is right, I suspect the upcoming Firefly announcement will be that Apple TV is picking it up for a continuation as well.
---------
But scifi also has a lot of other avenues for exploring its ideas now (such as interactive media/video games). I'd argue some of the best scifi works of the current generation come from interactive media/video games rather than television or film. Ex: Outer Wilds, The Talos Principle 1&2, Nier, VA-11 Hall-A, Bioshock, Mass Effect, Dead Space, Deus Ex, etc.
Frankly, television and film are massively expensive, and books are way harder to sell now than they ever were (discoverability and reach are poor). Video games, as a more visual medium, are easier to sell, and at the same time the entry point for making them is an order of magnitude lower than for TV or movies. So it's not terribly surprising to see scifi flourish in games while other mediums find themselves in a slump.
Something I've recently come to appreciate: given the context of your codebase, your ORM models and how they connect to your frontend, read-only access to production databases (perhaps proxied to anonymize client data), and the ability to drive production sites with Chrome MCP, Claude can be a monster at answering operational questions.
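For the read-only part, something like this is usually enough on Postgres (a minimal sketch; psycopg2 and the claude_ro role/database names are my own assumptions, adjust for your setup):

    # Sketch: carve out a read-only Postgres role for the agent.
    # Role and database names here are made up; adapt to your environment.
    import psycopg2

    conn = psycopg2.connect("dbname=prod user=admin")
    cur = conn.cursor()
    cur.execute("CREATE ROLE claude_ro LOGIN PASSWORD 'change-me'")
    cur.execute("GRANT CONNECT ON DATABASE prod TO claude_ro")
    cur.execute("GRANT USAGE ON SCHEMA public TO claude_ro")
    cur.execute("GRANT SELECT ON ALL TABLES IN SCHEMA public TO claude_ro")
    # Cover tables created later, too.
    cur.execute("ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO claude_ro")
    conn.commit()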
Say you need to present a new statistic to a prospective partner, or an enterprise client has an operational issue that needs to be escalated. Sales/account management pings people, and pretty soon there's a web of connections that ranges across email, ticketing systems, Slack, and Claude Code sessions. Someone being brought in needs to be brought up to speed on that entire web. It's a highly focused conversation with human and AI participants, one that (because human counterparties need to weigh in) by definition must happen in parallel with other work.
So many companies would benefit from a Hub that speaks agentic workflows fluently and streams progress token by token.
Could Anthropic excel at building a backend for this? Absolutely.
Could they excel at building a frontend that takes the world by storm the way Slack did, with its radical simplicity? Unfortunately I'm not as confident here. Consider that their VS Code plugin lags their terminal TUI so massively that it still is impossible to rename sessions [0], much less use things like remote-control functionality.
Show me that they can treat native-feeling multi-platform UI with as much care as they do their agentic loops, and I'll show you a company that could change every business forever.
I'm discovering new possibilities all the time for how Claude can work on a new type of task in our codebase and business more broadly. While a lot of this can be brought to the team by saying "encapsulate what you just did into a skill," sometimes it's as much about knowing what kinds of prompts to use to guide it.
Showing a colleague that flow, and the sequence of not just prompts but the types of Claude outputs to expect, all leading to Claude doing something that would have taken us a half day of work? As a linear video, rather than just a dump or screenshot of a final page? That could help to diffuse best practices rapidly.
OP - you might want to look at the kind of data model Loom used for this problem for videos in general, in terms of workspaces and permissions. Could make a startup out of this!
(Also as a smaller note - you might want to skip over long runs and generations by default, rather than forcing someone into 5x mode! A user of this would want to see messages, to and from Claude, at a standardized rate - not necessarily a sped up version of clock time.)
That’s a really interesting way to frame it — showing the flow of prompts and responses rather than just the final result.
I’ve mostly been using it for demos and sharing sessions with teammates, but the training / best-practices angle is a great point.
On navigation: you can already step through turns with the arrow keys or jump around the timeline, so you don’t have to sit through long generations. But I agree that smarter defaults (skipping or collapsing long runs) could make it smoother.
And the Loom comparison is interesting — I hadn’t thought about the workspace/permission side yet since this started as a small CLI tool for sharing sessions, but that’s a good direction to think about.
> Showing a colleague that flow, and the sequence of not just prompts but the types of Claude outputs to expect, all leading to Claude doing something that would have taken us a half day of work? As a linear video, rather than just a dump or screenshot of a final page? That could help to diffuse best practices rapidly.
Would this not be visible in a text dump without taking half a day to watch? What's/who's the benefit/beneficiary of the realtime experience here?
Granted, I have friends who don't read but prefer visual stimulation. I don't think the overlap with people comfortable with code is very large at all.
They can go hand in hand! But if you give a dump of a session to someone, with literal reams of command inputs and outputs etc. interleaved in the session... they'll most likely read the beginning and the end. And possibly absorb the importance of the discovery, but not the types of "prodding" I was doing to the agent to make that possible.
Slowing it down to show the back-and-forth, and to let the viewer absorb and internalize the techniques behind each "prod," is vital!
In all seriousness, I’m unsure that official job numbers (even if they weren’t intentionally distorted, which is a big if these days) have caught up with the gig/creator economy. If a person making ends meet with food delivery and a few dollars of ad revenue is classified as “self-employed,” does that represent the same stability, and the same ability to keep up with cost-of-living increases (which may outpace traditional inflation), as a self-employed freelancer with clients? Which isn’t to throw shade on those paths, but it matters for the metrics we choose to follow.
Yes, they have. The BLS actually tracks a number of different "unemployment" numbers, whose definitions you can see here [0].
The "official" unemployment number, the one now reported as 4.4%, basically only counts people who are actively looking for work (within the past four weeks) and can't find it, as a percent of the labor force.
The number you are trying to capture is what the BLS calls "U-6". That number is defined as:
> total unemployed, plus all marginally attached workers, plus total employed part time for economic reasons, as a percent of the civilian labor force plus all marginally attached workers.
In other words, anyone who would like more work but can't get it. I encourage you to read the entire definition and footnotes at the link I shared. It's very interesting!
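To make that ratio concrete, here's the arithmetic with made-up round numbers (millions; these are not actual BLS figures):

    # Hypothetical round numbers, in millions -- not actual BLS data.
    unemployed = 7.0             # the headline (U-3) numerator
    marginally_attached = 1.5    # want work, searched in the past year but not the past 4 weeks
    part_time_economic = 4.5     # part-time only because full-time work isn't available
    labor_force = 165.0          # civilian labor force

    u6 = (unemployed + marginally_attached + part_time_economic) / (labor_force + marginally_attached)
    print(f"U-6 = {u6:.1%}")     # -> U-6 = 7.8%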
Right now U-6 is at 8%. In the wake of the 2007-09 recession it peaked at about 17%. [1]
Thanks for bringing this up, and you're right that this is closer. I still think it's imperfect, because a gig economy worker who works 35+ hours per week would be considered "employed full time" (footnotes, https://www.bls.gov/cps/cpsaat36.htm) and as far as I know would not be included in the U-6.
> The second time the same (or similar) input is used these states are already created and it is linear.
Does this imply that the DFA for a regex, as an internal cache, is mutable and persisted between inputs? Could this lead to subtle denial-of-service attacks, where inputs are chosen by an attacker to steadily increase the cached complexity - are there eviction techniques to guard against this? And how might this work in a multi-threaded environment?
Yes, most (I think all) lazy DFA engines have a mutable DFA behind a lock internally that grows during matching.
Multithreading is generally a non-issue: you just wrap the function that creates the state behind a lock/mutex. This is usually the default.
The subtle denial-of-service part is interesting; I hadn't thought of it before. Yes, this is possible. For security-critical uses I would compile the full DFA ahead of time - the memory cost may be painful, but this completely removes the chance of anything going wrong.
There are valid arguments to switch from DFA to NFA with large state spaces, but RE# intentionally does not switch to an NFA and capitalizes on reducing the DFA memory costs instead (e.g. minterm compression in the post, algebraic simplifications in the paper).
The problem with going from DFA to NFA for large state spaces is that it makes the match-time performance fall off a cliff - something like going from 1GB/s to 1KB/s, as we also show in the benchmarks in the paper.
As for eviction techniques, I have not researched this; the simplest thing to do is just completely reset the instance and rebuild past a certain size, but there is likely a better way.
> Multithreading is generally a non-issue: you just wrap the function that creates the state behind a lock/mutex. This is usually the default.
But don’t you also have to lock when reading the state, not just when writing/creating it? Wouldn’t that cause lock contention under sufficiently concurrent use?
No, we do not lock reads of the state; we only lock the creation side, and the transition table reference stays valid during matching even if it is outdated.
Only when a nonexistent state is encountered during matching does the matcher enter the locked region.
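In pseudo-Python, the whole pattern looks roughly like this (a sketch of the general lazy-DFA idea with hypothetical names, not RE#'s actual internals; it also folds in the crude reset-past-a-limit eviction mentioned above):

    import threading

    def compute_next_state(state, byte):
        """Hypothetical: the regex-derivative / NFA-subset step that builds a new DFA state."""
        raise NotImplementedError

    class LazyDFA:
        MAX_STATES = 10_000  # crude guard against attacker-driven cache growth

        def __init__(self):
            self._lock = threading.Lock()
            self._transitions = {}  # (state_id, byte) -> next state_id

        def step(self, state, byte):
            # Fast path: no lock. A slightly stale table only means a miss below.
            nxt = self._transitions.get((state, byte))
            if nxt is not None:
                return nxt
            # Slow path: only state *creation* is serialized.
            with self._lock:
                nxt = self._transitions.get((state, byte))  # re-check under the lock
                if nxt is None:
                    if len(self._transitions) >= self.MAX_STATES:
                        self._transitions = {}  # simplest eviction: throw everything away
                    nxt = compute_next_state(state, byte)
                    self._transitions[(state, byte)] = nxt
            return nxt

(In CPython the unlocked dict read happens to be safe; in C++/Rust you would publish the table through an atomic reference, which is exactly why the "outdated but still valid" detail above matters.)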
> are there eviction techniques to guard against this?
RE2 resets the cache when it reaches a (configurable) size limit - which I found out the hard way when I had to debug almost-periodic latency spikes in a service I managed. A very inefficient regex caused linear growth in the lazy DFA until it hit the limit; then all threads had to wait a few hundred milliseconds for the reset, and then it all started again.
I'm not sure if dropping the whole cache is the only feasible mitigation, or some gradual pruning would also be possible.
Either way, if you cannot assume that your cache grows monotonically, synchronization becomes more complicated: the trick mentioned in the other comment about only locking the slow path may not be applicable anymore. RE2 uses RW-locking for this.
I have experienced this as well: the performance degradation from DFA to NFA is enormous, and while not as bad as exponential backtracking, it's close to ReDoS territory.
The Rust version of the engine (https://github.com/ieviev/resharp) just returns an error instead of falling back to an NFA. I think that is a reasonable approach, but the library is still new, so I'm waiting to see how it turns out and whether I had any oversights on this.
Here RE2 does not fall back to the NFA; it just resets the lazy DFA cache and starts growing it again. The latency spikes I mentioned are due to the cost of destroying the cache (deallocations, pointer chasing, ...).
> Readers are simply more willing to tolerate a lightspeed jump from belief X to belief Y if the writer himself (a) seems taken aback by it and (b) acts as if they had no say in the matter - as though the situation simply unfolded that way.
Both of these allow a presenter to frame a discovery or result as "surprising" and "novel" - even if, from the very start, the rhetorical goal was to take a pre-ordained desire to publish along certain lines and tweak things so it reads as a happenstance discovery, washing the presenter's hands of that intentionality.
One of the things I worry about, especially as education shifts more and more towards AI, is that we lose the critical thinking skill of recognizing: "here is a set of facts that are true, but there can still be bias in the process by which those facts were selected, so one must look beyond the facts presented."
And in theory, AI could help us do this with every fact we consume! But it's steered (quite intentionally) towards giving simple answers, even when reality isn't simple - and the underlying goals of whoever presented the facts that entered one's corpus are as important as the facts themselves.
This is also just the direction that AI is taking us, even for people who wouldn't describe themselves as traditional developers.
Setting aside on-device LLMs, one needs RAM and disk space just for the multiple isolated VMs (Claude Cowork and the like) that will increasingly become part of people's everyday lives.
And when it's easier than ever to create an Electron app, everything's going to have an Electron app, with all the RAM/disk overhead that entails. And of course, nobody's asking their agents "optimize the resource usage of the app I made last week" - they're moving on to the next feature or project.
I suppose the demoscene will always be there, for those of us who increasingly need a refuge from RAM-flation.
Some press coverage (though I highly recommend just reading the paper linked as the OP; it’s quite approachable to skim without prior knowledge, and you get to see how they turn the Star Trek replicator problem into “just” a loss-optimization problem with projectors and spinning mirrors!):
And as others have noted, it’s worth bearing in mind that most images here are less than a centimeter in scale; the scale bar is a millimeter. Super impressive stuff.
Minor correction: confusingly, the scale bars vary not just from figure to figure but from image to image within a single figure, as noted in the captions. It's a rather odd choice IMO.
There's also a reasonable alignment between Tailwind's original goal (if not an explicit one) of minimizing characters typed, and a goal held by subscription-model coding agents to minimize the number of generated tokens to reach a working solution.
But as much as this makes sense, I miss the days of meaningful class names and standalone (S)CSS. Done well, with BEM and the like, it creates a semantically meaningful "plugin infrastructure" on the frontend: you write simple CSS overrides to play with tweaks, and those overrides can eventually become code, without needing to target "the second x within the third y of the z."
Not to mention that components become more easily scriptable as well. A component running on a production website becomes hackable, in the same spirit that gives Hacker News its name. And in trying to minimize tokens on greenfield code generation, we've lost that hackability, in a real way.
I'd recommend: have your AGENTS.md instruct the model to include meaningful class names in generated code, even when they're not relevant to styling. If you have a configurability system that lets you plug in CSS overrides or custom scripts, make the data from those configurations searchable by the LLM as well. Now you have all the tools you need to make your site deeply customizable, particularly when delivering private-labeled solutions to partners. It's far easier to build this in early, when you have context on the business meaning of every div, than later on. Somewhere, a GPU may sigh at generating a few extra tokens, but it's worthwhile.