For the last 2 years, startup wisdom has been that models will continue to get cheaper and better. First Claude, and now Gemini, have shown that this isn't the case.
We priced an enterprise contract using Flash 1.5 pricing last summer, and today that contract would have negative unit economics if we used Flash 3. Flash 2.5 and now Flash 3.1 Lite barely break even.
I predict open-source models and fine-tuning are going to make a real comeback this year for economic reasons.
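To make the contract arithmetic above concrete, here is a rough sketch of a per-request margin calculation. Every number in it (revenue per request, token counts, per-million-token prices) is a made-up placeholder, not actual Flash pricing or the real contract terms, and `margin_per_request` is a hypothetical helper name.

```python
def margin_per_request(revenue, input_tokens, output_tokens,
                       price_in_per_mtok, price_out_per_mtok):
    """Return revenue minus model cost for one request, in dollars.

    Prices are expressed in dollars per million tokens.
    """
    cost = (input_tokens * price_in_per_mtok
            + output_tokens * price_out_per_mtok) / 1_000_000
    return revenue - cost


# Hypothetical contract: $0.002 of revenue per request,
# 10,000 input tokens and 1,000 output tokens per request.
REVENUE = 0.002
TOKENS_IN, TOKENS_OUT = 10_000, 1_000

# Placeholder "old" and "new" price points (assumptions, not real list prices).
old_margin = margin_per_request(REVENUE, TOKENS_IN, TOKENS_OUT, 0.10, 0.40)
new_margin = margin_per_request(REVENUE, TOKENS_IN, TOKENS_OUT, 0.50, 3.00)
```

With these placeholder inputs the old price point leaves a small positive margin and the new one is negative, which is the shape of the problem described: the contract price is fixed while the per-token cost moves underneath it.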
Yeah, but there is a whole world of tasks for which Flash 2.5-lite was sufficiently intelligent. Given Google's deprecation policy, there will soon be no way to get that intelligence at that price.
I mean the same level of intelligence does get cheaper. People just care about being on the frontier. But if you track a single level of intelligence the price just drops and drops.
+1 to this. Been using Codex the last few months, and this morning I asked it to plan a change. It gave me generic instructions like 'Check if you're using X' or 'Determine if logic is doing Y' - I was like WTF.
It's hosted codegen, in a way. Codegen is local and outputs a Playwright script. We add managed infra, observability, retries, and agent fallback recoveries when things break.
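A rough sketch of what a retry-then-agent-fallback layer like the one described could look like. This is not the actual product code; `run_with_recovery`, `script`, and `fallback` are hypothetical names, and the script is modeled as a plain callable rather than a real Playwright run.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_with_recovery(script: Callable[[], T],
                      fallback: Callable[[Exception], T],
                      retries: int = 2,
                      delay_s: float = 0.0) -> T:
    """Run `script`, retrying on failure, then hand off to `fallback`.

    `script` stands in for a generated automation script; `fallback`
    stands in for an agent that attempts recovery given the last error.
    """
    last_error: Optional[Exception] = None
    for attempt in range(retries + 1):
        try:
            return script()
        except Exception as exc:  # broad on purpose: any failure triggers a retry
            last_error = exc
            if attempt < retries:
                time.sleep(delay_s)
    # All attempts failed, so let the fallback try to recover.
    assert last_error is not None
    return fallback(last_error)
```

The deterministic script gets the first few attempts because it is cheap and predictable; the fallback only runs once those are exhausted.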
> Would you trust a Cursor review of Claude-written code more, less, or the same as a Cursor review of Cursor-written code?
You're assuming models/prompts insist on a previous iteration of their work being right. They don't. Models try to follow instructions, so if you ask them to find issues, they will. 'Trust' is a human problem, not a model/harness problem.
> Our view is that code validation will be completely autonomous in the medium term.
If reviews are going to be autonomous, they'd be part of the coding agent. Nobody would see it as an independent activity, as you mentioned above.
> Our first step towards making this easier is a native Claude Code plugin.
Claude can review code based on a specific set of instructions/context in an MD file. An additional plugin is unnecessary.
My view is that to operate in this space, you gotta build a coding agent or get acquired by one. The writing was on the wall a year ago.
I agree that it should be open-source, but I think it can still be a YC company. Improving the user experience on the web is definitely a billion-dollar market.
It's a freaking browser extension. Not trying to insult anyone or be negative, but I genuinely don't understand why anyone would invest money into this.
the only valid reasons to participate in hacker news are to get your startup funded, to get hired by one of the yc startups, or to sell something. otherwise it doesn't really make sense to participate in this forum anymore, especially if you are just giving people free product development advice.
Respectfully disagree. Why is it colloquially known as "Hacker News", and not, say, "Startup Forum"? My favorite articles & content on Hacker News are where I stay up to date on technology and what people are doing--which is very literally in line with the name "Hacker News".
This looks really slick, though it's a bummer that there isn't a quick way to try the hosted version. You mentioned the Vercel UX in the comments, and I think the single-click install on the hosted version is a significant part of it.