
A missing link right now is automated high-quality code reviews. I would love an adversarial code review agent with a persona oriented around treating all incoming code as slop, one that leverages a wealth of knowledge (manually written by the team and/or aggregated from previous/historical code reviews). And that agent should pull no punches when reviewing code.

This would augment actual engineer code reviews and help deal with volume.
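
Something like the sketch below is what I have in mind. It's a toy, not a real implementation: call_llm is a stand-in for whatever chat-completion client you already use, and the persona text is purely illustrative.

    # Hypothetical adversarial reviewer agent. `call_llm` is a placeholder
    # for an existing chat-completion client, not a real library call.
    REVIEWER_PERSONA = (
        "You are a hostile code reviewer. Assume every incoming change is "
        "slop until proven otherwise. Check it against the team's review "
        "notes, cite the specific guideline you are enforcing, and do not "
        "soften your findings."
    )

    def review_diff(diff: str, team_notes: str, call_llm) -> str:
        # team_notes = manually written guidelines plus knowledge mined
        # from previous/historical code reviews.
        prompt = (
            f"Team review notes:\n{team_notes}\n\n"
            f"Diff to review:\n{diff}\n\n"
            "List every problem, no matter how small."
        )
        return call_llm(system=REVIEWER_PERSONA, user=prompt)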


Cursor Bugbot is a game changer — runs on PRs and finds the most subtle of bugs in enormous PRs.


I've been asking for security audits as I go. It's not perfect but it's something. And it picks up the most obvious stuff.


Anthropic makes it kind of clear in all of their statements that they are not opposed to working with the surveillance state, with the military industrial complex, etc. Their central philosophy, it seems, is not incongruent with working with entities, public or private, that can be construed as imperialist or capitalistic or a combination of both. I actually appreciate their honesty here.

They exist within the regime of capital and imperialism that all of us who are American citizens exist within. This isn't a cop-out or cope. It's just the reality of the world that we live in. If you are an American and somehow above it, let me know how you live.


I have an agent system analyzing time series data periodically. What I've landed on is having the tools themselves pre-process the time series data to give it more semantic meaning: converting timestamps to human dates, then enriching it with statistical analysis, such as calculating the current window's min/mean/max values for the series as well as the same for a trailing window, and surfacing those in the data. I also add a volatility score, collapse runs of similar series that aren't particularly interesting from a volatility perspective, and generally try to highlight anomalous series in the window in various ways.
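
Roughly the kind of pre-processing I mean, sketched out (this is illustrative, not my actual code; it assumes pandas and uses a simple coefficient of variation as the volatility score):

    # Sketch: turn a raw time series window into a compact, semantically
    # labeled summary that gets handed to the LLM instead of raw JSON.
    import pandas as pd

    def summarize_window(current: pd.Series, trailing: pd.Series) -> dict:
        # Both series hold float values indexed by epoch-second timestamps.
        def stats(s: pd.Series) -> dict:
            return {"min": float(s.min()), "mean": float(s.mean()), "max": float(s.max())}

        # Convert epoch timestamps to human-readable dates for the model.
        start = pd.to_datetime(current.index.min(), unit="s").isoformat()
        end = pd.to_datetime(current.index.max(), unit="s").isoformat()

        # Illustrative volatility score: coefficient of variation (std / mean).
        mean = float(current.mean())
        volatility = float(current.std() / mean) if mean else 0.0

        return {
            "window": {"start": start, "end": end},
            "current": stats(current),
            "trailing": stats(trailing),
            "volatility": round(volatility, 3),
            # Flat, boring series get collapsed upstream so the prompt only
            # highlights the anomalous ones; the threshold here is arbitrary.
            "interesting": volatility > 0.25,
        }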

This isn't anything new. It's not particularly technical or novel in any way, but it seems to work pretty well for identifying anomalies and comparing series over time horizons. On small windows it's actually less token efficient than piping in a bunch of raw JSON, but it seems to be more effective from an analysis point of view.

The strange thing about it is that it involves fairly deterministic analysis before we even send the data to the LLM, so one might ask: what's the point if you're already doing the analysis? The answer is that LLMs can actually find interesting patterns across a lot of well-presented data, and they can pick up on patterns in a way that feels like they are cross-referencing many different time series and correlating signals in interesting ways. That's where the general purpose LLMs are helpful in my experience.

Breaking out the analysis into sub-agents is a logical next step; we just haven't gotten there yet.

And yeah, the goal is to approximate those engineers who are good at RCAs in the moment, who have instincts about the system and can juggle a bunch of tabs and cross-reference the signals in them.


This was my approach when using agents to analyze HVAC IoT data for anomaly detection / investigations, and it similarly worked very well. Mixing that with some context like install location, geographic features, info on seasonality (like ASHRAE values for the region), and a classification (residential / commercial), the bot was quite able to deliver actual insights into problems rather than creating a bunch of excess noise.
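
Roughly, the context we attached to each device's series summary looked something like this (field names and values here are made up for illustration):

    import json

    # Illustrative site context; `window_summaries` would come from the kind
    # of pre-processing step described in the parent comment.
    site_context = {
        "install_location": "rooftop unit, Building 4",
        "classification": "commercial",
        "climate_zone": "ASHRAE 4A",  # drives seasonality expectations
        "geography": {"lat": 40.7, "lon": -74.0, "elevation_m": 10},
    }

    prompt = (
        "Site context:\n" + json.dumps(site_context, indent=2) + "\n\n"
        "Series summaries:\n" + json.dumps(window_summaries, indent=2) + "\n\n"
        "Flag only anomalies that are inconsistent with this site's expected seasonality."
    )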

We also mixed in some GSA (https://arxiv.org/abs/2503.04104) steps during the analysis in the sub-agents to further reduce hallucinations.


Glad to hear this. I actually went down this path based off of guidance from multiple LLMs (Anthropic, OpenAI, etc.), so I wasn't sure if it was just some kind of weird hallucination they all had or if they were regurgitating a very small amount of knowledge on this topic, because it was kinda hard to find stories where people had success with these strategies. Thank you for the link to the paper. I will definitely be reading it.


> People on here act like we don’t know if AI will be useful. And I’m sitting over here puzzled because of how fucking useful it is.

Yes, it's very strange to read AI threads here because the general tone is so different from, say, the one at the company I work at, where hundreds of engineers are given enormous monthly token budgets and are being pushed to have the LLMs write as much code as possible. They're not forced to, and no one is reprimanded for not adopting Claude Code or Codex or Cursor. But there's been a strong tonal shift in technology leadership in the last month that basically implies that this is how it is going to be done in the future whether one likes it or not.

As for me, I've been writing all of my code via Claude for a while now, and I don't think I will ever go back to working in an editor writing code the way I did for most of my career. Nor do I want to.


This is a funny point that you're making (for me, anyway), because prior to early December, probably 5% of the lines of code I wrote in a week were AI-generated by Cursor. Then I started using Claude Code. Fast forward to today, I would say 98% of the code that I've shipped in the last three weeks has been written completely by Claude Code.

Prior to three weeks ago, I had used speech-to-text to accomplish approximately 0% of the work I've done in my 20 years of coding. In the last three weeks, well over half of the direction that I've given to Claude Code has been done with speech-to-text.


How are you doing speech-to-text with Claude Code?


Just Wispr Flow and a PTT key binding. It's very good for doing plans with Claude Code because I can just ramble and ramble. As long as I just convey the details of what I want over a sufficiently long string of text, it will work even if it has errors in speech-to-text or I have slight contradictions in my framing of the prompt.

If I need to explicitly reference files in the plan prompt, I just manually annotate them into the prompt at the end.


What does the code do?


CRUD


It's always something that already exists but requires 100x the code.


Yeah, if this is going to be a nuclear bomb for software developers, imagine what it's going to be like for people in customer service, account managers, etc.


I would say the vast majority of people in this thread don't believe that this is related to AI at all, other than as a pretext. It's kind of incredible.


As in AI is a belief? This isn't a religion. If things don't add up, they don't.


This seems like a really coarse and not particularly accurate binary, but even if it were true, the thing about Claude Code and agentic coding like this is that the cost of making a mistake, or of not being happy with a design and having to back it out, is getting smaller and smaller.

I would argue that rapidly iterating reveals more about the problem, even for the most thoughtful of us. It's not like you check your own reasoning at the door when you just dive head first into something.


Assuming these plans are based on Gemini, Google is doing these users a favor, frankly.


Antigravity gives access to Sonnet and Opus 4.6; I would presume most people are using those models rather than Gemini.


I don't think you've used it lately. Gemini 2/2.5 were garbage-tier. The flash level models are absolute trash. 3/3.1 pro are state of the art.


I have. 3 is fine, 3.1 is good. But they are terribly slow. Quality is fine, but the only thing they have going for them is Flash pricing. Their response performance sucks.


Honestly? Yeah.

I've been writing code for 25 years.

A year ago my org brought cursor in and I was skeptical for a specific reason: it was good at breaking CI in weird ways and I keep the CI system running for my org. Constants not mapping to file names, hallucinating function names/args, etc. It was categorically sloppy. And I was annoyed that engineers weren't catching this sloppy stuff. I thought this was going to increase velocity at the expense of quality. And it kind of did.

Fast forward a year and I haven't written code in a couple of weeks, but I've shipped thousands of LOC. I'm probably the pace setter on my team for constantly improving and experimenting with my AI flow. I speak to the computer probably half the time, maybe 75% on some days. I have multiple sessions going at all times. I review all the code Claude writes, but it's usually a one-shot based on my extensive (dictated) prompts.

But to your identity crisis point, things are weird. I haven't actually produced this much code in a long time. And when I hit some milestone there are some differences between now and the before days: I don't have the sense of accomplishment that I used to get but also I don't have the mental exhaustion that I would get from really working through a solution. And so what I find is I just keep going and stacking commit after commit. It's not a bad thing, but it's fundamentally different than before and I am struggling a bit with what it means. Also to be fair I had lost my pure love of coding itself, so I am in a slightly weird spot with this, too.

What I do know is that throwing myself fully into it has secured my job for the foreseeable future, because I'm faster than I've ever been and people look to me for guidance on how they can use these tools. I think with AI adoption the tallest trees will be cut last -- or at least I'm banking on it.

