


I’ve been using Codex Pro since they lobotomized Opus 4.6. Codex is so much better; GPT 5.4 xhigh fast is definitely the smartest and fastest model available.

For a while there I had both Opus 4.6 and Codex access and I frequently pitted them against each other; I never once saw Opus come out ahead. Opus was good as a reviewer, though, but as an implementer it just felt lazy compared to 5.4 xhigh.

One feature that I haven’t seen discussed that much is how Codex has auto-review on tool runs. No longer are you a slave to all-or-nothing confirmations or endless bugging; it’s such a bad pattern.

Even in a week of heavy duty work and personal use I still haven’t been able to exhaust the usage on the $200 plan.

I’ll probably change my mind when (not if) OpenAI rug-pulls, but for spring ’26, Codex is definitely the better deal.


I also made the switch to OpenAI, the $20 plan, I dunno about "so much better" but it's more or less the same, which is great!

The models and tools levelling out is great for users because the cost of switching is basically nil. I'm reading people ITT saying they signed up for a year - big mistake. A year is a decade right now.


I underscored using xhigh + fast mode when saying it’s so much better.

Now with Opus 4.7 of course the “burden” of adjusting reasoning effort has been taken away from you even at the API level.

In my experience people don’t change the thinking level at all.


What issues did you consider about sending your code base to OpenAI?

None mate. Code is cheap, it's not worth anything any more, especially not my little personal projects

Any alternative to Claude Design ? Tried Figma with Opus 4.6 but it doesn't come close in my experience.

Codex is abysmal for UI design imo.


It really depends on what you’re trying to do and what your skillset is.

But if you go information architecture first and have that codified in some way (especially if you already have the templates), then you can nudge any agent to go straight into CSS and it will produce something reasonable.


I've been using paper.design and it's been working well for me via mcp on claude code

Have you tried stitch.withgoogle.com?

Thanks for the tip! Hadn't seen that, but definitely giving it a try.

I created some decent prototypes with stitch but I don't know how it compares to claude design

stitch.withgoogle.com

Google Stitch


Does this have a CLI only interface?


Yes. You could also look at the README.md.


There is virtually no reason to use Ollama over LM Studio or the myriad of other alternatives.

Ollama is slower, and they started out as a shameless llama.cpp ripoff without giving credit, and now they’ve “ported” it to Go, which means they’re just vibe-code-translating llama.cpp, bugs included.


>Ollama is slower

I've benchmarked this on an actual Mac Mini M4 with 24 GB of RAM, and averaged 24.4 t/s on Ollama and 19.45 t/s on LM Studio for the same ~10 GB model (gemma4:e4b), a difference which was repeated across three runs and with both models warmed up beforehand. Unless there is an error in my methodology, which is easy to repeat[1], it means Ollama is a full 25% faster. That's an enormous difference. Try it for yourself before making such claims.

[1] script at: https://pastebin.com/EwcRqLUm but it warms up both and keeps them in memory, so you'll want to close almost all other applications first. Install both Ollama and LM Studio, download the models, and change the path to where you installed the model. Interestingly, I had to go through three different AIs to write this script: ChatGPT (on which I'm a Pro subscriber) thought about doing so then returned nothing (shenanigans since I was benchmarking a competitor?), I had run out of my weekly session limit on Pro Max 20x credits on Claude (wonder why I need a local coding agent!), and then Google rose to the challenge and wrote the benchmark for me. I didn't try writing a benchmark like this locally; I'll try that next and report back.
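For anyone wanting to reproduce this without the pastebin script, here's a minimal sketch of the tokens-per-second calculation such a benchmark relies on. It assumes an Ollama-style response payload with `eval_count` (tokens generated) and `eval_duration` (nanoseconds), per Ollama's generate API; adapt the field names for other backends.

```python
def tokens_per_second(response: dict) -> float:
    """Compute generation speed from an Ollama-style response payload."""
    tokens = response["eval_count"]            # tokens generated
    seconds = response["eval_duration"] / 1e9  # nanoseconds -> seconds
    return tokens / seconds

def mean_tps(runs: list[dict]) -> float:
    """Average speed across several warmed-up runs, per the methodology above."""
    return sum(tokens_per_second(r) for r in runs) / len(runs)

# Example: 488 tokens generated in 20 s of eval time -> 24.4 t/s
sample = {"eval_count": 488, "eval_duration": 20_000_000_000}
print(tokens_per_second(sample))  # 24.4
```

Note that prompt-processing time is reported separately (`prompt_eval_duration`), so this measures pure generation speed, which is what the numbers above compare.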


It depends on the hardware, backend and options. I've recently tried running some local AIs (Qwen3.5 9B for the numbers here) on an older AMD 8GB VRAM GPU (so vulkan) and found that:

llama.cpp is about 10% faster than LM studio with the same options.

LM Studio is 3x faster than ollama with the same options (~38 t/s vs ~13 t/s), but messes up tool calls.

Ollama ended up slowest on the 9B, Qwen3.5 35B and some random other 8B model.

Note that this isn't some rigorous study or performance benchmarking. I just found ollama unacceptably slow and wanted to try out the other options.


I really like LM Studio when I can use it under Windows, but for people like me with Intel Macs + AMD GPUs, ollama is the only option because it can leverage the GPU using MoltenVK (aka Vulkan), unofficially. We're still testing it, hoping to get the Vulkan support into the main branch soon. It works perfectly for single GPUs, but some edge cases when using multiple GPUs are unsupported until upstream support from MoltenVK comes through. But yeah, I agree, it wasn't cool to repackage Georgi's work like that.


LM Studio is closed source.

And didn't Ollama independently ship a vision pipeline for some multimodal models months before llama.cpp supported it?


Yes, they introduced that Golang rewrite precisely to support the visual pipeline and other things that weren't in llama.cpp at the time. But then llama.cpp usually catches up and Ollama is just left stranded with something that's not fully competitive. Right now it seems to have messed up mmap support which stops it from properly streaming model weights from storage when doing inference on CPU with limited RAM, even as faster PCIe 5.0 SSDs are finally making this more practical.

The project is just a bit underwhelming overall, it would be way better if they just focused on polishing good UX and fine-tuning, starting from a reasonably up-to-date version of what llama.cpp provides already.
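To illustrate why broken mmap support matters here: a memory-mapped file only pages in the regions you actually touch, so inference can stream slices of a weights file larger than RAM. This is a toy sketch of that idea (not Ollama or llama.cpp code), using a small stand-in file for a multi-gigabyte checkpoint:

```python
import mmap
import os
import tempfile

def read_slice_mmapped(path: str, offset: int, length: int) -> bytes:
    """Read one slice of a large file; untouched pages are never loaded into RAM."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[offset : offset + length]

# Demo: a small stand-in for a weights file, with a marker buried mid-file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * 4096 + b"WEIGHTS" + b"\x00" * 4096)
    path = tmp.name

print(read_slice_mmapped(path, 4096, 7))  # b'WEIGHTS'
os.unlink(path)
```

Without mmap, the runtime has to `read()` whole tensors into process memory up front, which is exactly the limited-RAM CPU-inference case described above.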


> There is virtually no reason to use Ollama over LM Studio or the myriad of other alternatives.

Hmm, the fact that Ollama is open-source, can run in Docker, etc.?


Ollama is quasi-open source.

In some places in the source code they claim sole ownership of the code, when it is highly derivative of that in llama.cpp (having started its life as a llama.cpp frontend). They do keep the same license, however: MIT.

There is no reason to use Ollama as an alternative to llama.cpp, just use the real thing instead.


If it’s MIT code derived from MIT code, in what way is its openness “quasi”? Issues of attribution and crediting diminish the karma of the derived project, but I don’t see how it diminishes the level of openness.


FOSS licensing can only exist in terms of Copyright. Without Copyright, you cannot license FOSS. If something has an incorrect Copyright attribution, then the license can be viewed as invalid until this deficiency has been corrected (obv. depending on local laws, etc).

On top of this, it would not be unreasonable for the numerous authors of llama.cpp to issue DMCA takedown requests if Ollama is unwilling to correct it.


Do y'all mean backend or the Ollama frontend or both? I find it trivially easy to sub in my local Ollama api thing in virtually all of the interesting frontend things. I'm quite curious about the "why not Ollama" here.


Does LM Studio have an equivalent to the ollama launch command? i.e. `ollama launch claude --model qwen3.5:35b-a3b-coding-nvfp4`


I don't think it does, but llama.cpp does, and can load models off HuggingFace directly (so, not limited to ollama's unofficial model mirror like ollama is).

There is no reason to ever use ollama.


> I don't think it does, but llama.cpp does

I just checked their docs and can't see anything like it.

Did you mistake the command to just download and load the model?


As a sibling comment answered you, it is `-hf`.

And yes, it downloads the model, caches it, and then serves future loads of that model out of the cache if the file hasn't changed in the hf repo.


So in summary: no, it does not have an equivalent command either.


-hf ModelName:Q4_K_M


Did you mistake the command to just download and load the model too?

Actually that shouldn't be a question, you clearly did.

Hint: it also opens Claude code configured to use that model


Sure there's a reason... it works fine, that's the reason.


I feel like the READMEs for these 3 large popular packages already illustrate the tradeoffs better than a Hacker News argument.


lm studio is not open source, and you can't use it on the server and connect clients to it?


LM Studio can absolutely run as a server.


IIRC it does so as default too. I have loads of stuff pointing at LM Studio on localhost


Not necessarily; I would very much like to use those features on a Linux server. Currently the Anthropic implementation forces a desktop (or worse, a laptop) to be turned on instead of working headless as far as I understand it.

I’ll give clappie a go, love the theme for the landing page!


I disagree. I think a sharp drop in memory requirements of at least an order of magnitude will cause demand to adjust accordingly.


Department of Transportation always thinks adding more lanes will reduce traffic.

It doesn't, it induces demand. Why? Because there are always too many people with cars who will fill those lanes.


Citation needed. I've heard this quite often, but so far, I haven't seen proof of the stated causality.

PS: This doesn't mean that better public transportation couldn't deliver more bang for the buck than the n-th additional car lane. But never ever have I heard from anybody that they chose to buy a car or use an existing car more often because an additional lane has been built.


Have you tried the "Reference" section on the Wikipedia article?

https://en.wikipedia.org/wiki/Induced_demand#cite_note-vande...


You've never heard anyone choose to take side streets instead of the highway because of traffic jams? No one ever goes out of their way to avoid heavily trafficed areas?


I don't understand what the point is you're trying to make. When people at t0 take detours because of traffic jams on the direct route, and then at t1 there are fewer traffic jams on the direct route due to additional lanes, so they decide to take the direct route, then total traffic is down, because they no longer take a detour. Even if they are still part of a newly induced traffic jam.


> Rent a VPS in another country and set up your own personal VPN server on it, and no one will be able to block you.

(machine translation)

How would this ever work with a whitelist? did you even read the post?


How did PYPI_PUBLISH lead to a full GH account takeover?


I'd imagine the attacker published a new compromised version of their package, which the author eventually downloaded, which pwned everything else.


Their Personal Access Token must’ve been pwned too, not sure through what mechanism though


They have written about it on github to my question:

Trivy hacked (https://www.aquasec.com/blog/trivy-supply-chain-attack-what-...) -> all circleci credentials leaked -> included pypi publish token + github pat -> | WE DISCOVER ISSUE | -> pypi token deleted, github pat deleted + account removed from org access, trivy pinned to last known safe version (v0.69.3)

What we're doing now:

    Block all releases, until we have completed our scans
    Working with Google's mandiant.security team to understand scope of impact
    Reviewing / rotating any leaked credentials
https://github.com/BerriAI/litellm/issues/24518#issuecomment...
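Pinning to a "last known safe version" only helps if the pinned artifact can't be silently swapped out. A common strengthening (illustrative sketch, not the project's actual tooling; the helper names here are hypothetical) is to record the artifact's sha256 at pin time and refuse anything that no longer matches:

```python
import hashlib

def sha256_of(path: str) -> str:
    """Stream a file through sha256 so large artifacts needn't fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path: str, pinned_hash: str) -> bool:
    """Reject the artifact if it no longer matches the hash recorded at pin time."""
    return sha256_of(path) == pinned_hash
```

This is the same principle behind pinning GitHub Actions to a commit SHA rather than a mutable tag, or using hash-pinned requirements for pip: a compromised upstream can re-tag a release, but it can't make a different file hash to the recorded digest.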


0.69.3 isn't safe. The safe thing to do is remove all trivy access or, failing that, pin the version: 0.35 is the last and AFAIK only safe version.

https://socket.dev/blog/trivy-under-attack-again-github-acti...


I have sent your message to the developer on github and they have changed the version to 0.35.0, so thanks.

https://github.com/BerriAI/litellm/issues/24518#issuecomment...


Does that explain how circleci was publishing commits and closing issues?


Don't hold your breath for an answer.


>I am unable to understand how it compromised your account itself from the exploit at trivvy being used in CI/CD as well.

Token in CI could've been way too broad.


>1. Looks like this originated from the trivvy used in our ci/cd

Were you not aware of this in the short time frame that it happened in? How come credentials were not rotated to mitigate the trivy compromise?


The latest trivy attack was announced just yesterday. If you go out to dinner or take a night off, it's totally plausible to have not seen it.


AFAIK the trivy attack was first in the news on March 19th for the GitHub Actions, and for the Docker images it was on March 23rd.


[flagged]


Probably more "serious human" than "serious over-capitalist" or "seriously overworked". Good for them.

