
This administration is 100% acting in a way that it never plans to leave.

I don't get the comments trashing this. If it slightly beats or even matches Opus 4.6, it means Meta is capable of building a model competitive with the leading AI company. Sure, they spent a lot of money and will have ongoing costs. But how much more work would it take to turn that into a coding agent people are willing to try (and pay for) alongside their usage of a collection of agents (Claude, Codex, etc.)? It also means Meta doesn't have to pay another company to use a SotA model across all their products (including IG, WhatsApp, and VR), which will matter to their balance sheet long term (despite the constant R&D spend).

The comments trashing this are rightly skeptical; they remember the benchmaxxing of Llama 4. This model was out in the woods as early as like a couple months ago but they didn't release it because it was at gemini 2.5 pro levels.

> 4. This model was out in the woods as early as like a couple months ago but they didn't release it because it was at gemini 2.5 pro levels.

Source? (Even if rumor)


NYTimes had a story about this (March 12):

> Meta’s new foundational A.I. model, which the company has been working on for months, has fallen short of the performance of leading A.I. models from rivals like Google, OpenAI and Anthropic on internal tests for reasoning, coding and writing, said the people, who were not authorized to speak publicly about confidential matters.

> The model, code-named Avocado, outperformed Meta’s previous A.I. model and did better than Google’s Gemini 2.5 model from March, two of the people said. But it has not performed as strongly as Gemini 3.0 from November, they said.

> They added that the leaders of Meta’s A.I. division had instead discussed temporarily licensing Gemini to power the company’s A.I. products, though no decisions have been reached.

https://www.nytimes.com/2026/03/12/technology/meta-avocado-a...

https://archive.is/uUV5h#selection-715.98-715.277




If you are trying to come up with anti-media conspiracies there are always plenty of ways to do it against any media company.

The idea that NY Times is particularly anti-Meta seems a stretch. They - like most traditional media companies - are anti-tech in general. The fact they also collect data doesn't make their reporting untrue.

Personally, I think a much more interesting rumor to make up would be that Yann LeCun (who famously had his reporting lines rearranged to go through Alexander Wang after the Scale.ai acquihire) works at New York University.

New York University is in the same place as the New York Times.

There's a conspiracy for you. I made it up, but I mean it could be true I guess?

(Of course LeCun also publicly congratulated Wang on the launch of the model. But maybe that's a ruse to hide everything... blah blah)


>They - like most traditional media companies - are anti-tech in general. The fact they also collect data doesn't make their reporting untrue.

(sigh) In olden times you would have been free to use the em dash as you pleased. Unfortunately, now it's considered signal that you're an AI bot.


Readers here can't fathom that the NYT has inherent bias in a lot of its reporting

Does Meta not harvest data on a massive scale? Not sure what exactly is the issue with doing a series on that.

So llama4 is great? Have you been using it?

It was from a Techmeme Ride Home podcast episode where the host discussed "sources at the company said". I don't remember which day's episode it was.

The Llama 4 series was one of the earliest large MoEs to be made publicly available. People just ignored it because they were focused on running smaller, denser models at the time; we should know better these days.

DeepSeek R1 was a publicly available MoE model that was getting a ton of attention before Llama 4. Llama 4 didn't get much attention because it wasn't good.

Also, Gemini 2.5 Pro launched a week before Llama 4.

It was Gemini 2.5 Pro that redeemed Google in the eyes of most people as a valid competitor to OpenAI instead of as a joke, so Meta dropping the ball with Llama 4 was extra bad.


the models were objectively horrible

They really weren't horrible. They were ~GPT-4o, with the added benefit that you could run them on premise. Just "regular" models, non-"thinking". Inefficient architecture (ratio of active to total parameters), but otherwise "decent" models. They got trashed online by bots and Chinese shills (I was online that weekend when it happened; it's something to behold). Just because they were non-thinking when thinking was clearly the future doesn't make them horrible. Not SotA by any means, but still.

> They were ~gpt4o, with the added benefit that you could run them on premise.

No, they are bad models. They were benchmaxxed on LMArena and a few other benchmarks, but as soon as you try them yourself they fall to pieces.

I have my own agentic benchmark[1] I use to compare models.

Llama-4-scout-17b-16e scores 14/25, while llama-4-maverick-17b-128e scores 12/25.

By comparison, gemma-4-E4B-it-GGUF:Q4_K_M scores 15/25 (that is a 4B parameter model!) - even GPT-3.5 scores 13/25 (with some adjustment because it doesn't do tool calling).

Llama 4 was a bad model, unfortunately.

[1] https://sql-benchmark.nicklothian.com/#all-data
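For context on how these N/25 figures come about: an agentic benchmark like the one linked above typically drives each model through a tool-calling episode per task and tallies pass/fail. A minimal sketch of such a scoring loop (every name here — `run_task`, the model entries, the precomputed outcomes — is a hypothetical illustration, not the linked benchmark's actual code):

```python
# Hypothetical sketch of an agentic benchmark scorer: each of 25 tasks
# is run as one agentic episode and marked pass/fail, then totals are
# reported as "N/25"-style scores.

def run_task(model, task):
    """Placeholder for one agentic episode: in a real harness the model
    would issue tool calls (e.g. SQL queries) until it answers or hits a
    step limit. Here outcomes are precomputed so the loop is runnable."""
    return task in model["solved"]

def score(model, tasks):
    # One point per task the model's episode solved.
    return sum(run_task(model, t) for t in tasks)

tasks = list(range(25))  # 25 tasks, as in the linked benchmark
models = {
    "llama-4-scout-17b-16e":     {"solved": set(range(14))},  # 14/25
    "llama-4-maverick-17b-128e": {"solved": set(range(12))},  # 12/25
}

for name, m in sorted(models.items()):
    print(f"{name}: {score(m, tasks)}/25")
```

The interesting part of a real harness is entirely inside `run_task` (tool-call parsing, step limits, answer checking); the scoring itself is just a tally.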


> By comparison gemma-4-E4B-it-GGUF:Q4_K_M scores 15/25 (that is a 4B parameter model!)

Gemma 4 E4B is slightly confusingly named; it's an 8B param model


You are completely right on both counts.

It is an 8B model, and it is confusingly named. In fact, I made exactly the same point[1] when it was released and promptly forgot!

[1] https://news.ycombinator.com/item?id=47622694


Wrote longer comment steel-manning this, posted it to a reply, then realized you might like to know they had a reasoning model on deck ready for release in the next 2-4 weeks.

Got shitcanned due to bad PR & Zuck God-King terraforming the org, so there'd be a year delay to next release.

Real tragi-comedy, and you have no idea how happy it makes me to see someone in the wild saying this. It sounds so bizarre to people given the conventional wisdom, but, it's what happened.


Thanks for calling me a bot. Llama4 and meta ai sucks

Nah, I remember how disgusted I felt trying Llama 4 Maverick and Scout. They were both DOA; they couldn't even beat much smaller local models.

I'll cosign what you said; simultaneously, your interlocutor's point is also well-founded, and it depresses me that it's not better known and sounds so... off... due to conventional wisdom plus God King Zuck misunderstanding his own company and the resulting overreaction.

They beat Gemini 2.5 Flash and Pro handily on my benchmark suite. (tl;dr: tool calling and agentic coding).

Llama 4 on Groq was ~GPT 4.1 on the benchmark at ~50% the cost.

They shouldn't have released it on a Saturday.

They should have spent a month with it in private prerelease, working with providers.[1]

The rushed launch and ensuing quality issues got rolled into the hypebeast narrative of "DeepSeek will take over the world"

I bet it was super fucking annoying to talk to due to LMArena maxxing.

[1] my understanding is longest heads up was single-digit days, if any. Most modellers have arrived at 2+ weeks now, there's a lot between spitting out logits and parsing and delivering a response.


Your comments seem to imply the engineers made a great product but Zuck intervened so now it's shit

I don't know how Zuck intervening could change float32s in a trained model, so I don't think I think that, but maybe I'm parsing your words incorrectly.

failing non-stop at tool calls on top of that.

Why go into coding agents? Both Anthropic and OpenAI are going all in on that. The opportunity is customer-facing AI now.

OpenAI has the mindshare, but they're going to have to decide whether to allocate their limited compute to free users or go all in trying to keep up with Anthropic in enterprise.


you can do way more than just coding with the coding agents.

Because coding agents are where the revenue is.

If you squint at coding agents you see the next OS.

Maybe better phrasing is “HCI paradigm”, but that somehow manages to say everything and nothing.


Programming was always about designing Rube Goldberg systems that implemented a complicated state machine, akin to dominoes. But now we have a probabilistic, nondeterministic domino that has a huge number of dominoes inside and can dynamically generate many different paths of dominoes, sometimes not even leading to the intended final domino you wanted to fall.

I see it more like a compiler


I agree that it's more like a compiler (it turns a higher-level language into machine code), but I also think that's only half the story: a compiler could never turn requirements into functional software, generate boilerplate, or debug. It's also a development tool.

It's a decent model if the benchmarks are to be believed, but it won't be close to Opus in usefulness for programming. None of these benchmarks completely capture what makes a model useful for day-to-day coding tasks, unfortunately. It will take time for them to catch up, and Opus will keep improving in the meantime. But it's good to have more competition.

Benchmarks miss the thing that actually matters for agentic use: how does behavior change over a multi-day horizon? A model that scores well on one-shot coding tasks can still make terrible decisions when it has persistent state and resource constraints. That's where you see the real gaps between models.

Is there a benchmark for these long tasks? That kind of seems like the only number worth measuring.

(Of course at that point it involves memory and context management and so on, so you're testing the harness as well as the model.)
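A long-horizon harness of the sort described above would have to carry persistent state and a resource budget across simulated days and score the end state, not single-turn accuracy. A toy sketch of that idea (entirely hypothetical — the thread names no such benchmark, and the scoring rule and agent below are made up for illustration):

```python
# Toy sketch of a multi-day agentic evaluation: the agent keeps
# persistent state and a budget across simulated days; the score
# reflects end-state quality and resource discipline rather than
# one-shot task accuracy.

def evaluate_long_horizon(agent_step, days=3, budget=100):
    state = {"notes": [], "budget": budget}
    for day in range(days):
        action_cost, note = agent_step(day, state)
        if action_cost > state["budget"]:
            return 0  # blew the resource constraint: hard fail
        state["budget"] -= action_cost
        state["notes"].append(note)
    # Score the end state: remaining budget plus work actually recorded.
    return state["budget"] + 10 * len(state["notes"])

# A trivial stand-in agent that spends 20 units per day and logs its work.
def frugal_agent(day, state):
    return 20, f"day-{day} summary"

print(evaluate_long_horizon(frugal_agent))  # 40 budget left + 3 notes -> 70
```

As the parenthetical above notes, a real version of this inevitably tests the harness (memory, context management) as much as the model itself.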


> If it slightly beats or even matches Opus 4.6

It doesn't though


Curious on why you think this. Any data points that led you to this?

The benchmarks they released

What do you mean? In most cases, the benchmarks show a larger number for Muse and a smaller number for Opus.

In Multimodal, yes, but Opus definitely edges it out in the Text/Reasoning and Agentic benchmarks.

I think the general skepticism is because they are late to the race, and they are releasing an Opus-4.6-equivalent model now, when Anthropic is teasing Mythos.


> I don't get the comments trashing this.

People like to hate on Meta regardless of anything, and regardless of whether it's justified or not. Not saying it isn't, just that it's many people's default bias.


That is not the case here. Nobody hated on llama 1,2,3 at all. They justifiably felt burned by the benchmaxxing of llama 4. Trust broken must be re-earned, and benchmarks alone cannot do that.

Because of bots and trillion-dollar IPOs and even bigger stakes. People need to better appreciate the level of manipulation going on. Social media has an outsized impact. Bots and even people are getting paid to post and to upvote/downvote narratives.

> people are getting paid to post and upvote/downvote narratives

This problem will be solved shortly with better AI (if it hasn't essentially been solved already).

No more humans in the loop, much lower costs for social media manipulation. Welcome to the future!


Bingo.


What about the brig?


It's just Wesley in there, no big


Why?


> Kids delete real-life friends for not having enough likes on the watch.


Sheldon Brown's content is great, but is it ironic that the first thing you see on his site is a Google banner ad?

Understandably, he'd like to earn money on his content, and I see no problem with that. But for me to visit his site and have Google add yet another tracking event to their "interest pile" about me (I guess I'm in the market for bikes now?) is a bit off-putting.

He can't be making more than a few bucks a month through that single ad, right?


I truly had no idea, I guess I've always had an ad blocker.

He's been dead since 2008, so I assume the banner ad keeps the lights on in the absence of his income and input.


He died about ten years ago.


As the author is dead, I'm sure the money goes towards site hosting fees.


I assume nobody removed it and the revenue is just added to some Google Adsense balance sheet, and reports go to some Gmail account that will expire one day.


> Their API can't tell you the chef left last month

Your API can do that? Using what data?


That is our vision of where we want to be. There is a lot of information about these places on the public web, which we analyze and cross-reference. And we started to solve this problem with a validation API that can tell you whether a business or point of interest exists at its current location.


There are only two reasons one would stick around: money and/or visa constraints.


Google v. Oracle ruled that use of APIs is fair game, and it could be argued that test cases are strictly a use of APIs and not an implementation.


Google vs Oracle ruled that APIs fall under copyright (the contrary was thought before). However, it was ruled that, in that specific case, fair use applied, because of interoperability concerns. That's the important part of this case: fair use is never automatic, it is assessed case by case.

Regarding chardet, I'm not sure "I wanted to circumvent the license" is a good way to argue fair use.


The individuals making these decisions are 100% aware of what they are doing. Driving for and implementing stuff like this is for profits, bonuses, and internal recognition.


Zuck has made his mind up about which side he's on with his money. I recall a time when people on the Forbes list were quietly political.


Right, this is the sociopathy, kleptocracy, and pure madness that having more money than one needs generates.


Accurate description of META.

