More

tty456 · 2026-04-09T17:22:16 1775755336

This administration is 100% acting in a way that it never plans to leave.

tty456 · 2026-04-08T17:13:36 1775668416

I don't get the comments trashing this. If it slightly beats or even matches Opus 4.6, it means Meta is capable of building a model competitive with the leading AI company. Sure, they spent a lot of money and will have on-going costs. But how much more work would it take to turn that into a coding agent people are willing to try (and pay for) along side their usage of a collection of agents (Claude, Codex, etc)? Also means Meta doesn't have to pay another company to use a SATA model across all their products (including IG and WhatsApp, vr) which will matter to their balance sheet long term (despite the constant r&d spend).

prodigycorp · 2026-04-08T17:19:04 1775668744

Comments trashing this are rightly correct skeptics who remember the benchmaxxing of llama 4. This model was out in the woods as early as like a couple months ago but they didn't release it because it was at gemini 2.5 pro levels.

refulgentis · 2026-04-09T02:37:04 1775702224

> 4. This model was out in the woods as early as like a couple months ago but they didn't release it because it was at gemini 2.5 pro levels.

Source? (Even if rumor)

nl · 2026-04-09T06:03:25 1775714605

NYTimes had a story about this (March 12):

> Meta’s new foundational A.I. model, which the company has been working on for months, has fallen short of the performance of leading A.I. models from rivals like Google, OpenAI and Anthropic on internal tests for reasoning, coding and writing, said the people, who were not authorized to speak publicly about confidential matters.

> The model, code-named Avocado, outperformed Meta’s previous A.I. model and did better than Google’s Gemini 2.5 model from March, two of the people said. But it has not performed as strongly as Gemini 3.0 from November, they said.

> They added that the leaders of Meta’s A.I. division had instead discussed temporarily licensing Gemini to power the company’s A.I. products, though no decisions have been reached.

https://www.nytimes.com/2026/03/12/technology/meta-avocado-a...

https://archive.is/uUV5h#selection-715.98-715.277

o10449366 · 2026-04-09T08:50:17 1775724617

[flagged]

nl · 2026-04-09T11:48:12 1775735292

If you are trying to come up with anti-media conspiracies there are always plenty of ways to do it against any media company.

The idea that NY Times is particularly anti-Meta seems a stretch. They - like most traditional media companies - are anti-tech in general. The fact they also collect data doesn't make their reporting untrue.

Personally I think a much more interesting rumor to make up would be that Yann Lecun (who famously had his reporting lines rearranged to go through Alexander Wang after Scale.ai acquihire) works at New York University.

New York University is in the same place as the New York Times.

There's a conspiracy for you. I made it up, but I mean it could be true I guess?

(Of course Lecun also publicly congratulated Wang on the launch of the model. But maybe that's a ruse to hide everything.. blah blah)

squidlogic · 2026-04-09T15:06:40 1775747200

>They - like most traditional media companies - are anti-tech in general. The fact they also collect data doesn't make their reporting untrue.

(sigh) In olden times you would have been free to use the em dash as you pleased. Unfortunately, now it's considered signal that you're an AI bot.

Rover222 · 2026-04-09T21:55:25 1775771725

Readers here can't fathom that the NYT has inherent bias in a lot of its reporting

mupuff1234 · 2026-04-09T11:01:03 1775732463

Does Meta not harvest data on a massive scale? Not sure what exactly is the issue with doing a series on that.

owebmaster · 2026-04-09T12:08:56 1775736536

So llama4 is great? Have you been using it?

prodigycorp · 2026-04-09T05:01:36 1775710896

It was from a techmeme ride home podcast where the host discussed "sources at the company said". I don't remember which day's episode it was.

zozbot234 · 2026-04-08T17:24:16 1775669056

The llama4 series was one of the earliest large MoE's to be made publically available. People just ignored it because they were focused on running smaller and denser models at the time, we should know better these days.

dilap · 2026-04-08T17:55:14 1775670914

Deepseek R1 was a publically-available, MoE model that was getting a ton of attention before llama4. Llama4 didn't get much attention because it wasn't good.

jychang · 2026-04-08T23:06:52 1775689612

Also, Gemini 2.5 Pro launched a week before Llama 4.

It was Gemini 2.5 Pro that redeemed Google in the eyes of most people as a valid competitor to OpenAI instead of as a joke, so Meta dropping the ball with Llama 4 was extra bad.

prodigycorp · 2026-04-08T17:31:52 1775669512

the models were objectively horrible

NitpickLawyer · 2026-04-08T17:39:48 1775669988

They really weren't horrible. They were ~gpt4o, with the added benefit that you could run them on premise. Just "regular" models, non "thinking". Inefficient architecture (number of active out of total) but otherwise "decent" models. They got trashed online by bots and chinese shills (I was online that weekend when it happened, it's something to behold). Just because they were non-thinking when thinking was clearly the future doesn't make them horrible. Not SotA by any means, but still.

nl · 2026-04-09T02:46:22 1775702782

> They were ~gpt4o, with the added benefit that you could run them on premise.

No, they are bad models. They were benchmaxxed on LMAreana and a few other benchmarks but as soon as you try them yourself they fall to pieces.

I have my own agentic benchmark[1] I use to compare models.

Llama-4-scout-17b-16e scores 14/25, while llama-4-maverick-17b-128e scores 12/25.

By comparison gemma-4-E4B-it-GGUF:Q4_K_M scores 15/25 (that is a 4B parameter model!) - even GPT3.5 scores 13/25 (with some adjustment because it doesn't do tool calling).

Llama 4 was a bad model, unfortunately.

[1] https://sql-benchmark.nicklothian.com/#all-data

ac29 · 2026-04-09T15:23:06 1775748186

> By comparison gemma-4-E4B-it-GGUF:Q4_K_M scores 15/25 (that is a 4B parameter model!)

Gemma 4 E4B is slightly confusingly named, its a 8B param model

nl · 2026-04-10T02:12:24 1775787144

You are completely right on both counts.

It is a 8B model, and it is confusingly named. In fact I made exactly the same point[1] when it was released and promptly forgot!

[1] https://news.ycombinator.com/item?id=47622694

refulgentis · 2026-04-08T18:15:39 1775672139

Wrote longer comment steel-manning this, posted it to a reply, then realized you might like to know they had a reasoning model on deck ready for release in the next 2-4 weeks.

Got shitcanned due to bad PR & Zuck God-King terraforming the org, so there'd be a year delay to next release.

Real tragi-comedy, and you have no idea how happy it makes me to see someone in the wild saying this. It sounds so bizarre to people given the conventional wisdom, but, it's what happened.

owebmaster · 2026-04-09T12:10:55 1775736655

Thanks for calling me a bot. Llama4 and meta ai sucks

prodigycorp · 2026-04-08T17:46:03 1775670363

Nah I remember how disgusted I felt trying llama 4 maverick and scout. They were both DOA.. couldn't even beat much smaller local models.

refulgentis · 2026-04-08T18:14:15 1775672055

I'll cosign what you said, simultaneously, yr interlocutor's point is also well-founded and it depresses me it's not better known and sounds so...off...due to conventional wisdom x God King Zuck's misunderstanding his own company and resulting overreaction.

They beat Gemini 2.5 Flash and Pro handily on my benchmark suite. (tl;dr: tool calling and agentic coding).

Llama 4 on Groq was ~GPT 4.1 on the benchmark at ~50% the cost.

They shouldn't have released it on a Saturday.

They should have spent a month with it in private prerelease, working with providers.[1]

The rushed launch and ensuing quality issues got rolled into the hypebeast narrative of "DeepSeek will take over the world"

I bet it was super fucking annoying to talk to due to LMArena maxxing.

[1] my understanding is longest heads up was single-digit days, if any. Most modellers have arrived at 2+ weeks now, there's a lot between spitting out logits and parsing and delivering a response.

alex1138 · 2026-04-08T19:57:34 1775678254

Your comments seem to imply the engineers made a great product but Zuck intervened so now it's shit

refulgentis · 2026-04-08T21:28:41 1775683721

I don't know how Zuck intervening could change float32s in a trained model, so I don't think I think that, but maybe I'm parsing your words incorrectly.

pixel_popping · 2026-04-08T20:49:41 1775681381

failing non-stop at tool calls on top of that.

canes123456 · 2026-04-08T23:28:00 1775690880

Why go into coding agents? Both anthropic and OpenAI are going all in on that. The opportunity is customer facing AI now.

OpenAI has the mindshare but they going to have to decide if they allocate their limited compute for free users or go all in trying to keep up with Anthropic in enterprise.

kaycey2022 · 2026-04-09T01:56:37 1775699797

you can do way more than just coding with the coding agents.

foobiekr · 2026-04-09T01:03:25 1775696605

Because coding agents are where the revenue is.

refulgentis · 2026-04-09T02:38:07 1775702287

If you squint at coding agents you see the next OS.

Maybe better phrasing is “HCI paradigm”, but that somehow manages to say everything and nothing.

whattheheckheck · 2026-04-09T05:16:10 1775711770

Programming was always about designing rube goldberg systems that did a complicated state machine akin to dominos but now we have a probabalistic and nondeterministic domino that has a huge amount of dominos inside amd can dynamically generate many different paths of dominos sometimes not even leading to the intended final domino you wanted to fall.

I see it more like a compiler

RealStupidity · 2026-04-09T06:00:45 1775714445

I agree that it's more like a compiler (turns higher level language into machine code) but I also think that's only half the story - a compiler could never turn requirements into functional software, generate boilerplate or debug. It's also a development tool

modeless · 2026-04-08T19:22:47 1775676167

It's a decent model if the benchmarks are to be believed, but it won't be close to Opus in usefulness for programming. None of these benchmarks completely capture what makes a model useful for day-to-day coding tasks, unfortunately. It will take time for them to catch up, and Opus will keep improving in the meantime. But it's good to have more competition.

ai5iq · 2026-04-08T23:43:30 1775691810

Benchmarks miss the thing that actually matters for agentic use: how does behavior change over a multi-day horizon? A model that scores well on one-shot coding tasks can still make terrible decisions when it has persistent state and resource constraints. That's where you see the real gaps between models.

andai · 2026-04-09T12:53:09 1775739189

Is there a benchmark for these long tasks? That kind of seems like the only number worth measuring.

(Of course at that point it involves memory and context management and so on, so you're testing the harness as well as the model.)

redox99 · 2026-04-08T17:25:33 1775669133

> If it slightly beats or even matches Opus 4.6

It doesn't though

ryeguy_24 · 2026-04-08T17:30:14 1775669414

Curious on why you think this. Any data points that led you to this?

howdareme · 2026-04-08T17:40:30 1775670030

The benchmarks they released

johnfn · 2026-04-08T19:38:08 1775677088

What do you mean? In most cases, the benchmarks show a larger number for Muse and a smaller number for Opus.

spprashant · 2026-04-08T19:59:51 1775678391

In Multimodal yes, but Opus is definitely edging out in Text/Reasoning and Agentic benchmarks.

I think the general skepticism is because they are late to race, and they are releasing a Opus-4.6-equivalent model now, when Anthropic is teasing Mythos.

ChipopLeMoral · 2026-04-08T17:47:15 1775670435

> I don't get the comments trashing this.

People like to hate on Meta regardless of anything, and regardless of whether it's justified or not. Not saying it isn't, just that it's many people's default bias.

jatora · 2026-04-09T02:00:46 1775700046

That is not the case here. Nobody hated on llama 1,2,3 at all. They justifiably felt burned by the benchmaxxing of llama 4. Trust broken must be re-earned, and benchmarks alone cannot do that.

blazespin · 2026-04-08T21:26:05 1775683565

Because bots and trillion dollar ipos and even bigger stakes. People need to better appreciate the level of manipulation going on. Social media has an outsized impact. Bots and even people are getting paid to post and upvote/downvote narratives.

asdfman123 · 2026-04-08T21:42:51 1775684571

> people are getting paid to post and upvote/downvote narratives

This problem will be solved shortly with better AI (if it hasn't essentially been solved already).

No more humans in the loop, much lower costs for social media manipulation. Welcome to the future!

tty456 · 2026-03-26T21:54:16 1774562056

Bingo.

tty456 · 2026-03-26T03:26:08 1774495568

What about the brig?

brynnbee · 2026-03-26T19:23:47 1774553027

It's just Wesley in there, no big

tty456 · 2026-03-23T13:49:39 1774273779

makeworld · 2026-03-23T15:37:25 1774280245

> Kids delete real-life friends for not having enough likes on the watch.

tty456 · 2026-03-17T20:42:04 1773780124

Sheldon Brown's content is great, but is it ironic that the first thing you see on his site is a Google banner ad?

Understandably, he'd like to earn money on his content and I see no problem with that. But for me to visit his site and have Google add yet another tracking event to their "interest pile" about me (I guess i'm in the market for bikes now?) is a bit off putting.

He can't be making more than a few bucks a month through that single ad, right?

hamdingers · 2026-03-17T21:02:51 1773781371

I truly had no idea, I guess I've always had an ad blocker.

He's been dead since 2008, so I assume the banner ad keeps the lights on in the absence of his income and input.

casenmgreen · 2026-03-17T21:03:46 1773781426

He died about ten years ago.

salynchnew · 2026-03-17T23:22:46 1773789766

As the author is dead, I'm sure the money goes towards site hosting fees.

47282847 · 2026-03-18T10:26:57 1773829617

I assume nobody removed it and the revenue is just added to some Google Adsense balance sheet, and reports go to some Gmail account that will expire one day.

tty456 · 2026-03-16T21:48:31 1773697711

> Their API can't tell you the chef left last month

Your API can do that? Using what data?

ymarkov · 2026-03-16T21:53:23 1773698003

That is our vision of where we want to be. There is a lot of information about the places on the public web which you analyze and cross-reference. And we started to solve this problem with validation API which can tell you if a business or point of interest exists at current location.

tty456 · 2026-03-13T00:58:19 1773363499

Only 2 reasons one would stick around: Money and/or visa constraints

tty456 · 2026-03-09T21:21:00 1773091260

Google v. Oracle ruled that use of APIs are fair game and could be argued that test cases are strictly a use of APIs and not implementation.

vbarrielle · 2026-03-09T21:28:23 1773091703

Google vs Oracle ruled that APIs fall under copyright (the contrary was thought before). However, it was ruled that, in that specific case, fair use applied, because of interoperability concerns. That's the important part of this case: fair use is never automatic, it is assessed case by case.

Regarding chardet, I'm not sure "I wanted to circumvent the license" is a good way to argue fair use.

tty456 · 2026-03-02T23:24:57 1772493897

The individuals making these decisions are 100% aware of what they are doing. Driving for and implementing stuff like this is for profits, bonuses, and internal recognition.

testbjjl · 2026-03-03T01:32:14 1772501534

Suck has made his mind up about which side he’s on with his money. I recall a time when people on the Forbes list were quietly political.

cyanydeez · 2026-03-02T23:29:48 1772494188

Right, this is socipathy, kleptocracy and pure madness that having more money than need generates.

hsuduebc2 · 2026-03-03T00:30:30 1772497830

Accurate description of META.