Speed isn’t primarily for your benefit, and news media would love to go slower and have more time if they could do so and still survive. The race to break news first, to be the one that tells its audience something “new” it hasn’t heard elsewhere, is real, and it has existed throughout modern civilization, for hundreds if not thousands of years. The one-day turnaround existed purely because daily newspaper print runs were the fastest form of distribution, not because a day was long enough to get it right. Papers had a day because the competition couldn’t get anything out faster than that. Then for a while there were twice-daily print runs to be more competitive. Then the internet came along, and now the only way for a site to get attention and be talked about on Hacker News is to report a story before any other site does.
There are some news media that do go slower and take their time, but I think they’re struggling to stay alive. Reuters is still reputable, but it no longer necessarily takes a day. The big question is how we get humanity to prefer slow & correct over fast, and whether that is even possible. When you hear about an earthquake in Venezuela, how do we stop people from Googling it immediately and get them to wait for the best, most correct story rather than reading whatever’s available now? In the case of natural disasters, I don’t think it’s possible anymore, no matter what case you make. I’m not sure it’s possible with stories like AI distillation either, even if you can absolutely cement the case for slow news. Because news is now async and online, and being first still counts, we (you and I) are statistically still going to give traffic and attention to the sites that have the first information on a breaking topic, despite preferring correctness over speed. The one thing we can do is vote with our dollars by subscribing to whichever news outlets do a better job than the others.
> can you show me ownership of these kills people? You aren’t looking at this from a systems thinking perspective
It seems like you’re the one ignoring the system and demonstrated end-to-end results. Ownership of larger vehicles is posing a greater risk to the people outside those vehicles. It can’t be waved away with an imaginary responsibility argument; the drivers of larger vehicles would need to act more responsibly than drivers of smaller vehicles in order to compensate.
Putting the constants at the top makes the file easier to customize, especially if it gets duplicated. If devs need to switch to http instead of https for testing or staging, it makes sense to separate the scheme from the domain and put the constants up top, or even in another file. It also matters whether `url` is constructed in multiple places or in a single place. Named constants at the top of the file are a very common style, and sometimes part of a team's coding standards.
Anyway, maybe there are other reasons too, so see Chesterton’s Fence. In any case, it’s never a good idea to assume cargo culting. Someone could easily say the same thing about using inline literals. If it looks weird, ask around and maybe you’ll find out there are good reasons, or maybe you’ll find out nobody cared and that people will like it if you refactor and embed the constants.
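A minimal Python sketch of the constants-at-the-top style, with the scheme split from the host as described above. `SCHEME`, `HOST`, `build_url`, and `api.example.com` are hypothetical names for illustration, not the code from the patch under discussion:

```python
# Module-level constants, kept at the top of the file so they are easy to find and change.
# All names and values here are hypothetical.
SCHEME = "https"          # e.g. switch to "http" for a local test server
HOST = "api.example.com"  # hypothetical host


def build_url(path: str) -> str:
    """Combine the configured scheme and host with a path like "/v1/items"."""
    return f"{SCHEME}://{HOST}{path}"


print(build_url("/v1/items"))  # https://api.example.com/v1/items
```

Switching `SCHEME` to `"http"` changes every URL the module builds from one place; the trade-off is that a reader has to assemble the final URL from several pieces.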
In my opinion (and it's just that - an opinion, and mine - yours may differ), it's better to make the code as stupid simple as possible. When you build it, don't assume future change in that spot, because you could be badly wrong about exactly what will change, and then you've added complexity for no benefit. The first time you need to change the scheme from `https` to `http`, just change it inline in the URL building code. The second time you need to do it, make a constant or an env var.
Over time, everyone develops their own intuition and opinions on what sorts of change and refactors are likely, and which sorts will never come. It's not perfect - it's part of the art of software, not the science. I'm not going to claim I've never written an interface that had a useless cut-point - I definitely have written many.
In my opinion, it's better to err on the side of simplicity, understandability, and closeness than to prematurely factor out those constants.
It's a judgement call - every situation and every practitioner will have a different response. In this case, knowing this product, and as the code reviewer for this patch made by a newer engineer (senior but new to the team), I was very confident that we would never need to change the scheme, and that the extra line and extra cognitive load of grokking `HTTPS_SCHEME + "://"` over just `"https://"` or `BASE_URL + blah` was not worth it.
I suggested that the engineer either rewrite it with an inline constant, or refactor the whole base URL out without separating the scheme.
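A sketch of the two alternatives suggested in the review, in Python. The original patch isn't shown, so the host, path, and function names here are all hypothetical:

```python
# Option 1: an inline literal, so the whole URL is visible at the call site.
def item_url_inline(item_id: int) -> str:
    return f"https://api.example.com/v1/items/{item_id}"


# Option 2: factor out the whole base URL, without splitting off the scheme.
BASE_URL = "https://api.example.com"  # hypothetical base URL


def item_url_from_base(item_id: int) -> str:
    return f"{BASE_URL}/v1/items/{item_id}"


print(item_url_inline(42))     # https://api.example.com/v1/items/42
print(item_url_from_base(42))  # https://api.example.com/v1/items/42
```

Either way, nobody has to reassemble `HTTPS_SCHEME + "://"` in their head, and if the base URL ever does need to vary per environment, option 2 leaves exactly one line to change.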
You probably already have the common abstraction factored - the code to load pixels for a single sprite, and to display it? It makes sense to me that the level above that, interpreting the sprite sheet layout and modes of playback, come in different flavors and don’t have a common abstraction that fits all cases.
Personally I prefer what you’re doing over trying to come up with a non-obvious abstraction or trying to make an imperfect abstraction fit. Waiting til the abstraction is totally obvious and the need is crystal clear is a good thing.
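To make that layering concrete, here is a hypothetical Python sketch (the original code isn't shown): one shared primitive that cuts a single sprite out of a sheet, and two separate layout readers built on it, with no attempt at a common abstraction over the layouts. Representing a sheet as a list of pixel rows is an assumption for illustration.

```python
from typing import List

Pixels = List[List[int]]  # hypothetical representation: rows of pixel values


def crop_sprite(sheet: Pixels, x: int, y: int, w: int, h: int) -> Pixels:
    """Shared primitive: copy one w-by-h sprite out of the sheet."""
    return [row[x:x + w] for row in sheet[y:y + h]]


def frames_from_grid(sheet: Pixels, w: int, h: int) -> List[Pixels]:
    """Layout flavor 1: a uniform grid, read left to right, top to bottom."""
    rows, cols = len(sheet) // h, len(sheet[0]) // w
    return [crop_sprite(sheet, c * w, r * h, w, h)
            for r in range(rows) for c in range(cols)]


def frames_from_strip(sheet: Pixels, widths: List[int]) -> List[Pixels]:
    """Layout flavor 2: one row of variable-width frames spanning the full height."""
    frames, x = [], 0
    for w in widths:
        frames.append(crop_sprite(sheet, x, 0, w, len(sheet)))
        x += w
    return frames


sheet = [[0, 1, 2, 3],
         [4, 5, 6, 7]]
print(frames_from_grid(sheet, 2, 1))  # [[[0, 1]], [[2, 3]], [[4, 5]], [[6, 7]]]
```

The two readers share only `crop_sprite`; if a third layout later shows a clear common pattern, that is the point to abstract.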
The flipside (antidote?) of DRY is WET - write everything twice/thrice. More important, IMO, is to abstract only over things I have an actual, demonstrated use case for, usually demonstrated first via duplication, and not speculate about possible future uses I might want. Code written for future use cases we don’t have is so often the code that gets in the way of abstracting the things we do have, and it cracks me up when that happens.
> Waiting til the abstraction is totally obvious and the need is crystal clear is a good thing.
I discovered this after a few early years of my career being a bit of a “best practices” zealot. The thing I say often at work is, “let’s get this shipped to prod so we can start learning all the things we don’t yet know about it.”
If you have any links I'd be very interested to see them. Happy to be proven wrong here, but when most (all?) of the research is based in modelling I don't see how you'd prove causation, especially in such a large system that you can't test with and without the intervention in question.
> but then again most people working at LLM companies are deeply antihuman to start with.
I agreed with you up to this point, but this isn’t true and isn’t called for. It doesn’t strengthen your otherwise good point; in fact, statements like that weaken it. Most people who work at LLM companies, like most people who work at most companies, are making a living and have the same ethics and principles as anyone else. I don’t know where you work or live, but don’t forget that the exact same logic and the exact same hyperbole are being used to make the same claim about people in tech, and about Americans and Europeans.
No, it's totally called for. This is technology that is literally ruining, destroying, and killing lives, especially with regard to how US companies are operating with it. It's a valid claim; "just following orders" has never been a valid excuse.
These people just care about chasing the bag rather than doing right by their fellow humans. In their mind clearly some humans are more equal than others.
edit: to reiterate, the people choosing to work at these companies care more about becoming millionaires and chasing generational wealth than about questioning whether the machine they are building may be producing terrible outcomes. They could easily work at any company on this planet. Stop running cover for FAANG workers who have always shown disdain for their fellow humans; they choose to work at the misery death machines because they simply do not care about the destruction they have wrought on the world.
I don’t know why people distrust science. Sure, it’s not perfect, and scientists, like all people, are subject to human problems. But there’s nothing else in the history of the world with a better track record than science. I feel like the problem is that some politicians spread FUD and prey on people’s insecurities, and unfortunately it tends to work, disproportionately on people with fewer resources. The problem isn’t science at all; the problem is people and politics.
I know what a linear regression is and how to examine event studies, lol. What you don't understand is that the author is leaning on linguistics to insinuate strong evidence of causation where it doesn't exist. If this was a quant in finance, they'd be out the door in days.
"The problem isn’t science at all; the problem is people and politics."
Please elaborate. I haven’t used entropy balancing or difference-in-differences, but those articles explain that their purpose is to try to tease out causation. What, exactly, is the linguistic trick, if they actually did use an entropy-balanced Poisson regression and difference-in-differences?
"Teasing out causation" is exactly why this methodology fails. You are confusing the intended purpose of a statistical tool with its real-world validity. No one is questioning what an Entropy-Balanced Poisson Regression or a Synthetic Difference-in-Differences model is designed to do.
The issue is that the authors have profoundly violated the mathematical assumptions required for these tools to actually function. Throwing high-level econometric terms into an abstract does not make the underlying logic scientific, but rather acts as a linguistic tuxedo on a fundamentally broken causal claim.
If you cannot see through economic (and other) confounders that invalidate their approach and their biased statements, I cannot help you. This isn't science. Getting an LLM to run an SDID model and spit out a result doesn't = science.
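For readers following along, the textbook 2×2 difference-in-differences estimate that these methods build on is small enough to sketch in Python. This is only the basic estimator with made-up numbers, not the entropy-balanced Poisson or synthetic DiD models mentioned above; those reweight the comparison group, but they still rest on untestable assumptions about what would have happened without the intervention.

```python
def did_estimate(treated_pre: float, treated_post: float,
                 control_pre: float, control_post: float) -> float:
    """Basic 2x2 difference-in-differences: treated change minus control change.

    This identifies a causal effect only under the parallel-trends assumption:
    absent the treatment, both groups would have changed by the same amount.
    """
    return (treated_post - treated_pre) - (control_post - control_pre)


# Made-up group means, for illustration only.
print(did_estimate(10.0, 14.0, 9.0, 11.0))  # 2.0
```

If the groups were already on different trajectories, that difference is silently absorbed into the estimate, which is why the assumptions carry so much of the weight.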
Ah, the age-old question: what makes something good? I think you’re already describing it well at a high level; context matters, and there are multiple axes to consider. But that’s extremely vague and doesn’t help you identify or measure quality, so it might be worth listing as many specific axes as you can.
Maybe ask the same question about other things. What makes a good guitar? What makes a good chair? What makes a good airplane? What makes a good book? What makes a good song? What makes good art? Each of these has a long list of very specific goals and concerns. And to help define the boundaries, also ask what makes something bad, and what makes something mediocre.
Code quality starts with functionality. Does it perform the stated requirements? Does it have testing in place to catch breaking changes in functional requirements? That’s the basic stuff that probably isn’t part of “taste”. A lot of code quality goals center around how code changes over time, and beliefs about designing to avoid functional breakage.
For example you can ask things like does the code use minimal dependencies? Is the code organized into clean classes/modules/functions that each have a single clear role? Is the API easy to read, understand, and use? Is the API hard to misuse accidentally? Is all the code easy to read? Is there documentation, and is the documentation useful, and more than a list of contents? Is the code self-documenting? Is the code efficient, both in how it executes, and in its use of code itself? Is the code designed so that it won’t fail when someone runs it with different sized types, or a different compiler or execution environment, or on a different architecture? Is the code surprisingly elegant and fun to use?
Those are just the beginning. There are of course more layers of application-specific, environment-specific, and audience-specific qualities. The good news is that quality depends on your own goals: you can decide which aspects of taste matter to you and ignore the ones that don’t. It’s fine if your taste & goals change over time.
I have so many questions… Since Apple already sells unified memory systems, what is the market opportunity you envision? Do you see Nvidia and Apple as competitors, and how? (And I’m not suggesting they’re not, necessarily, but I want to hear where you’re coming from, and they do have very different markets.) Hasn’t Apple used storage size (RAM & disk) for market segmentation for decades? And how does a machine with 128GB unified mem not potentially cut into some people’s reasons for wanting a 96GB GPU?
Apple offers relatively affordable options for a high-memory workstation that uses unified memory. They previously offered 256/512GB Mac Studios (both discontinued). Because of this they can keep larger models in memory.
BUT you just can't compete with NVidia performance for LLM workloads (mostly inference) for two reasons:
1. The memory bandwidth just can't compete with a 5090 (1800GB/s). The best current Mac is ~820GB/s (M3 Ultra). That directly caps tokens/sec, which might be manageable, but there's another problem; and
2. The raw FLOPS just can't compete with even a 5090. It probably needs native FP4/FP8 support to at least reach number-format parity with NVidia. But besides that, NVidia just has more raw FLOPS.
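A back-of-envelope Python sketch of why bandwidth caps tokens/sec: for a dense model, generating each token means reading roughly every weight once, so peak memory bandwidth divided by the model's size in bytes gives an upper bound. It ignores KV-cache reads (significant at long context), compute limits, and runtime overhead, so real numbers will be lower. The 27B model at 8 bits per weight is an assumed example; 1792 GB/s and 819 GB/s are the RTX 5090 and M3 Ultra specs.

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float, params_billion: float,
                         bytes_per_param: float) -> float:
    """Upper bound on single-stream decode speed for a dense model.

    Assumes every weight is read once per generated token and nothing else
    (no KV cache, no overhead), so actual throughput will be lower.
    """
    model_gb = params_billion * bytes_per_param
    return bandwidth_gb_s / model_gb


# Assumed example: a 27B dense model quantized to 8 bits (1 byte per weight).
for name, bw in [("RTX 5090", 1792.0), ("M3 Ultra", 819.0)]:
    print(name, round(decode_ceiling_tok_s(bw, 27.0, 1.0), 1))
# RTX 5090 66.4
# M3 Ultra 30.3
```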
According to Google, an M5 Max does ~70 FP16 TFLOPS while a 5090 does 380. If Apple can close that gap to at least be competitive and also hold larger models in shared VRAM, that would be a competitive advantage and it would directly attack NVidia's market segmentation.
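The FLOPS gap shows up most in prompt processing (prefill), which is compute-bound rather than bandwidth-bound. A rough Python sketch, assuming the common approximation of about 2 FLOPs per parameter per token and ignoring attention cost; the TFLOPS figures are the ones quoted above, and 100% utilization is an unrealistic best case:

```python
def prefill_seconds(params_billion: float, prompt_tokens: int,
                    tflops: float, utilization: float = 1.0) -> float:
    """Rough lower bound on prefill time for a dense model.

    Uses the ~2 FLOPs per parameter per token rule of thumb and ignores the
    attention term, which grows with context length.
    """
    flops = 2 * params_billion * 1e9 * prompt_tokens
    return flops / (tflops * 1e12 * utilization)


# Assumed example: a 27B dense model reading a 10,000-token prompt.
for name, tf in [("M5 Max", 70.0), ("RTX 5090", 380.0)]:
    print(name, round(prefill_seconds(27.0, 10_000, tf), 2))
# M5 Max 7.71
# RTX 5090 1.42
```

Real utilization is well below 100%, so both numbers are optimistic, but the ratio between them tracks the raw TFLOPS gap.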
The Mac Studio was last updated in March last year, so we may get an update in Q3. Many are pinning their hopes on this, but it might not happen until next year. When it was released, the M4 was the state of the art, and it came with either the M4 Max or the M3 Ultra (which, as I understand it, is basically two M3 Max chips joined together). What people are hoping for is an M5 Ultra with >1000GB/s of memory bandwidth, ideally 200+ FP16 TFLOPS, and hopefully FP4/FP8 support.
You can also chain Mac Studios together into a cluster over Thunderbolt 5 (TB5).
But it's reasonably likely that the next Mac Studio will be only incrementally better than the last generation.
I'm not the person you're replying to, but I wholeheartedly agree with them...
Quick background: doing AI inference requires three things. Lots of memory, lots of memory bandwidth, and of course plenty of compute that has access to that memory.
Quick reference: the nVidia 5090 has 1,792 GB/sec of bandwidth. The 3090 gets about 1000 GB/sec. The DGX Spark and AMD's Ryzen AI Max+ 395 get roughly 250-275 GB/sec.
Apple M1 Max gets 400GB/sec, and M5 Max gets 614GB/sec. Ultra variants get 2x that bandwidth, Pro variants about half, and base chips well under that. However... their compute is rather weak.
Right now, Apple's offerings are juuuuuust fast enough to run dense 27B models at usable speeds at like, 10% of the performance/watt of nVidia. They're world-leading general purpose CPUs but not killer GPUs.
By all accounts, these Windows PCs nVidia is touting seem to have DGX Spark like performance, which is less than impressive. Same with the upcoming AMD AI-oriented consumer stuff.
The other context here is that running your own AI at home is just starting to become feasible in terms of open model availability and the ability to run it at usable speeds. Many are interested in it for reasons of privacy, security, and cost certainty vs. buying tokens.
> Since Apple already sells unified memory systems, what is the market opportunity you envision?
nVidia and AMD can't make their consumer offerings too good at AI, because that risks interfering with their higher-margin data center sales.
(And let's face it: even if nVidia did release a 6090 with 64-128GB of memory at an affordable price, consumers wouldn't get their hands on them anyway, because people would just start filling data centers with them.)
So.
Now you see Apple's opportunity, right? No data center sales to interfere with. No relationship with nVidia or AMD to worry about.
They could choose to make an absolute beast of a home AI machine. The M5 Ultra, if announced, might be that. It's admittedly a niche market, but people are already buying 64GB+ Macs faster than Apple can make them and they're fetching high prices on the used market as well.
The only real questions are whether this market is something Apple would even find time to care about, and whether they could secure enough DRAM to make a go of it. They're enormous, obviously, but they're feeling the RAM pinch just like everybody else.
They use different technology for their VRAM though. Apple, AMD Strix and NVidia DGX/RTX Spark use LPDDR, whereas discrete cards will be either GDDR or HBM. That directly impacts the memory bandwidth figures. As for compute available, Apple and AMD still have very good figures there for what's essentially a general-purpose iGPU that ships as part of the stock system, rather than a special-purpose piece of dedicated hardware.
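Peak bandwidth for either memory type is just bus width times per-pin data rate, which is why LPDDR systems need very wide buses to get anywhere near GDDR cards. A small Python sketch; the RTX 5090's 512-bit GDDR7 at 28 Gbps is its published spec, while the 512-bit LPDDR5X-9600 configuration is an assumption that happens to reproduce the 614 GB/s quoted above for the M5 Max:

```python
def peak_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Theoretical peak bandwidth in GB/s: bytes per transfer times transfers per second.

    data_rate_gbps is the per-pin rate in Gbit/s (equivalently, thousands of MT/s).
    """
    return bus_width_bits / 8 * data_rate_gbps


print(peak_bandwidth_gb_s(512, 28.0))  # RTX 5090, GDDR7: 1792.0
print(peak_bandwidth_gb_s(512, 9.6))   # assumed 512-bit LPDDR5X-9600: 614.4
```

Same bus width, roughly 3x the per-pin rate: that gap is most of the difference between the two figures.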
The M5 has 16 dedicated ‘Neural Engine’ cores and a ‘Neural accelerator’ in each of its conventional GPU cores. It’s been pretty heavily juiced with special-purpose hardware for inference.
When it comes to the very largest models the ANE seems to be only marginally useful for prefill. The M5 Neural Accelerators (NAX) help a lot but at a real cost wrt. power and thermals.
Yep, but Apple products don’t spend most of their time running huge models. They are running lots of little ones all the time, using hardware designed for that.
It seems that you're agreeing with what I wrote above. They ship a general-purpose stock system and tailor their compute offering towards that. Accelerating 'lots of little models' fits naturally into what they offer, in a way that a more compute-intensive design might not.
Yep, I misunderstood your point. Thanks for your patience. In my defense, the 'general purpose system' has a lot of model-inference-specific hardware. But not LLM-specific hardware.
If there's an M5 Ultra it'll be interesting to see what they've optimized it for.
Even if a Mac isn’t the fastest in raw numbers, it may still be faster than a couple of 32 GB cards if it can hold the whole model in RAM (up to 512 GB before the shortages), because the cards would have to keep loading data over PCIe. Unified memory means the Apple GPU can access all 512 GB at full speed.
My understanding is this is the advantage that’s pushing huge Mac Studio demand. It was the only way to give a GPU that much memory at anywhere near that price point.
Yeah, you can do way better once you’re spending five figures. But below that, Apple had a specific advantage for some people.
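A rough Python sketch of the unified-memory versus PCIe trade-off described above, under the comment's scenario where weights that don't fit in VRAM are streamed over PCIe for every token. All sizes are assumed examples; PCIe 5.0 x16 is taken as about 64 GB/s per direction, and 819 GB/s is the M3 Ultra's spec. Many runtimes instead run the overflow layers on the CPU, which moves the bottleneck to system RAM bandwidth rather than removing it.

```python
def seconds_per_token(model_gb: float, fast_gb: float,
                      fast_bw_gb_s: float, slow_bw_gb_s: float) -> float:
    """Time to read all weights once per token when they're split across two tiers.

    The first fast_gb of weights come from fast memory; the rest come over a
    slower link (e.g. PCIe). Compute time and KV-cache reads are ignored.
    """
    resident = min(model_gb, fast_gb)
    overflow = max(0.0, model_gb - fast_gb)
    return resident / fast_bw_gb_s + overflow / slow_bw_gb_s


MODEL_GB = 200.0  # assumed: a ~200B-parameter model at 8 bits per weight

# Two 32 GB cards (64 GB total; layers run card after card, so each card's
# 1792 GB/s applies to its own share), remainder streamed over ~64 GB/s PCIe.
split = seconds_per_token(MODEL_GB, 64.0, 1792.0, 64.0)
# Unified memory large enough to hold everything at ~819 GB/s.
unified = seconds_per_token(MODEL_GB, 512.0, 819.0, 819.0)
print(f"split: {split:.2f} s/token, unified: {unified:.2f} s/token")
```

Even with generous assumptions for the cards, the overflow term dominates the split configuration.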
You're correct about some things but mostly wrong.
Yes, a Mac with 128GB+ will let you load some pretty big models.
However, you're still not going to be able to run them at usable speeds. M5 Max benchmarks on a Qwen 27B model with 290K context show about 12 tokens/sec of output.
And that's a 27B model. So yes, an M5 Max 128GB will let you load some pretty big models; you can probably fit a 120B model in there with room left over for context. But the M5 Max still doesn't have the compute to make it practical, at least for interactive use: a 120B dense model is going to be several times slower than a 27B one, since every token has to touch roughly 4.4x as many weights. You have to understand the computation going on here. LLM inference is basically a long chain of huge matrix multiplications, and those operations are pretty heavy.
So back to my previous post... you need three things. You need fast memory, you need a lot of it, and you need GPU compute with direct access to that fast memory. The M5 Max has like, 1.5 of the 3.
The M5 Ultra (if it ever exists) could kinda hit all 3, although actually getting your hands on one will be quite the lottery ticket.
> My understanding is this is the advantage that’s pushing huge Mac Studio demand.
This is true, but also, people who made this investment found that they're still not very usable for those HUGE models. Don't take my word for it though. Lots of benchmarks out there. r/localllama is pretty active too.
12 tok/s can absolutely be "usable output" depending on what you're doing. I agree though that the 27B dense model often feels slow due to an overall weakness of memory throughput on that particular platform. Most real-world 120B models though will be MoE-based with only a small fraction of active parameters, and these run quite well. Also, dense models can benefit from batching, which is at least marginally viable with Qwen if you stick to shorter contexts and smaller batches.
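The MoE point follows from the same bandwidth arithmetic: per token, a mixture-of-experts model reads only its active parameters (shared layers are folded in here for simplicity), so its decode ceiling tracks active rather than total size. A Python sketch with assumed numbers: a 120B dense model versus a hypothetical 120B-total MoE with 5B active, both at 4 bits per weight, on 614 GB/s of bandwidth.

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float, params_read_billion: float,
                         bytes_per_param: float) -> float:
    """Bandwidth-bound upper bound on single-stream decode speed.

    params_read_billion is the number of weights read per token: all of them
    for a dense model, roughly the active parameters for an MoE model.
    KV-cache reads and overheads are ignored, so real speeds are lower.
    """
    return bandwidth_gb_s / (params_read_billion * bytes_per_param)


BW = 614.0  # GB/s
dense = decode_ceiling_tok_s(BW, 120.0, 0.5)  # 120B dense, 4-bit
moe = decode_ceiling_tok_s(BW, 5.0, 0.5)      # hypothetical MoE, 5B active, 4-bit
print(f"dense: {dense:.1f} tok/s, MoE: {moe:.1f} tok/s")
```

Batching helps dense models for a related reason: one pass over the weights can serve several sequences, so aggregate throughput rises until compute becomes the limit.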
> If you use it on stuff that you’re pretty good at, it’s not a gamechanger (and if you’re an expert, it’s a minor boost at best).
This was probably true last year, and it’s a common talking point, but I’ve seen too many examples now of deep experts using Claude & Codex in the last year to solve very big problems, and write or rewrite large systems. The experts do complain that the LLMs can sometimes get stuck or go off the rails and they need to pay attention and actively steer. But nobody I know who’s using it is still claiming the LLMs aren’t a game changer, even quite a few people who were staunch holdouts for a long time. I was skeptical myself, for a long time, but had my oh shit moment late last year.
One caveat: to get expert results, you do need some experience using LLMs. You need to use them to write plans and design docs, know how to use ‘skills’ and MCPs, use them to review code, and (for now) understand context compaction and when and why to use sub-agents. If you’re a domain expert but an AI noob, you’ll get less out of it than an expert who knows how to use AI and has experience with it.
One of the biggest problems with humans is that we’re wired to spot patterns and draw conclusions, and then we have a really hard time seeing and accepting change and updating our mental rules. The LLMs are getting better. They have already gotten better, and they’re going to continue getting better. It’s too early to draw conclusions, and many conclusions people have already declared are out of date and no longer true.