jakobov's comments | Hacker News

Yes this is the central theme in https://codeisforhumans.com/


Yes 100%!

Read "Code is For Humans" for more on the subject https://www.amazon.com/dp/B0CN6PQ42B


Codeisforhumans.com


Don't be mean.


Codeisforhumans.com


How much faster (in terms of the number of iterations needed to reach a given performance) is training via distillation?


Nice! Can you explain what you mean by "simulate training beyond the number of available tokens"?

Why does using distillation from a larger model simulate training with more tokens?


Surya here from the core Gemma team -- we can think of a distillation loss as learning to model the entire distribution of tokens that are likely to follow the prefix thus far, instead of only the token in the training example. Some back-of-the-envelope calculations show that learning to model a larger distribution yields many more bits of information to learn from.
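To make that concrete, here is a minimal PyTorch sketch of the general technique (an illustration, not the Gemma team's actual training code): the hard-label loss gets one target token per position, while the distillation loss matches the teacher's entire next-token distribution.

    import torch
    import torch.nn.functional as F

    def hard_label_loss(student_logits, target_ids):
        # Standard next-token objective: the only signal per position is
        # the single token that actually appeared in the training text.
        vocab = student_logits.size(-1)
        return F.cross_entropy(student_logits.view(-1, vocab), target_ids.view(-1))

    def distillation_loss(student_logits, teacher_logits, temperature=1.0):
        # Distillation objective: match the teacher's full distribution over
        # the vocabulary at every position -- many more bits of signal per token.
        t = temperature
        vocab = student_logits.size(-1)
        student_logp = F.log_softmax(student_logits.view(-1, vocab) / t, dim=-1)
        teacher_p = F.softmax(teacher_logits.view(-1, vocab) / t, dim=-1)
        # KL(teacher || student), averaged over positions; t**2 rescales gradients.
        return F.kl_div(student_logp, teacher_p, reduction="batchmean") * (t * t)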


Gotcha. That makes sense. Thanks!

What are the theories as to why this works better than training on a larger quantity of non-simulated tokens?

Is it because the gradient from the non-simulated tokens is too noisy for a small model to model correctly?


Hi, I work on the Gemma team (same as Alek; opinions are my own).

Essentially, instead of only the tokens that are "already there" in the text, distillation lets us simulate training data from a larger model.
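A hedged sketch of what that "simulated" data can look like in practice (the top-k truncation and the Hugging Face-style .logits attribute are assumptions for illustration, not a description of Gemma's pipeline): instead of one hard token per position, the teacher supplies a weighted set of plausible next tokens.

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def teacher_soft_targets(teacher, input_ids, top_k=64):
        # One forward pass of the big model yields, at every position, a
        # distribution over plausible continuations -- effectively many
        # "simulated" examples weighted by probability.
        logits = teacher(input_ids).logits          # (batch, seq, vocab)
        probs = F.softmax(logits, dim=-1)
        top_p, top_ids = probs.topk(top_k, dim=-1)  # cheap truncated approximation
        return top_p / top_p.sum(-1, keepdim=True), top_ids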


They are claiming a 25x reduction in power consumption. That can't be right. Anyone understand where this number is coming from?


Comes from here [1]. Basically 100 racks of Hopper vs. 8 racks of Blackwell.

I think there may be a typo, though; I assume this also reflects liquid-cooled vs. air-cooled configurations.

[1] https://nvdam.widen.net/s/xqt56dflgh/nvidia-blackwell-archit...
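For what it's worth, the rack counts alone only get you partway to 25x; a quick back-of-the-envelope (the implied per-rack factor below is pure arithmetic, not a number from the whitepaper):

    # 100 Hopper racks vs. 8 Blackwell racks for the same workload
    rack_ratio = 100 / 8          # -> 12.5x fewer racks
    claimed_power_ratio = 25
    implied_per_rack = claimed_power_ratio / rack_ratio
    print(implied_per_rack)       # -> 2.0: each Blackwell rack would also need
                                  # to draw ~half the power of a Hopper rack,
                                  # which is where cooling could factor in.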


Did you read that in the linked article? I couldn't find it. But given the better efficiency, the performance boost (5x), and the ability to now run 27 trillion parameters versus 1.7 trillion, one can presumably finish the same amount of work in 1/25th of the time, and bam: reduction in power consumption. As you say, I'm skeptical the max power draw itself is 25x lower.


I think Jensen said something like needing 25x fewer GPUs (vs. A100) to get the same performance, which amounts to essentially the same thing.


It doesn't imply a full 25x reduction in power consumption, though; that might "only" go down by 10x.
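That's roughly the arithmetic: fewer chips times higher power per chip. A quick illustration (the wattages are approximate public TDP figures, used only to show the shape of the calculation):

    gpu_count_ratio = 25        # claimed: 25x fewer GPUs for the same work
    a100_watts = 400            # approximate A100 SXM TDP
    blackwell_watts = 1000      # approximate Blackwell-class GPU power
    power_reduction = gpu_count_ratio * a100_watts / blackwell_watts
    print(power_reduction)      # -> 10.0, i.e. ~10x, not 25x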


Vested his RSUs and left.


And got very bored and unhappy with big company issues. And has the perspective from his time at Tesla to know how things only get worse for creativity at that stage.


It's not a good thing if true. Tech and creative folk have to find ways to stick around, or the financial folk fill the leadership and decision-making space.


It's a hard thing to manage. Tech orgs of ~20 people are just more fun than tech orgs of 200 people, which are more fun than tech orgs of 20,000 people, which... you get the picture.

You can create and encourage small teams, but then they need to coordinate somehow. Coordination & communication overhead grows quadratically with headcount. Then you get all the "no silos" guys, and then it's all over...
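The quadratic blow-up is easy to see with the classic pairwise-channels count, n*(n-1)/2 communication paths among n people (a quick illustration, not from the comment above):

    for n in (20, 200, 20_000):
        print(n, n * (n - 1) // 2)
    # 20     ->           190
    # 200    ->        19,900
    # 20,000 ->   199,990,000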


He is one of OpenAI's founders. I don't think he needs to vest anything.


He co-founded the nonprofit, not the part that earns money. Nonprofit founders don't have shares.


Yeah, he is basically one of the people who got screwed over, but given that he did work for OpenAI, he might not think about it that way.


He was a director at Tesla when the stock 10xed.

The guy most likely has many tens of millions of dollars.


Presumably he was already set for life from his Tesla gig, no?


I usually agree, but I honestly believe he was set for life even before OpenAI, and he will now care more about how exciting the work is and how much it aligns with his interests/values.


Even when you have a yacht, you wanna get a bigger one :)


Do RSUs exist before an IPO?


OpenAI does not have RSUs. They have profit participation units (shares of future profit) instead.


Only 1/4th, though?


Exactly. That is the real reason.


Shameless plug: Code Is for Humans is a book I just published. You can get the ebook for free at CodeIsForHumans.com

