Probably more than 100x for inference. Not only are you drastically reducing the...

fasa99 · on March 30, 2024

To me the most exciting thing is that if training is sped up on the order of 100x-1000x, a large cluster may be well suited to tuning hyperparameters by gradient descent, running LLM training again and again at scale. This is a first foot in the door towards an AI that may iteratively improve itself.

mobicham · on March 30, 2024

LoRA training should benefit from the same speed-up, because the 1-bit weights will be frozen and all you need for both the forward and backward pass is a binary matmul, perhaps followed by a cast to higher precision to get more stable gradients.
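
To make that concrete, here is a rough NumPy sketch of a LoRA layer over frozen sign weights. It is an illustration, not anyone's actual implementation: the names `W_bin`, `A`, `B`, the shapes, the learning rate and the int8 storage are all assumptions, and the float32 cast stands in for a real binary matmul kernel. The point is that `W_bin` appears in both the forward and backward pass but never gets a gradient of its own.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank, batch, lr = 16, 8, 2, 4, 0.05  # all hypothetical sizes

# Frozen 1-bit weights, stored here as int8 signs in {-1, +1} (assumed layout).
W_bin = np.where(rng.standard_normal((d_out, d_in)) >= 0, 1, -1).astype(np.int8)
W_before = W_bin.copy()

# Trainable low-rank factors; B starts at zero so the adapter is initially a no-op.
A = (0.1 * rng.standard_normal((rank, d_in))).astype(np.float32)
B = np.zeros((d_out, rank), dtype=np.float32)

def forward(x):
    # Cast to float32 as a stand-in for a dedicated binary matmul kernel.
    base = x @ W_bin.T.astype(np.float32)
    h = x @ A.T                      # low-rank projection
    return base + h @ B.T, h

def backward(x, h, grad_y):
    grad_B = grad_y.T @ h
    grad_h = grad_y @ B
    grad_A = grad_h.T @ x
    # The input gradient reuses the frozen weights: one more binary-style matmul.
    grad_x = grad_y @ W_bin.astype(np.float32) + grad_h @ A
    return grad_x, grad_A, grad_B    # note: no gradient for W_bin

x = rng.standard_normal((batch, d_in)).astype(np.float32)
target = rng.standard_normal((batch, d_out)).astype(np.float32)

losses = []
for _ in range(50):
    y, h = forward(x)
    err = y - target
    losses.append(float((err ** 2).mean()))
    grad_y = 2 * err / err.size      # gradient of the mean squared error
    grad_x, grad_A, grad_B = backward(x, h, grad_y)
    A -= lr * grad_A                 # only the LoRA factors are updated
    B -= lr * grad_B
```

Only `A` and `B` change during the loop; `W_bin` stays byte-for-byte identical, which is what would let a packed 1-bit representation be reused across every step.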