Hacker News | mezark's comments

We look at how comparative advantage from economics applies to LLM inference - some GPUs are relatively better at FLOPs, others at memory bandwidth. What happens if you let each do what it’s best at?
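To make the comparative-advantage framing concrete, here is a minimal sketch (not from the linked post; the GPU names and spec numbers are made up). The idea: prefill is compute-bound and decode is memory-bandwidth-bound, so each phase goes to the GPU with the better FLOPs-to-bandwidth ratio for it.

```python
# Illustrative only: assign inference phases by comparative advantage.
def assign_phases(gpus):
    """gpus: dict name -> (tflops, bandwidth_gb_s).
    Prefill (compute-bound) goes to the GPU with the highest
    FLOPs-per-bandwidth ratio; decode (bandwidth-bound) to the lowest."""
    ranked = sorted(gpus, key=lambda n: gpus[n][0] / gpus[n][1], reverse=True)
    return {"prefill": ranked[0], "decode": ranked[-1]}

# Hypothetical specs: gpu_a is compute-rich, gpu_b relatively bandwidth-rich.
gpus = {"gpu_a": (989, 3350), "gpu_b": (312, 2039)}
print(assign_phases(gpus))  # prefill -> gpu_a, decode -> gpu_b
```

Note that gpu_a is better at both in absolute terms; comparative advantage says to compare the ratios, which is exactly what the sort key does.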


Huge congrats - the latency graphs in particular really show the value of these specialised systems!


TitanML Takeoff Inference Server demonstrating controlled generation
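For readers unfamiliar with "controlled generation": the core trick is constraining which tokens the model may emit next, typically by masking logits. A minimal, framework-free sketch (the vocabulary and logit values here are made up, and this is the general technique, not Takeoff's specific implementation):

```python
import math

def constrain(logits, vocab, allowed):
    # Set disallowed tokens' logits to -inf so softmax gives them probability 0.
    return [l if t in allowed else -math.inf for l, t in zip(logits, vocab)]

vocab = ["yes", "no", "maybe", "banana"]
logits = [1.2, 0.7, 2.5, 3.9]

# Force the model to answer only "yes" or "no".
masked = constrain(logits, vocab, {"yes", "no"})
best = vocab[max(range(len(vocab)), key=lambda i: masked[i])]
print(best)  # greedy pick among the allowed tokens: "yes"
```

Even though "banana" had the highest raw logit, the mask guarantees the output stays within the allowed set.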


Drop-in replacement for HF's TGI server. The fastest and easiest way to run LLM inference locally

GitHub: https://github.com/titanml/takeoff
Docs: https://docs.titanml.co/docs/titan-takeoff/getting-started
Discord: https://discord.gg/83RmHTjZgf
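Since the server is described as a drop-in replacement for Hugging Face's TGI, a client would talk to it through TGI's `/generate` endpoint. A hedged sketch - the host, port, and parameter values here are assumptions for illustration:

```python
def build_generate_request(prompt, max_new_tokens=64,
                           base_url="http://localhost:8080"):
    """Build a TGI-style /generate request as (url, payload)."""
    payload = {"inputs": prompt,
               "parameters": {"max_new_tokens": max_new_tokens}}
    return f"{base_url}/generate", payload

url, payload = build_generate_request("What is comparative advantage?")
print(url)
# To actually send it (requires a running server):
#   import requests; requests.post(url, json=payload).json()
```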


Falcon 7B running real time on CPU


The linked video seems to have no context provided. What is a TitanML server? Is 7B actually that useful? How does the model compare to others? Etc.


Hey there - TitanML is these guys: https://www.titanml.co/. I think the impressive thing isn't actually whether the model is good (although it is a good model, especially when fine-tuned) - but how fast this model runs on CPU with the TitanML server compared with before.


Annoying, because they stole my company's name (TitanML - https://www.titanml.co/). Fortunately they haven't trademarked it, but it's still not ideal.


I'm not a lawyer, but if you've been using it commercially, I believe you have the trademark. Theoretically, you could sue them for infringement, although I don't know whether you could claim damages or just get them to stop using it.

