Pete Warden and team just published a paper on Moonshine, their speech-to-text model.
Key features include:
- 1.7x overall speed boost compared to Whisper
- Flexible-sized input window, allowing for more efficient processing of shorter audio clips
- Up to 5x faster performance on 10-second audio clips
- Matches or exceeds Whisper's accuracy
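To make the flexible-window point concrete, here is a minimal sketch of how much audio a model has to process for a short clip. It assumes Whisper's fixed 30-second window at 16 kHz (shorter audio is zero-padded) and compares it with a hypothetical model that processes only the samples it is given. The function names are illustrative and not part of either library. It counts input samples only and does not reproduce the paper's measured speedups.

```python
import math

SAMPLE_RATE_HZ = 16_000  # Whisper's input sample rate
FIXED_WINDOW_S = 30      # Whisper pads or trims each chunk to 30 seconds


def samples_fixed_window(clip_seconds: float) -> int:
    """Samples processed when every chunk is padded to a full 30 s window."""
    clip_samples = int(clip_seconds * SAMPLE_RATE_HZ)
    window_samples = FIXED_WINDOW_S * SAMPLE_RATE_HZ
    # Even a very short clip costs at least one full window.
    chunks = max(1, math.ceil(clip_samples / window_samples))
    return chunks * window_samples


def samples_variable_window(clip_seconds: float) -> int:
    """Samples processed when the input window matches the clip length (assumed)."""
    return int(clip_seconds * SAMPLE_RATE_HZ)


def padding_overhead(clip_seconds: float) -> float:
    """How many times more samples the fixed window handles for this clip."""
    return samples_fixed_window(clip_seconds) / samples_variable_window(clip_seconds)


if __name__ == "__main__":
    for seconds in (5, 10, 30):
        print(f"{seconds:>2} s clip: {padding_overhead(seconds):.1f}x samples with a fixed window")
```

Under these assumptions a 10-second clip means three times as many input samples with a fixed window, and the overhead disappears once the clip fills the window. Actual speedups depend on the model architecture and hardware, so treat this as intuition rather than a benchmark.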
This is exactly right—we now live in a world in which most jobs are knowledge work, and we should look to those who are the most productive (and lazy) knowledge workers: software developers.
Yeah. I’m convinced the current model is just too confusing. But I really wish there were new interaction patterns that took advantage of low-latency speech recognition...
ardit33 is right. My experience is that most successful startups are so hungry for talent that they really don't care where you come from or who you know.