
Working on restoring punctuation marks in raw YouTube transcripts. A BERT model trained as a token classifier works, but it doesn't scale well as more languages are added. See https://www.appblit.com/scribe for the current solution.
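For anyone unfamiliar with the token-classifier framing: each word in the unpunctuated input gets a label for the punctuation mark that should follow it. The sketch below assumes word-level labels and a small hypothetical label set (`O`, `COMMA`, `PERIOD`, `QUESTION`). It is not the setup used by the linked tool, and a real BERT pipeline would also have to align these word labels to subword tokens.

```python
# Hypothetical label set: the punctuation mark (if any) that follows each word.
LABELS = {",": "COMMA", ".": "PERIOD", "?": "QUESTION"}
INVERSE = {v: k for k, v in LABELS.items()}

def to_training_pairs(punctuated: str) -> list[tuple[str, str]]:
    """Turn punctuated text into (word, label) pairs that mimic raw transcript input."""
    pairs = []
    for token in punctuated.split():
        mark = token[-1] if token[-1] in LABELS else None
        word = token[:-1] if mark else token
        # Raw transcripts lack casing, so the model input is lowercased.
        pairs.append((word.lower(), LABELS[mark] if mark else "O"))
    return pairs

def restore(words: list[str], labels: list[str]) -> str:
    """Rebuild text from words and predicted labels, capitalizing sentence starts."""
    out, capitalize = [], True
    for word, label in zip(words, labels):
        out.append((word.capitalize() if capitalize else word) + INVERSE.get(label, ""))
        capitalize = label in ("PERIOD", "QUESTION")
    return " ".join(out)

pairs = to_training_pairs("Hello, how are you? Fine.")
words, labels = zip(*pairs)
print(restore(list(words), list(labels)))
```

Because the label set is tied to specific punctuation marks, adding languages with different marks or conventions means growing the label set and retraining. That is one reason a generative model can look attractive here.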

Now looking into fine-tuning a small LLM such as Gemma 2b-it for this task.
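Fine-tuning a generative model instead turns the task into text-to-text: the input is the raw transcript chunk and the target is the punctuated chunk. The sketch below builds prompt/completion pairs from already-punctuated text. The instruction wording and the 64-word window are hypothetical choices, and before training the pairs would still need to be wrapped in the model's own chat format (for example via `tokenizer.apply_chat_template` in Hugging Face `transformers`).

```python
import json

PUNCTUATION = ",.?!;:"

def strip_punctuation(text: str) -> str:
    """Simulate a raw transcript by removing punctuation marks and casing."""
    return " ".join(
        w.strip(PUNCTUATION).lower() for w in text.split() if w.strip(PUNCTUATION)
    )

def make_sft_examples(punctuated: str, window: int = 64) -> list[dict]:
    """Split punctuated text into fixed-size word windows and pair each with its raw form."""
    words = punctuated.split()
    examples = []
    for start in range(0, len(words), window):
        target = " ".join(words[start:start + window])
        examples.append({
            # Hypothetical instruction text; tune it for your own setup.
            "prompt": "Add punctuation and capitalization to this transcript:\n"
                      + strip_punctuation(target),
            "completion": target,
        })
    return examples

for example in make_sft_examples("Hello, world. How are you?", window=3):
    print(json.dumps(example))  # one JSONL line per training example
```

Fixed word windows keep inputs short enough for a small model, but they can cut sentences in half. Overlapping windows, or splitting on existing sentence boundaries in the training text, may give cleaner targets.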

Any advice on other LLMs that would work is appreciated.

Bonus points if the LLM can run in a browser.


