Working on restoring punctuation marks in raw YouTube transcripts.
A BERT model trained as a token classifier works, but it doesn't scale well to more languages. See https://www.appblit.com/scribe for the current solution.
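For context, the token-classification framing can be sketched like this. Each word in a punctuated transcript is tagged with the mark that follows it, the classifier learns to predict those tags on unpunctuated input, and the predictions are re-attached afterwards. The label names (`O`, `COMMA`, `PERIOD`, `QUESTION`) and the helper functions are hypothetical, not taken from the linked tool. The sketch also assumes whitespace-separated words, which is one reason this approach gets harder as more languages are added.

```python
import re

# Hypothetical label set: the punctuation mark (if any) that follows each word.
LABELS = {"": "O", ",": "COMMA", ".": "PERIOD", "?": "QUESTION"}
MARKS = {label: mark for mark, label in LABELS.items()}


def to_token_labels(text):
    """Split punctuated text into lowercase words, each tagged with the mark after it."""
    pairs = []
    for match in re.finditer(r"([\w']+)([,.?]?)", text):
        word, mark = match.group(1), match.group(2)
        pairs.append((word.lower(), LABELS[mark]))
    return pairs


def restore(words, labels):
    """Inverse step: re-attach predicted marks and capitalize sentence starts."""
    out = []
    capitalize = True
    for word, label in zip(words, labels):
        out.append((word.capitalize() if capitalize else word) + MARKS[label])
        # A new sentence starts after a period or question mark.
        capitalize = label in ("PERIOD", "QUESTION")
    return " ".join(out)


pairs = to_token_labels("Hello, how are you? Fine.")
words, labels = zip(*pairs)
print(pairs)
print(restore(words, labels))
```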
Now looking into fine-tuning a small LLM like Gemma 2b-it for this task.
Any advice on other LLMs that would work is appreciated.
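With an instruction-tuned LLM, the same task becomes a text-to-text problem. Training pairs can be generated from any punctuated text by stripping its punctuation to simulate a raw transcript. Below is a minimal sketch of that data preparation. The instruction wording and the `prompt`/`completion` field names are assumptions, so adapt them to whatever format your fine-tuning library and the model's chat template expect.

```python
import json
import re

# Hypothetical instruction text; the wording is an assumption, not a model requirement.
INSTRUCTION = "Restore punctuation and capitalization in this transcript:"


def strip_punctuation(text):
    """Simulate a raw transcript: drop punctuation, lowercase, collapse whitespace."""
    no_punct = re.sub(r"[^\w\s']", "", text)
    return " ".join(no_punct.lower().split())


def make_example(punctuated):
    """One supervised fine-tuning pair: raw text in, punctuated text out."""
    return {
        "prompt": f"{INSTRUCTION}\n{strip_punctuation(punctuated)}",
        "completion": punctuated,
    }


def to_jsonl(sentences):
    """Serialize examples as JSON Lines; ensure_ascii=False keeps non-Latin scripts readable."""
    return "\n".join(json.dumps(make_example(s), ensure_ascii=False) for s in sentences)


print(to_jsonl(["Hello, how are you? Fine.", "Wie geht's? Gut, danke."]))
```

Because the targets come from existing punctuated text, the same pipeline can be reused for every language without per-language labeling rules.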
Bonus if the LLM can also run in a browser.