Working on restoring punctuation marks in raw YouTube transcripts. A BERT model trained as a token classifier works, but it doesn’t scale well to more languages. See https://www.appblit.com/scribe for the current solution.
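For anyone unfamiliar with the token-classifier framing, here is a minimal sketch of how it is usually set up: each word gets a label naming the punctuation mark that follows it, and the text is rebuilt from the predicted labels. The label names (`COMMA`, `PERIOD`, `QUESTION`, `O`) and both helper functions are assumptions for illustration, not the labels or code used in the linked solution.

```python
# Hypothetical label set; a real model may use different or more labels.
PUNCT_LABELS = {",": "COMMA", ".": "PERIOD", "?": "QUESTION"}
LABEL_TO_PUNCT = {label: mark for mark, label in PUNCT_LABELS.items()}


def text_to_labels(text):
    """Turn punctuated text into lowercase words plus one label per word."""
    words, labels = [], []
    for token in text.split():
        label = "O"  # "O" means no punctuation follows this word
        if token[-1] in PUNCT_LABELS:
            label = PUNCT_LABELS[token[-1]]
            token = token[:-1]
        if token:  # skip tokens that were punctuation only
            words.append(token.lower())
            labels.append(label)
    return words, labels


def labels_to_text(words, labels):
    """Rebuild punctuated text from words and (predicted) labels."""
    out = []
    capitalize = True  # capitalize the first word and words after . or ?
    for word, label in zip(words, labels):
        if capitalize:
            word = word.capitalize()
        out.append(word + LABEL_TO_PUNCT.get(label, ""))
        capitalize = label in ("PERIOD", "QUESTION")
    return " ".join(out)
```

With this framing, the label set and the capitalization rule are tied to the punctuation conventions of the languages covered, which is one plausible reason adding languages gets harder.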

Now looking into fine-tuning a small LLM like Gemma 2b-it for this task.
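A rough sketch of how training pairs for this kind of fine-tuning could be built: take already-punctuated text, strip punctuation and casing to imitate a raw transcript, and use the original as the target. The instruction wording, the `prompt`/`completion` field names, and both helper functions are assumptions for illustration; in practice the pairs would still need to go through the model's own chat template.

```python
import re

# Hypothetical instruction text; the wording is an assumption.
INSTRUCTION = "Restore punctuation and capitalization in this transcript:"


def strip_punctuation(text):
    """Imitate a raw transcript: drop common punctuation, lowercase, squeeze spaces."""
    no_punct = re.sub(r"[.,!?;:]", "", text)
    return " ".join(no_punct.lower().split())


def make_example(punctuated):
    """Build one prompt/completion pair from a punctuated sentence."""
    raw = strip_punctuation(punctuated)
    return {"prompt": f"{INSTRUCTION}\n{raw}", "completion": punctuated}
```

Note that the simple regex also removes dots inside numbers and URLs, so real data would likely need a more careful stripping step.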

Any advice on other LLMs that would work for this would be appreciated.

It would also be a bonus if the LLM can run in a browser.