Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

That's an interesting problem—Tarsier probably isn't the best solution here since it's focused on webpage perception rather than any kind of OCR. But one could try adapting the `format_text` function in tarsier/text_format.py to convert any set of OCR annotations to a whitespace-structured string. Curious to see if that works.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: