
Straight up impossible for me to search for Isaac Asimov's "I, Robot". Some kind of input cleaning strips out the "I" from "I robot" and just searches "Robot". "iRobot" doesn't get the results either. And the search does not accept commas. Just some of the fun that comes with searching, I suppose.


I’m using a stop word list on the client side at the moment: https://github.com/typesense/showcase-books-search/blob/01f2...
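A rough sketch of what client-side stop word filtering like this does to the query. It assumes a small hypothetical stop word list (not the one in the linked repo) and a tokenizer that splits on anything that isn't a letter or digit, which also explains why the comma disappears:

```python
import re

# Hypothetical stop word list for illustration only; the list the showcase
# app actually uses lives in the linked repo and is not reproduced here.
STOP_WORDS = {"a", "an", "and", "i", "of", "the"}

def clean_query(query: str) -> str:
    """Lowercase, keep only letter/digit runs, then drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    return " ".join(t for t in tokens if t not in STOP_WORDS)

print(clean_query("I, Robot"))  # robot
```

Because "i" is on the list, the title collapses to the single word "robot", which matches far more than the one book being searched for.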

Working on support for exact matches using quotes, which will also help here.


Word bigrams without stopword removal on the title would help in this case as well.
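One way to read that suggestion, sketched under assumptions: index each title as its unigrams plus adjacent-word bigrams, keeping stop words, and require a multi-word query's bigrams to be present. The function names and the matching rule are hypothetical, not Typesense features.

```python
import re

def tokenize(text: str) -> list[str]:
    # Keep every word, including stop words like "i"; drop punctuation.
    return re.findall(r"[a-z0-9]+", text.lower())

def title_terms(title: str) -> set[str]:
    """Index terms for a title: all unigrams plus adjacent-word bigrams."""
    words = tokenize(title)
    bigrams = {f"{a} {b}" for a, b in zip(words, words[1:])}
    return set(words) | bigrams

def matches(query: str, title: str) -> bool:
    """Hypothetical matcher: all query bigrams (or the lone word) must be indexed."""
    q = tokenize(query)
    needed = {f"{a} {b}" for a, b in zip(q, q[1:])} or set(q)
    return needed <= title_terms(title)
```

With this, `matches("I, Robot", "I, Robot")` is true while `matches("i robot", "Robot Dreams")` is false, because the bigram "i robot" only exists in the first title. A single-word query like "robot" still matches both.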


It also strips quotation marks, so you can't search phrases at all.

I'm always sort of surprised by this stuff. Like, isn't 99% of the effort to get access to the database, how to search it efficiently, how to display useful snippets? Why go to all that effort and not take a useful search syntax off the shelf?


OP:

> I built this in about 12 hours as a weekend project, so there might be some lurking issues.

> Yup, exact matching using quotes is on the horizon. Should be available in a few weeks.

> Looks like I’m filtering a little too aggressively which is affecting the results, I’ll take a closer look at it in a few hours.


Can someone explain why using quotes is a thing that has to be added? Like, it treats words separately by breaking up the search string into tokens using spaces as a delimiter. Is the OP implementing this himself? I would have imagined there's a standard library for converting search strings into a structured query.
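For a sense of how small the parsing part can be, here is a minimal sketch that turns a search string into quoted phrases and loose terms. The output dict shape is a hypothetical structured query, not any particular library's format; a regex is used instead of `shlex.split` because `shlex` treats apostrophes as quotes and fails on input like "Asimov's".

```python
import re

# Either a double-quoted phrase or a run of non-space characters.
TOKEN_RE = re.compile(r'"([^"]*)"|(\S+)')

def parse_query(query: str) -> dict[str, list[str]]:
    """Split a search string into exact phrases and loose terms."""
    phrases, terms = [], []
    for m in TOKEN_RE.finditer(query):
        phrase, term = m.group(1), m.group(2)
        if phrase is not None:
            if phrase.strip():
                phrases.append(phrase.strip())
        elif term.strip('"'):
            # An unbalanced quote falls through here; drop the stray quote mark.
            terms.append(term.strip('"'))
    return {"phrases": phrases, "terms": terms}
```

`parse_query('"I, Robot" asimov')` gives `{"phrases": ["I, Robot"], "terms": ["asimov"]}`. Parsing is the easy half; the engine still needs positional data (or something like the bigrams mentioned above) to actually enforce a phrase match, which is likely why it has to be added rather than just switched on.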


Looks like they are using a blanket stop word list ("I", "the", "and", etc.). Also looks like diacritic folding is applied to accented characters.
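Diacritic folding of that kind is commonly done with Unicode normalization. A minimal sketch using only the standard library (whether the site does it this way is an assumption): decompose to NFKD so "é" becomes "e" plus a combining accent, then drop the combining marks.

```python
import unicodedata

def fold_diacritics(text: str) -> str:
    """Strip accents: decompose to NFKD, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(fold_diacritics("Les Misérables"))  # Les Miserables
```

Letters without a Unicode decomposition, such as "ø", pass through unchanged, so a fuller folder would need an explicit mapping table for those.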



