Hacker News

Well, it took me two full-time weeks to properly implement a RAG-based system so that it actually found relevant data and did not hallucinate. I had to:

- write an evaluation pipeline to automate quality testing

- add a query rewriting step to explore more options during search

- add hybrid BM25 + vector search with proper rank fusion

- tune all the hyperparameters for best results (like the weighting of BM25 vs. vector results, how many documents to retrieve for analysis, how to chunk documents based on semantics)

- parallelize the search pipeline to decrease wait times

- add moderation

- add a reranker to find best candidates

- add background embedding calculation of user documents

- iron out lots of failure cases so that the prompt worked for most inputs

There's no "just give the LLM all the data"; it's more complex than that, especially if you want the best results and full control of the data (we run all of that on open source models because user data is under NDA).
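For anyone curious what the rank fusion step in the list above can look like, here is a minimal sketch using weighted reciprocal rank fusion (RRF). This is an assumption for illustration: the comment doesn't say which fusion method was used. The function name, the `k=60` default, and the per-retriever `weights` are all hypothetical; the weights stand in for the "BM25 vs. vector" bias mentioned as a tunable.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, weights=None, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    rankings: list of lists, each ordered best-first (e.g. one from BM25,
              one from vector search).
    weights:  optional per-ranking weights (hypothetical tunable; defaults
              to 1.0 for every ranking).
    k:        smoothing constant; 60 is a commonly used default, not a
              value taken from the original comment.
    """
    if weights is None:
        weights = [1.0] * len(rankings)
    if len(weights) != len(rankings):
        raise ValueError("need exactly one weight per ranking")

    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        # Ranks start at 1, so the top hit contributes weight / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (k + rank)

    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Toy example with made-up document IDs.
bm25_hits = ["a", "b", "c"]
vector_hits = ["b", "d", "a"]
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

RRF only looks at rank positions, so it sidesteps the problem that BM25 scores and cosine similarities live on different scales. Whether it beats a weighted score blend on a given dataset is exactly the kind of thing the evaluation pipeline has to answer.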



Sounds like you vibe coded a RAG system in two weeks, which isn't very hard. Any startup can do it.

I've spent two weeks debugging a single difficult bug before; a whole feature that takes two weeks is an easy feature to build.


I already had experience with RAG, so I had a head start. You're right that it's not rocket science, but it's not just "press F to implement the feature" either.

P.S. No vibe coding was used. I only used LLM-as-a-judge to automate quality testing when tuning the parameters, before passing the results to human QA.


"did not hallucinate"

Sorry to nitpick, but this is not technically possible no matter how much RAG you throw at it. I assume you just mean "hallucinates a lot less".


You're right, bad wording.


whoa, two weeks


@apwell23 while the author didn’t say how s/he measured QA, creating the QA process was literally the first bullet.



