Well, it took me two full-time weeks to properly implement a RAG-based system that actually found relevant data and didn't hallucinate. I had to:
- write an evaluation pipeline to automate quality testing
- add a query rewriting step to explore more options during search
- add hybrid BM25 + vector search with proper rank fusion
- tune all the hyperparameters for best results (e.g. the BM25 vs. vector weighting, how many documents to retrieve for analysis, how to chunk documents semantically)
- parallelize the search pipeline to decrease wait times
- add moderation
- add a reranker to find best candidates
- add background embedding calculation of user documents
- iron out lots of failure cases so that the prompt worked most of the time
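To make the hybrid search and weighting items above a bit more concrete, here's a minimal sketch of weighted reciprocal rank fusion. This is not our actual code: the function name `weighted_rrf`, the `bm25_weight` parameter, and the `k=60` constant are assumptions for illustration, and the two input lists stand in for whatever your BM25 and vector backends return.

```python
from collections import defaultdict


def weighted_rrf(bm25_ranked, vector_ranked, bm25_weight=0.5, k=60, top_n=5):
    """Fuse two ranked lists of doc ids with weighted reciprocal rank fusion.

    bm25_weight is the hypothetical knob you would tune: 1.0 means BM25 only,
    0.0 means vector only. k dampens the influence of top ranks.
    """
    scores = defaultdict(float)
    sources = ((bm25_weight, bm25_ranked), (1.0 - bm25_weight, vector_ranked))
    for weight, ranked in sources:
        for rank, doc_id in enumerate(ranked, start=1):
            # Each list contributes weight / (k + rank) for every doc it contains.
            scores[doc_id] += weight / (k + rank)
    # Highest fused score first; ties broken by doc id so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_n]


# Toy ranked results (made-up doc ids).
bm25_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]
fused = weighted_rrf(bm25_hits, vector_hits)
```

Fusing on ranks rather than raw scores sidesteps the fact that BM25 scores and cosine similarities live on different scales, which is why a single weight is enough to bias one side.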
There's no "just give the LLM all the data". It's more complex than that, especially if you want the best results and full control of the data (we run all of it on open-source models because user data is under NDA).
I already had experience with RAG, so I had a head start. You're right that it's not rocket science, but it's not just "press F to implement the feature" either.
P.S. No vibe coding was used. I only used LLM-as-a-judge to automate quality testing when tuning the parameters, before passing the system to human QA.
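For anyone wondering what that kind of judge-driven tuning loop looks like, here's a rough sketch. Everything in it is hypothetical: `pass_rate`, `pick_best`, the `top_k` config key, and the toy pipeline and judge are stand-ins. In a real setup the judge function would call an LLM and the pipeline would be the actual RAG stack.

```python
def pass_rate(cases, answer_fn, judge_fn):
    """Fraction of (question, reference) cases the judge accepts."""
    verdicts = [judge_fn(q, ref, answer_fn(q)) for q, ref in cases]
    return sum(verdicts) / len(verdicts) if verdicts else 0.0


def pick_best(configs, cases, build_answer_fn, judge_fn):
    """Score every config on the same cases and return (score, config) of the best."""
    scored = [(pass_rate(cases, build_answer_fn(cfg), judge_fn), cfg) for cfg in configs]
    return max(scored, key=lambda pair: pair[0])


def toy_pipeline(cfg):
    # Stand-in for a RAG pipeline: pretends more retrieved docs fixes question q2.
    canned = {"q1": "Paris", "q2": "Berlin" if cfg["top_k"] >= 5 else "unknown"}
    return lambda question: canned[question]


def toy_judge(question, reference, answer):
    # Stand-in for an LLM judge: accepts if the reference appears in the answer.
    return reference.lower() in answer.lower()


cases = [("q1", "paris"), ("q2", "berlin")]
configs = [{"top_k": 3}, {"top_k": 5}]
best_score, best_cfg = pick_best(configs, cases, toy_pipeline, toy_judge)
```

The point of the structure is that the judge and the pipeline are swappable callables, so the same fixed case set can be rerun after every parameter change before anything goes to a human.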