One thing I didn’t see here that might be hurting your performance is a lack of semantic chunking. It sounds like you’re embedding entire docs, which breaks down when a doc contains multiple concepts. A better approach for recall is to use a chunking library to get semantic chunks (I like spaCy, though it takes some configuring). Then, once you have your chunks, prepend context describing how each chunk relates to the rest of the doc before you do your embedding. I’ve found Anthropic’s contextual retrieval approach (https://www.anthropic.com/engineering/contextual-retrieval) to be very performant in my RAG systems; you can just use gpt-oss-20b as the model for generating the context.
Unless I’ve misunderstood your post and you’re already doing some form of this in your pipeline, you should see a dramatic improvement in performance once you implement it.
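To make the recipe concrete, here’s a rough sketch of the contextual-retrieval step, assuming a `generate_context()` helper that wraps whatever local model you use for context generation (e.g. gpt-oss-20b); it’s stubbed out here so the flow is self-contained, and all the names are mine, not from the Anthropic post:

```python
# Sketch: prepend document-level context to each chunk before embedding,
# per the contextual-retrieval idea. generate_context() is a stand-in
# for an LLM call like "Situate this chunk within the overall document."

def generate_context(doc: str, chunk: str) -> str:
    # Placeholder for the LLM call; a real version would prompt the model
    # with the full doc and the chunk and return a one-sentence summary.
    return f"From a document beginning: {doc[:60]}..."

def contextualize_chunks(doc: str, chunks: list[str]) -> list[str]:
    # The contextualized string (context + chunk) is what gets embedded,
    # so retrieval can match on document-level concepts too.
    return [f"{generate_context(doc, c)}\n\n{c}" for c in chunks]

doc = "Quarterly report. Revenue grew 3%. Churn fell. New market entered."
chunks = ["Revenue grew 3%.", "Churn fell."]
for c in contextualize_chunks(doc, chunks):
    print(c)
```

You’d then embed each contextualized string instead of the raw chunk; the original post also pairs this with BM25 over the same contextualized text.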
hey, author (not op) here. we do do semantic chunking! I think I gave the impression that we don't because of the mention of aggregating context, but I tested this with questions that require aggregating context from 15+ documents (meaning 2x that in chunks), hence the comment in the post!
Is there a way to convert documents into a hierarchical, connected graph data structure where they reference each other, similar to how personal knowledge tools like Obsidian work, with the ability to traverse the graph? Is the GraphRAG technique trying to do exactly this?
Not exactly what you’re looking for, but Wilson Lin’s search engine creates a graph from the DOM for context. Here’s his write-up: https://blog.wilsonl.in/search-engine/
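For what it’s worth, the Obsidian-style structure from the question can be sketched with nothing fancier than an adjacency dict built from wiki-style `[[links]]` (the link syntax and the note titles here are my own illustration, not from any of the tools mentioned):

```python
# Sketch: build a traversable note graph from cross-references, the way
# Obsidian-style vaults link notes. Links use the [[Title]] convention.

import re
from collections import deque

docs = {
    "Index": "See [[Chunking]] and [[Retrieval]].",
    "Chunking": "Feeds into [[Retrieval]].",
    "Retrieval": "Terminal note.",
}

# One edge per [[link]] found in each note's body.
graph = {title: re.findall(r"\[\[(.+?)\]\]", body) for title, body in docs.items()}

def traverse(start: str) -> list[str]:
    # Breadth-first walk over the note graph from a starting note.
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order

print(traverse("Index"))  # prints ['Index', 'Chunking', 'Retrieval']
```

GraphRAG goes further than this (it uses an LLM to extract entities and relations, then builds community summaries over the resulting graph), but the retrieval-time traversal is the same basic idea.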
I mean, as long as they're not too long, I suppose you could use just about any heuristic for grouping sources. It just seems like it would be hard to generate succinct context if you mess it up.