the worktree isolation per agent is clever. i've been running claude code as my sole dev partner on a production platform (10 repos) and the biggest unlock was treating context as infrastructure — curated reference docs the agent reads on-demand rather than dumping everything into context. re: the longevity question in the thread — i think the orchestration layer stays relevant as long as you're coordinating across repos/services, not just within a single codebase.
the DAG decomposition approach is interesting — curious how it handles goals that span multiple services/repos. i build a multi-service platform solo with claude code and the hardest part isn't the coding, it's knowing which files across which repos need to change for a given goal. do you see sgai supporting multi-repo goals, or is it scoped to single-repo for now?
author here! It supports multi-repo goals. You'd create a directory with both git repositories cloned inside it and save the GOAL.md at the parent level. This UX could use some polish, for sure. It works, but it needs that extra step.
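Rough sketch of the layout, in case it helps (repo URLs and the workspace name are placeholders, not anything sgai-specific):

```python
# Parent workspace with both repos cloned side by side; GOAL.md sits at the parent.
import subprocess
from pathlib import Path

workspace = Path("my-goal-workspace")
workspace.mkdir(exist_ok=True)

# Clone both repositories inside the parent directory.
for url in ["git@github.com:me/service-a.git", "git@github.com:me/service-b.git"]:
    subprocess.run(["git", "clone", url], cwd=workspace, check=True)

# GOAL.md lives at the parent, next to the two clones.
(workspace / "GOAL.md").write_text("Describe the cross-repo goal here.\n")
```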
the context isolation approach is smart — cascading drift between agents is a real problem. i run 10 microservices with claude code and solved a similar issue by maintaining curated reference docs that agents read on-demand per task area instead of loading everything. the model escalation on failure (haiku → sonnet) is a nice touch too. do you find the lancedb memory layer actually helps with repeated similar tasks, or is it more useful for the code knowledge graph side?
Interesting to see the evolution mapped out like this. For those building on top of these models (RAG systems, agent frameworks), the real inflection point wasn't just model count but the shift from completion-only to reasoning and structured output capabilities. Are you planning to add annotations for capability changes alongside release dates?
How does Emdash handle state management when running multiple agents on the same codebase? Particularly interested in how you prevent conflicts when agents are making concurrent modifications to dependencies or config files. Also, does it support custom agent wrappers, or do you require the native CLI?
Thanks for your questions! You can isolate agents in Emdash by running each one on its own git worktree, so they can make concurrent modifications without interfering. We don't support custom agent wrappers currently, but that's an interesting idea. Have you written your own? What's your use case for them over the native CLIs?
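The worktree mechanics underneath are plain git, roughly like this (branch and path naming is just an example, not exactly what Emdash does internally):

```python
# Hypothetical helper: give each agent its own git worktree on a fresh branch,
# so concurrent edits never touch the same working directory.
import subprocess

def add_agent_worktree(repo_dir: str, agent_name: str) -> str:
    """Create ../<agent_name> as a worktree on a new branch of the repo."""
    worktree_path = f"../{agent_name}"
    subprocess.run(
        ["git", "worktree", "add", "-b", f"agent/{agent_name}", worktree_path],
        cwd=repo_dir,
        check=True,
    )
    return worktree_path

# e.g. add_agent_worktree("my-repo", "agent-a"); add_agent_worktree("my-repo", "agent-b")
```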
This matches my experience. I work across a multi-repo microservice setup with Claude Code and the .env file is honestly the least of it.
The cases that bite me:
1. Docker build args — tokens passed to Dockerfiles for private package installs live in docker-compose.yml, not .env. No .env-focused tool catches them.
2. YAML config files with connection strings and API keys — again, not .env format, invisible to .env tooling.
3. Shell history — even if you never cat the .env, you've probably exported a var or run a curl with a key at some point in the session.
The proxy/surrogate approach discussed upthread seems like the only thing that actually closes the loop, since it works regardless of which file or log the secret would have ended up in.
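To sketch what "closes the loop" means in practice (the placeholder format and names here are made up, not any specific tool): the agent-visible side only ever holds placeholders, and the outbound hop swaps in the real value.

```python
# Rough sketch: agent-visible config, files, and logs only ever contain
# placeholder tokens; the real secret lives in the proxy process and is
# substituted at request time.
import os
import urllib.request

PLACEHOLDERS = {
    # Real secret lives only in the proxy's environment (assumed variable name).
    "{{STRIPE_KEY}}": os.environ.get("REAL_STRIPE_KEY", ""),
}

def resolve(value: str) -> str:
    """Replace any known placeholder tokens with the real secret."""
    for token, secret in PLACEHOLDERS.items():
        value = value.replace(token, secret)
    return value

def proxied_request(url: str, headers: dict[str, str]) -> bytes:
    # Whatever the agent wrote (compose files, YAML, shell history) only shows
    # "{{STRIPE_KEY}}"; the real value exists only inside this call.
    real_headers = {k: resolve(v) for k, v in headers.items()}
    req = urllib.request.Request(url, headers=real_headers)
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```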
The multi-agent budget problem you're describing gets even harder when the
services are heterogeneous. In a RAG pipeline, a single user query might hit:
query analysis (LLM call), embedding generation (different model/pricing),
reranking (yet another model), and response generation (LLM call) — each
potentially in a different process.
Per-call monkey-patching sees each call in isolation. What I ended up doing was
a trace-based approach: every request gets a trace ID, each service appends cost
spans asynchronously, and a separate enrichment step aggregates the total. The
hard part was deduplication — when service A reports an aggregate cost and
service B reports the individual calls that compose it, you need to reconcile
or you double-count.
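The shape of it, heavily simplified (in-process dict instead of a real span store, and the names are made up):

```python
# Trace-scoped cost spans: each service appends a span tagged with the request's
# trace ID; an enrichment step later sums them, skipping double counts.
import uuid
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class CostSpan:
    trace_id: str
    service: str
    model: str
    usd: float
    is_aggregate: bool = False  # True if this span already rolls up child calls

SPANS: dict[str, list[CostSpan]] = defaultdict(list)

def record(span: CostSpan) -> None:
    SPANS[span.trace_id].append(span)   # fire-and-forget in the real system

def total_cost(trace_id: str) -> float:
    spans = SPANS[trace_id]
    # Naive dedup: if a service reported an aggregate span, ignore that
    # service's individual spans so they aren't counted twice.
    aggregated = {s.service for s in spans if s.is_aggregate}
    return sum(s.usd for s in spans if s.is_aggregate or s.service not in aggregated)

trace = str(uuid.uuid4())
record(CostSpan(trace, "query-analysis", "small-model", 0.0004))
record(CostSpan(trace, "embedding", "embed-small", 0.0001))
record(CostSpan(trace, "generation", "large-model", 0.012))
print(total_cost(trace))
```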
Your atomic disk writes for halt state are a nice pattern. I went with
fire-and-forget (never block the request path, accept eventual consistency on
cost data) but that means you can't do hard enforcement mid-request like
AgentBudget does.
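(For concreteness, the atomic-write pattern I mean is just write-temp-then-rename; a minimal sketch, not AgentBudget's actual code:)

```python
# Write the halt state to a temp file in the same directory, fsync, then rename.
# Readers never observe a half-written file; os.replace is atomic on POSIX
# within the same filesystem.
import json
import os
import tempfile

def write_halt_state(path: str, state: dict) -> None:
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise

write_halt_state("halt_state.json", {"halted": True, "reason": "budget exceeded"})
```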
The deduplication problem is the part I haven't worked out cleanly. The hierarchy in veronica-core sidesteps it as long as you declare parent-child relationships upfront — B's spend rolls directly into A's ceiling without a separate aggregation step. But in a dynamic pipeline where you don't know the call graph until runtime, that assumption breaks.
The fire-and-forget tradeoff makes sense. I went with blocking enforcement because the original use case was preventing runaway agents, not auditing after the fact. For RAG you're probably right that eventual consistency is the better fit — you care more about the trace than cutting off a half-finished response.
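Roughly the idea, heavily simplified (not the actual veronica-core API):

```python
# Toy budget node: a child's spend is charged up the chain, so the parent's
# ceiling is enforced without a separate aggregation step, but only if the
# parent/child link is declared before the spend happens.
class Budget:
    def __init__(self, ceiling_usd: float, parent: "Budget | None" = None):
        self.ceiling = ceiling_usd
        self.spent = 0.0
        self.parent = parent

    def charge(self, usd: float) -> None:
        # First check every ancestor's ceiling, then commit the spend to all.
        node = self
        while node is not None:
            if node.spent + usd > node.ceiling:
                raise RuntimeError("budget exceeded, halting")
            node = node.parent
        node = self
        while node is not None:
            node.spent += usd
            node = node.parent

pipeline = Budget(1.00)
rag_step = Budget(0.25, parent=pipeline)
rag_step.charge(0.10)   # counts against both rag_step and pipeline
```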
Re: when to add a WebSocket gateway vs keeping it in the monolith —
I've built multi-channel chat infrastructure and the honest answer is: keep the
monolith until you have a specific scaling bottleneck, not a theoretical one.
One pattern that helped was normalizing all channel-specific message formats into
a single internal message type early. Each channel adapter handles its own quirks
(some platforms give you 3 seconds to respond, others 20, some need deferred
responses) but they all produce the same normalized message that the core
processing pipeline consumes. This decoupling is what made it possible to split
later without rewriting business logic.
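Concretely, the normalized type can be as small as this (field names are just an example of the shape, not a prescription):

```python
# One internal message shape; every channel adapter produces it, and the core
# pipeline only ever sees this type.
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class InboundMessage:
    channel: str          # "slack", "sms", "web", ...
    sender_id: str
    text: str
    received_at: datetime
    reply_deadline_s: float | None = None   # channels with 3s vs 20s windows set this
    raw: dict | None = None                 # original payload kept for debugging

def from_web_socket(payload: dict) -> InboundMessage:
    return InboundMessage(
        channel="web",
        sender_id=payload["user_id"],
        text=payload["body"],
        received_at=datetime.now(timezone.utc),
        raw=payload,
    )

# Other adapters (Slack, SMS, ...) handle their quirks internally and return
# the same InboundMessage, so the pipeline never branches on channel.
```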
On Redis pub/sub specifically: for a solo dev, skip it until you actually have
multiple server instances that need to share state. A single process with
WebSocket sessions in memory is fine for early users. The complexity cost of
pub/sub isn't worth it until you need horizontal scaling or have a separate
worker process pushing messages.
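And "sessions in memory" really can be a plain dict; rough sketch, with the framework details omitted:

```python
# Single-process session registry: fine until a second server instance or a
# separate worker needs to push messages to connections it doesn't hold.
import asyncio

class SessionRegistry:
    def __init__(self) -> None:
        self._sessions: dict[str, asyncio.Queue] = {}

    def connect(self, user_id: str) -> asyncio.Queue:
        q: asyncio.Queue = asyncio.Queue()
        self._sessions[user_id] = q
        return q

    def disconnect(self, user_id: str) -> None:
        self._sessions.pop(user_id, None)

    async def push(self, user_id: str, message: str) -> bool:
        q = self._sessions.get(user_id)
        if q is None:
            return False        # user not connected to *this* process
        await q.put(message)    # the websocket handler drains this queue
        return True

# The moment push() needs to reach a connection held by another process is the
# actual trigger for Redis pub/sub or similar.
```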
What's your current message volume like? That usually determines timing better
than architecture diagrams.
This is super helpful — thank you. And I agree with the “pressure before architecture” principle.
Right now Falcon is still very early — message volume is basically zero outside of local testing. The service split isn’t driven by traffic yet, it’s more about separating identity/trust from messaging so I don’t entangle community membership logic with transport.
The internal normalization point you mentioned is something I’m trying to do early: the goal is a single internal message/event model that adapters (WebSocket, future federation, etc.) translate into, so the core pipeline stays stable if/when the runtime topology changes.
On Redis/pub-sub: totally fair. I’m not running multi-instance yet. JetStream is more experimental at this stage — mostly exploring how identity-aware events propagate, not solving scale today.
Nice project, especially given the VRAM constraints. A few things I've learned
building production RAG that might help:
1. Separate your query analysis from retrieval. A single LLM call can classify
the query type, decide whether to use hybrid search, and pick search parameters
all at once. This saves a round-trip vs doing them sequentially.
2. If you add BM25 alongside vector search, the blend ratio matters a lot by
query type. Exact-match queries need heavy keyword weighting, while conceptual
questions need more embedding weight. A static 50/50 split leaves performance
on the table.
3. For your evaluator/generator being the same model — one practical workaround
is to skip LLM-as-judge evaluation entirely and use a small cross-encoder
reranker between retrieval and generation instead. It catches the cases where
vector similarity returns semantically related but not actually useful chunks,
and it gives you a relevance score you can threshold on without needing a
separate evaluation model.
4. Consider a two-level cache: exact match (hash the query, short TTL) plus a
semantic cache (cosine similarity threshold on the query embedding, longer TTL).
The semantic layer catches "how do I X" vs "what's the way to X" without hitting
the retriever again (rough sketch of this below).
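Rough sketch of point 4, assuming you already have some embed(query) -> vector function; the thresholds and TTLs are placeholders to tune:

```python
# Two-level query cache: exact match on a hash of the query, then a semantic
# check against embeddings of previously answered queries.
import hashlib
import time

import numpy as np

EXACT_TTL_S = 300
SEMANTIC_TTL_S = 3600
SIM_THRESHOLD = 0.92   # placeholder; tune on your own query distribution

exact_cache: dict[str, tuple[float, str]] = {}             # hash -> (ts, answer)
semantic_cache: list[tuple[float, np.ndarray, str]] = []   # (ts, embedding, answer)

def lookup(query: str, embed) -> str | None:
    now = time.time()
    key = hashlib.sha256(query.encode()).hexdigest()

    hit = exact_cache.get(key)
    if hit and now - hit[0] < EXACT_TTL_S:
        return hit[1]

    q_vec = np.asarray(embed(query))
    for ts, vec, answer in semantic_cache:
        if now - ts < SEMANTIC_TTL_S:
            sim = float(np.dot(q_vec, vec) / (np.linalg.norm(q_vec) * np.linalg.norm(vec)))
            if sim >= SIM_THRESHOLD:
                return answer
    return None

def store(query: str, answer: str, embed) -> None:
    now = time.time()
    exact_cache[hashlib.sha256(query.encode()).hexdigest()] = (now, answer)
    semantic_cache.append((now, np.asarray(embed(query)), answer))
```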
What model are you using for generation on the 8GB? That constraint probably
shapes a lot of the architecture choices downstream.