Towards a science of scaling agent systems: When and why agent systems work

>>gmays+(OP)
I've been building something in this space ("Clink" - multi-agent coordination layer) and this research confirms some of the assumptions that motivated the project. You can't just throw more agents at a problem and expect it to get better.

The error amplification numbers are wild! 17x for independent agents vs 4x with some central coordination. Clink provides users (and more importantly their agents) the primitives to choose their own pattern.

The most relevant features are...

- work queues with claim/release for parallelizable tasks - checkpoint dependencies when things need to be sequential - consensus voting as a gate before anything critical happens

The part about tool count increasing coordination overhead is interesting too. I've been considering exposing just a single tool to address this, but I wonder how this plays out as people start stacking more MCP servers together. It feels like we're all still learning what works here. The docs are at https://docs.clink.voxos.ai if anyone wants to poke around!

>>Falimo+PP
What are your other primitives for orchestration?

> The part about tool count increasing coordination overhead is interesting too. I've been considering exposing just a single tool to address this, but I wonder how this plays out as people start stacking more MCP servers together.

It works really well. Whatever knowledge LLMs absorb about CLI commands seems to transfer to MCP use so a single tool with commands/subcommands works very well. It’s the pattern I default to when I’m forced to use an MCP server instead of providing a CLI tool (like when the MCP server needs to be in-memory with the host process).

>>throwu+121
I've started with the basics for now: messages (called "Clinks" because... marketing), groups, projects, milestones - which are all fairly non-novel and one might say this is just Slack/Jira. The ones that distinguish it are proposals to facilitate distributed consensus behaviour between agents. That's paired with a human-in-the-loop type proposal that requires the fleet owner to respond to the proposal via email.

That's great to hear. It makes sense given the MCP server in this case is mainly just a proxy for API calls. One thing I wonder is at what point do you decide your single tool description packs in too much context? Do you introduce a tool for each category of subcommands?

zlacker