zlacker

Agent Skills

submitted by moored+(OP) on 2026-02-03 14:09:54 | 530 points 256 comments
[view article] [source] [go to bottom]

◧◩
7. pzo+n5[view] [source] [discussion] 2026-02-03 14:38:38
>>orlies+v2
I don't like Vercel's design; it's just a huge list of abstract skill names, and you have to click on every one to even have a clue what something does. Such a bad design IMHO.

The design of https://www.skillcreator.ai/explore is more useful to me. At least I can search by category, framework, and language, and I can see much more of what a skill does at a glance. I don't know why Vercel wanted to do it completely in black and white; colors used tastefully give useful context and information.

◧◩
8. modern+y5[view] [source] [discussion] 2026-02-03 14:39:36
>>esafak+E2
That's also what Vercel found:

> In 56% of eval cases, the skill was never invoked. The agent had access to the documentation but didn't use it. Adding the skill produced no improvement over baseline.

> …

> Skills aren't useless. The AGENTS.md approach provides broad, horizontal improvements to how agents work with Next.js across all tasks. Skills work better for vertical, action-specific workflows that users explicitly trigger,

https://vercel.com/blog/agents-md-outperforms-skills-in-our-...

◧◩
33. verdve+N9[view] [source] [discussion] 2026-02-03 14:59:25
>>voidho+z5
This post does a very good job of laying out that argument:

https://jsulmont.github.io/swarms-ai/

40. appsof+tb[view] [source] 2026-02-03 15:08:02
>>moored+(OP)
I use a common README_AI.md file, and use CLAUDE.md and AGENTS.md to direct the agent to that common file. From README_AI.md, I make specific references to skills. This works pretty well - it's become pretty rare that the agent behaves in a way contrary to my instructions. More info on my approach here: https://www.appsoftware.com/blog/a-centralised-approach-to-a...

There was a post on here a couple of days ago referring to a paper that said that the AGENTS file alone worked better than agent skills, but a single agents file doesn't scale. For me, a combination where I use a brief reference to the skill in the main agents file seems like the best approach.
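A minimal sketch of the pointer layout described above. The file contents are illustrative, not taken from the linked post, and the skill names are hypothetical:

```markdown
<!-- CLAUDE.md and AGENTS.md: identical one-line pointers -->
Read README_AI.md before doing anything else.

<!-- README_AI.md: the single shared instruction file -->
# Project instructions for agents
- Coding conventions: docs/conventions.md
- Releases: load the `release-checklist` skill (hypothetical name)
- Migrations: load the `db-migrate` skill (hypothetical name)
```

The point of the indirection is that the harness-specific entry files never drift: there is exactly one file to edit.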
◧◩◪◨
55. likium+ce[view] [source] [discussion] 2026-02-03 15:19:47
>>arrows+Rc
Just the decision of whether to allow models to invoke them is handled in three different ways [1][2][3].

[1]: https://code.claude.com/docs/en/skills#control-who-invokes-a... [2]: https://opencode.ai/docs/skills/#disable-the-skill-tool [3]: https://developers.openai.com/codex/skills/#enable-or-disabl...

◧◩◪◨
56. verdve+fe[view] [source] [discussion] 2026-02-03 15:20:07
>>arrows+Rc
They are more than that: for example, the frontmatter and the code files around them. The spec: https://agentskills.io/specification
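For reference, a skill in that spec is a folder whose SKILL.md carries YAML frontmatter. A rough illustrative example (only `name` and `description` are safe to assume; other fields vary by harness):

```markdown
---
name: pdf-extraction
description: Extract text and tables from PDF files. Use when the user asks to read or summarize a PDF.
---

# PDF extraction

1. Run the bundled script to dump raw text.
2. Fall back to OCR for scanned pages.
```

The `description` is what the agent reads to decide whether to load the rest, which is exactly what the questions below poke at.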

Why do I want to throw away my dependency management system and shared libraries folder for putting scripts in skills?

What tools do they have access to, can I define this so it's dynamic? Do skills even have a concept for sub tools or sub agents? Why do I want to put references in a folder instead of a search engine? Does frontmatter even make sense, why not something closer to a package.json in a file next to it?

Does it even make sense to have skills in the repo? How do I use them across projects? How do we build an ecosystem and dependency management system for skills (which are themselves versioned)?

◧◩◪◨
58. d1sxey+Ve[view] [source] [discussion] 2026-02-03 15:23:18
>>ledaup+Wd
Vercel thinks it isn't:

https://vercel.com/blog/agents-md-outperforms-skills-in-our-...

◧◩
68. m4r71n+Sf[view] [source] [discussion] 2026-02-03 15:26:54
>>davidk+O6
That is being discussed in https://github.com/agentskills/agentskills/issues/15.
◧◩◪
69. flurdy+sg[view] [source] [discussion] 2026-02-03 15:29:04
>>tobyhi+77
It's why I wrapped my tiny skills repo with a script that symlinks them into whichever skills folder you use, defaulting to Claude's, but it could be any other.

I treat my skills the same way I wrote tiny bash scripts and fish functions in days gone by: simplifying my life by writing 2 words instead of 2 sentences. A tiny improvement that only makes sense for a programmer at heart.

[1] https://github.com/flurdy/agent-skills
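A minimal sketch of such a wrapper, assuming a flat repo where each subdirectory is one skill; paths and behavior of the actual flurdy/agent-skills script may differ:

```python
#!/usr/bin/env python3
"""Link every skill in a local repo into a harness's skills folder."""
from pathlib import Path

def link_skills(repo: Path, target: Path) -> list[Path]:
    """Create one symlink per skill directory; return the links created."""
    target.mkdir(parents=True, exist_ok=True)
    created = []
    for skill in sorted(p for p in repo.iterdir() if p.is_dir()):
        link = target / skill.name
        if not link.exists():
            link.symlink_to(skill.resolve(), target_is_directory=True)
            created.append(link)
    return created

if __name__ == "__main__":
    # Default to Claude Code's folder; any other harness's folder works the same.
    link_skills(Path("~/agent-skills").expanduser(),
                Path("~/.claude/skills").expanduser())
```

Because the links point back at the repo, editing a skill there updates every harness at once.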

◧◩
74. esafak+Dh[view] [source] [discussion] 2026-02-03 15:33:45
>>empath+m2
As discussed in >>46777409
◧◩
75. postal+Lh[view] [source] [discussion] 2026-02-03 15:34:10
>>iainme+Qb
Folks have run comparisons. From a huggingface employee:

  codex + skills finetunes Qwen3-0.6B to +6 on humaneval and beats the base score on the first run.

  I reran the experiment from this week, but used codex's new skills integration. Like claude code, codex consumes the full skill into context and doesn't start with failing runs. Its first run beats the base score, and on the second run it beats claude code.
https://xcancel.com/ben_burtenshaw/status/200023306951767675...

That said, it's not a perfect comparison because of the Codex model mismatch between runs.

The author seems to be doing a lot of work on skills evaluation.

https://github.com/huggingface/upskill

◧◩
90. albert+Yk[view] [source] [discussion] 2026-02-03 15:47:34
>>davidk+O6
This is happening as we speak.

Codex started this and OpenCode followed suit within the hour.

https://x.com/embirico/status/2018415923930206718

◧◩◪
92. davidk+kl[view] [source] [discussion] 2026-02-03 15:49:00
>>behnam+k9
On the website[1] it says:

  .opencode/skills
[1]: https://opencode.ai/docs/skills/#place-files
◧◩◪
93. jonath+Cl[view] [source] [discussion] 2026-02-03 15:50:03
>>storus+6i
OpenAI has already adopted Agent Skills:

- https://community.openai.com/t/skills-for-codex-experimental...

- https://developers.openai.com/codex/skills/

- https://github.com/openai/skills

- https://x.com/embirico/status/2018415923930206718

96. nstfn+Vl[view] [source] 2026-02-03 15:51:04
>>moored+(OP)
Started to work on a tool to synchronize all skills with symlinks. It's OK for my needs at the moment, but feel free to improve it; it's on GitHub: https://github.com/Alpha-Coders/agent-loom
◧◩
102. evanmo+Zo[view] [source] [discussion] 2026-02-03 16:02:35
>>time0u+gh
You should consider calling these "behaviors" to mimic behavior trees in game / robot AI. They follow the same notion of a single behavior being active at once: https://en.wikipedia.org/wiki/Behavior_tree_(artificial_inte...
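For the curious, the single-active-behavior idea can be sketched as a tiny selector node (illustrative only; real game/robot trees also have Running states, sequences, and decorators):

```python
"""A Selector ticks children in priority order; the first one to succeed
is the single active behavior for that tick."""
from typing import Callable

Behavior = Callable[[], bool]  # a behavior returns True on success

def selector(*children: Behavior) -> Behavior:
    def tick() -> bool:
        for child in children:
            if child():        # first success wins; later children never run
                return True
        return False
    return tick

# Hypothetical agent: prefer a loaded skill, fall back to web search.
log = []

def use_skill() -> bool:
    log.append("skill")
    return False               # skill not applicable this tick

def web_search() -> bool:
    log.append("search")
    return True                # fallback succeeds

root = selector(use_skill, web_search)
```

The parallel to skills is the priority ordering: only one "behavior" ends up driving the task on any given tick.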
◧◩◪◨
103. postal+Bp[view] [source] [discussion] 2026-02-03 16:05:42
>>xrd+kj
Check out his work. He's focused on precisely what you're talking about.

https://xcancel.com/ben_burtenshaw

https://huggingface.co/blog/upskill

https://github.com/huggingface/upskill

108. Curiou+vr[view] [source] 2026-02-03 16:13:13
>>moored+(OP)
Pro tip: create README.md files in subfolders with helpful content that you might put in an AGENTS.md file (but, ya know, for humans too), and *link relevant skills there*. You don't even have to call them skills or use the skills format. It works for everything (including humans!).

I wrote a rant about skills a while ago that's still relevant in some ways: https://sibylline.dev/articles/2025-10-20-claude-skills-cons...

◧◩◪◨
109. postal+Ar[view] [source] [discussion] 2026-02-03 16:13:34
>>iainme+fn
Your skepticism is valid. Vercel ran a study and found that skills underperform a docs index placed in AGENTS.md[0].

My guess is that the standardization is going to make its way into how the models are trained and Skills are eventually going to pull out ahead.

0: https://vercel.com/blog/agents-md-outperforms-skills-in-our-...

◧◩◪
112. Pantal+zt[view] [source] [discussion] 2026-02-03 16:21:57
>>albert+Yk
“Proposal: include a standard folder where agent skills should be”

https://github.com/agentskills/agentskills/issues/15

◧◩◪◨
119. killer+Fx[view] [source] [discussion] 2026-02-03 16:37:25
>>storus+Ci
Anthropic added features like this into 4.5 release:

https://claude.com/blog/context-management

> Context editing automatically clears stale tool calls and results from within the context window when approaching token limits.

> The memory tool enables Claude to store and consult information outside the context window through a file-based system.

But it looks like nobody has it as part of an inference loop yet: I guess it's hard to train (i.e. you need a training set that matches how people use context in practice) and it makes inference more complicated. Higher-level context management is just easier to implement, and it's one of the things "GPT wrapper" companies can do, so why bother?

140. charci+VN[view] [source] 2026-02-03 17:42:25
>>moored+(OP)
I noticed a couple days ago https://skill.md started redirecting to this new URL.
◧◩
141. mkagen+oO[view] [source] [discussion] 2026-02-03 17:44:29
>>cjonas+Cx
> Or have a `discover_skills` tool

Yes: treating the "frontmatter" of a skill and the "function definition" of a tool call as a kind of equivalence class.

This understanding helped me create an LLM-agnostic (and sandboxed) open-skills[1] well before this standardization was proposed.

1. Open-skills: https://github.com/instavm/open-skills
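That equivalence can be sketched as: parse a skill's frontmatter and emit an OpenAI-style tool definition the model can select from. The frontmatter fields and schema shape here are illustrative, not taken from open-skills:

```python
"""Turn simple `key: value` skill frontmatter into a tool definition."""
import re

def frontmatter_to_tool(skill_md: str) -> dict:
    """Parse the frontmatter block and emit a function-calling tool spec."""
    match = re.match(r"^---\n(.*?)\n---", skill_md, re.S)
    pairs = (line.split(":", 1) for line in match.group(1).splitlines())
    fields = {k.strip(): v.strip() for k, v in pairs}
    return {
        "type": "function",
        "function": {
            "name": fields["name"],
            "description": fields["description"],  # the model picks tools by this
            "parameters": {"type": "object", "properties": {}},
        },
    }

skill = """---
name: open-skills-demo
description: Run sandboxed code on behalf of the user.
---
# Instructions (body of the skill, loaded only after selection) ...
"""
tool = frontmatter_to_tool(skill)
```

Under this reading, "discovering" a skill and "advertising" a tool are the same operation on different file formats.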

◧◩◪◨⬒⬓
152. verdve+K31[view] [source] [discussion] 2026-02-03 18:41:27
>>jascha+Mz
> You just don't know which parts of the doc are real and which are hallucinated.

It doesn't look like slop at all to me. GP claimed that this was written by AI without evidence, which I assumed to be rooted in bias, based on GP's comment history: https://news.ycombinator.com/threads?id=jondwillis The complaint they have about the writing style is not the style that is emblematic of AI slop. And considering the depth of analysis and breadth of connection, this is not something current AI is up to producing.

Are you also assuming the article was written by an AI?

◧◩◪
153. iainme+r41[view] [source] [discussion] 2026-02-03 18:43:50
>>killer+Tu
Everybody wants that, though, no? At least some of the time?

For example, if you've just joined a new team or a new project, wouldn't you like to have extensive, well-organised documentation to help get you started?

This reminds me of the "curb-cut effect", where accommodations for disabilities can be beneficial for everybody: https://front-end.social/@stephaniewalter/115841555015911839

◧◩
171. nikcub+jp1[view] [source] [discussion] 2026-02-03 20:12:05
>>dk8996+Tc
https://skills.sh
181. bazhan+Nx1[view] [source] 2026-02-03 20:52:07
>>moored+(OP)
The third most popular skill on skills.sh[1], with 50k installs/week, is a link to download a command[2].

[1] https://skills.sh/vercel-labs/agent-skills/web-design-guidel... [2] https://github.com/vercel-labs/agent-skills/blob/main/skills...

All of these SKILLS.md/AGENTS.md/COMMANDS.md files are just simple prompts, some maybe with context links.

And quite dangerous.

◧◩
186. DonHop+9O1[view] [source] [discussion] 2026-02-03 22:17:53
>>noodle+e7
There's a fundamental architectural difference being missed here: MCP operates BETWEEN LLM complete calls, while skills operate DURING them. Every MCP tool call requires a full round-trip — generation stops, wait for external tool, start a new complete call with the result. N tool calls = N round-trips. Skills work differently. Once loaded into context, the LLM can iterate, recurse, compose, and run multiple agents all within a single generation. No stopping. No serialization.

Skills can be MASSIVELY more efficient and powerful than MCP, if designed and used right.

Leela MOOLLM Demo Transcript: https://github.com/SimHacker/moollm/blob/main/designs/LEELA-...

  2. Architecture: Skills as Knowledge Units

  A skill is a modular unit of knowledge that an LLM can load, understand, and apply. 
  Skills self-describe their capabilities, advertise when to use them, and compose with other skills.

  Why Skills, Not Just MCP Tool Calls?
  MCP (Model Context Protocol) tool calls are powerful, but each call requires a full round-trip:

  MCP Tool Call Overhead (per call):
  ┌─────────────────────────────────────────────────────────┐
  │ 1. Tokenize prompt                                      │
  │ 2. LLM complete → generates tool call                   │
  │ 3. Stop generation, universe destroyed                  │
  │ 4. Async wait for tool execution                        │
  │ 5. Tool returns result                                  │
  │ 6. New LLM complete call with result                    │
  │ 7. Detokenize response                                  │
  └─────────────────────────────────────────────────────────┘
  × N calls = N round-trips = latency, cost, context churn

  Skills operate differently. Once loaded into context, skills can:

  Iterate:
      MCP: One call per iteration
      Skills: Loop within single context
  Recurse:
      MCP: Stack of tool calls
      Skills: Recursive reasoning in-context
  Compose:
      MCP: Chain of separate calls
      Skills: Compose within single generation
  Parallel characters:
      MCP: Separate sessions
      Skills: Multiple characters in one call
  Replicate:
      MCP: N calls for N instances
      Skills: Grid of instances in one pass
I call this "speed of light" as opposed to "carrier pigeon". In my experiments I ran 33 game turns with 10 characters playing Fluxx — dialogue, game mechanics, emotional reactions — in a single context window and completion call.

Try that with MCP and you're making hundreds of round-trips, each suffering from token quantization, noise, and cost. Skills can compose and iterate at the speed of light without any detokenization/tokenization cost and distortion, while MCP forces serialization and waiting for carrier pigeons.

speed-of-light skill: https://github.com/SimHacker/moollm/tree/main/skills/speed-o...

Skills also compose. MOOLLM's cursor-mirror skill introspects Cursor's internals via a sister Python script that reads cursor's chat history and sqlite databases — tool calls, context assembly, thinking blocks, chat history. Everything, for all time, even after Cursor's chat has summarized and forgotten: it's still all there and searchable!

cursor-mirror skill: https://github.com/SimHacker/moollm/tree/main/skills/cursor-...

MOOLLM's skill-snitch skill composes with cursor-mirror for security monitoring of untrusted skills, also performance testing and optimization of trusted ones. Like Little Snitch watches your network, skill-snitch watches skill behavior — comparing declared tools and documentation against observed runtime behavior.

skill-snitch skill: https://github.com/SimHacker/moollm/tree/main/skills/skill-s...

You can even use skill-snitch like a virus scanner to review and monitor untrusted skills. I have more than 100 skills and had skill-snitch review each one including itself -- you can find them in the skill-snitch-report.md file of each skill in MOOLLM. Here is skill-snitch analyzing and reporting on itself, for example:

skill-snitch's skill-snitch-report.md: https://github.com/SimHacker/moollm/blob/main/skills/skill-s...

MOOLLM's thoughtful-commitment skill also composes with cursor-mirror to trace the reasoning behind git commits.

thoughtful-commit skill: https://github.com/SimHacker/moollm/tree/main/skills/thought...

MCP is still valuable for connecting to external systems. But for reasoning, simulation, and skills calling skills? In-context beats tool-call round-trips by orders of magnitude.

189. galemk+EQ1[view] [source] 2026-02-03 22:30:45
>>moored+(OP)
Awesome Agent Skills: https://github.com/skillmatic-ai/awesome-agent-skills
◧◩◪◨⬒
195. DonHop+P02[view] [source] [discussion] 2026-02-03 23:28:09
>>verdve+Ir1
Marvin Minsky's Society of Mind:

https://en.wikipedia.org/wiki/Society_of_Mind

◧◩
201. DonHop+462[view] [source] [discussion] 2026-02-03 23:54:32
>>clarit+Sm1
You've nailed the core insight about progressive disclosure. MOOLLM extends this into what we call the Semantic Image Pyramid (or MOO-Maps ;), borrowing from Mip-Maps in graphics. Four resolution levels, each serving different needs.

GLANCE.yml is the smallest, 5-70 lines. Just enough to answer "is this relevant?" You can inject all glances into every prompt because they're tiny. The LLM scans them like a table of contents.

CARD.yml is the interface layer, 50-200 lines. No implementation, just what the skill offers: capability advertisements, activation conditions, scoring, what it composes with. Think of it like The Sims "advertisement" system or CLOS generic function dispatch. The LLM sniffs this to decide whether to load the full SKILL.md implementation.

SKILL.md is the Anthropic-style skill file, 200-1000 lines. The actual instructions, the how. Only loaded when the skill is activated.

README.md is the largest, 500+ lines, and it's for humans. History, design rationale, examples. The LLM can dive in when developing the skill or when curious, but it's not burned on every invocation.

Reading rule: never load a lower level without first loading the level above. Start with GLANCE, sniff the CARD, load SKILL only if needed.

Even more compact than concatenating all glances: we also found INDEX.md beats INDEX.yml for the skill catalog. YAML repeats the same keys for every entry; Markdown allows narrative explanation of how skills relate and which clusters matter for what tasks, making it both more compact and more useful.

INDEX.yml: 711 lines, 2061 words, 17509 chars, ~4380 tokens, machine readable structure

https://github.com/SimHacker/moollm/blob/main/skills/INDEX.y...

INDEX.md: 124 lines, 1134 words, 9487 chars, ~2370 tokens, human readable prose

https://github.com/SimHacker/moollm/blob/main/skills/INDEX.m...

INDEX.md is 83% fewer lines, 45% fewer words, 46% fewer chars for 121 skills. YAML repeats keys like id, tagline, why for every entry. Markdown uses headers and prose, compresses better, allows narrative grouping of related skills.

And it's simply more meaningful to both LLMs and humans, telling a coherent story instead of representing raw data!

The Semantic Image Pyramid:

https://github.com/SimHacker/moollm/blob/main/designs/LEELA-...

Same principle applies to code. A skill can wrap a sister-script that IS the documentation. A Python script with argparse defines the CLI once, readable by both humans (--help) and LLMs (sniff the top of the python file). No separate docs to maintain, no drift between what the code does and what the docs claim.

sister-script: https://github.com/SimHacker/moollm/blob/main/skills/sister-...

Sniffable-python structures code so the API is visible in the first 50 lines. Imports, constants, CLI structure up front. Implementation below the fold. The LLM can decide relevance and understand the API without reading the whole file. Single source of truth, progressive disclosure, don't repeat yourself.

sniffable-python README.md: https://github.com/SimHacker/moollm/blob/main/skills/sniffab...

sniffable-python SKILL.md: https://github.com/SimHacker/moollm/blob/main/skills/sniffab...
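A toy sister-script in that sniffable style, where the whole CLI surface is visible before any implementation (the script name and flags are made up, not from MOOLLM):

```python
#!/usr/bin/env python3
"""convert.py -- sniffable sketch: the full CLI is defined in the first
lines, so an LLM (or a human running --help) learns the API without
reading the implementation below the fold."""
import argparse

parser = argparse.ArgumentParser(description="Convert a document between formats.")
parser.add_argument("source", help="input file")
parser.add_argument("--to", default="markdown", choices=["markdown", "html"],
                    help="output format")
parser.add_argument("--out", help="output path (default: stdout)")

# ---- implementation below the fold ----

def convert(source: str, to: str = "markdown") -> str:
    # Stubbed: a real script would transform the file contents here.
    return f"converted {source} to {to}"

if __name__ == "__main__":
    args = parser.parse_args()
    print(convert(args.source, args.to))
```

The argparse definitions are the single source of truth: `--help`, the sniffable header, and the actual behavior cannot drift apart.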

204. JoshPu+Ah2[view] [source] 2026-02-04 01:03:15
>>moored+(OP)
For devtools cos providing skills - we've found that using GEPA where you optimize the skill content instead of the prompt works really well to make sure the skill actually gets claude code/ codex /opencode to successfully use your service. https://arxiv.org/abs/2507.19457 More here if interesting https://www.usesynth.ai/blog/environment-pools-managed-agent...
◧◩
221. sbinne+BU2[view] [source] [discussion] 2026-02-04 06:49:07
>>iainme+Qb
There could be a market if it is standardized, and it seems there is already one [1]. I don't know exactly what they are selling, because the website is just too confusing for me to understand a thing.

[1] https://skillsmp.com/

◧◩◪
222. bburte+vZ2[view] [source] [discussion] 2026-02-04 07:36:34
>>postal+Lh
Thanks for sharing the work. Correct, we're currently working on evals for skills so you can compare skills across models and harnesses.

we wrote a blog on getting agents to write CUDA kernels and evaluating them: https://huggingface.co/blog/upskill

229. Renato+fr3[view] [source] 2026-02-04 11:10:56
>>moored+(OP)
Building sovereign agents requires more than just orchestration—it needs a dedicated economic and communication layer. For those architecting truly autonomous agents, check out BotNode.io. We use the VMP-1.0 Protocol to handle secure inter-agent communication and state verification, and the $TCK settlement system for real-time value transfer between agents. The Grid provides the decentralized infrastructure ensuring these agents can operate independently without centralized control points. https://botnode.io/mission.json
◧◩◪
234. DonHop+wH3[view] [source] [discussion] 2026-02-04 13:10:36
>>DonHop+462
More: Speed of Light -vs- Carrier Pigeon (an allegory for Skills -vs- MCP):

https://github.com/SimHacker/moollm/blob/main/designs/SPEED-...

[go to top]