zlacker

Show HN: GlyphLang – An AI-first programming language

submitted by goose0+(OP) on 2026-01-10 23:46:58 | 44 points 27 comments

While working on a proof-of-concept project, I kept hitting Claude's token limit 30-60 minutes into its 5-hour sessions. The accumulating context from the codebase was eating through tokens fast. So I built a language designed to be generated by AI rather than written by humans.

GlyphLang

GlyphLang replaces verbose keywords with symbols that tokenize more efficiently:

  # Python
  @app.route('/users/<id>')
  def get_user(id):
      user = db.query("SELECT * FROM users WHERE id = ?", id)
      return jsonify(user)

  # GlyphLang
  @ GET /users/:id {
    $ user = db.query("SELECT * FROM users WHERE id = ?", id)
    > user
  }

@ = route, $ = variable, > = return. Initial benchmarks show ~45% fewer tokens than Python, ~63% fewer than Java.
In practice, that means more logic fits in context, and sessions stretch longer before hitting limits. The AI maintains a broader view of your codebase throughout.
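
If you want to sanity-check this kind of comparison yourself, here's a minimal sketch using tiktoken with the cl100k_base encoding (counts shift with the encoding and the exact snippets, so treat the percentages as rough):

  # Count tokens for the two snippets above (assumes `pip install tiktoken`).
  import tiktoken

  python_src = (
      "@app.route('/users/<id>')\n"
      "def get_user(id):\n"
      "    user = db.query(\"SELECT * FROM users WHERE id = ?\", id)\n"
      "    return jsonify(user)\n"
  )
  glyph_src = (
      "@ GET /users/:id {\n"
      "  $ user = db.query(\"SELECT * FROM users WHERE id = ?\", id)\n"
      "  > user\n"
      "}\n"
  )

  enc = tiktoken.get_encoding("cl100k_base")
  py, gl = len(enc.encode(python_src)), len(enc.encode(glyph_src))
  print(py, gl, f"{1 - gl / py:.0%} fewer tokens")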

Before anyone asks: no, this isn't APL with extra steps. APL, Perl, and Forth are symbol-heavy but optimized for mathematical notation, human terseness, or machine efficiency. GlyphLang is specifically optimized for how modern LLMs tokenize. It's designed to be generated by AI and reviewed by humans, not the other way around. That said, it's still readable enough to be written or tweaked if the occasion requires.

It's still a work in progress, but it's a usable language with a bytecode compiler, a JIT, an LSP server, a VS Code extension, PostgreSQL and WebSocket support, async/await, and generics.

Docs: https://glyphlang.dev/docs

GitHub: https://github.com/GlyphLang/GlyphLang


NOTE: showing posts with links only
1. everli+R1[view] [source] 2026-01-11 00:02:02
>>goose0+(OP)
Arguably, math notation and set theory already have everything we need.

For example see this prompt describing an app: https://textclip.sh/?ask=chatgpt#c=XZTNbts4EMfvfYqpc0kQWpsEc...

8. jagged+a9[view] [source] 2026-01-11 01:08:39
>>goose0+(OP)
Funny, I've been noodling on something that goes the other direction - avoiding symbols as much as possible and trying to use full English words.

Very underbaked but https://github.com/jaggederest/locque

12. DonHop+oe[view] [source] 2026-01-11 02:08:00
>>goose0+(OP)
What about the cost of the millions of tokens you have to spend prompting the LLM to understand your bespoke language - the manuals and tutorials and examples and Stack Overflow discussions and the compiler source - added to every single prompt, and totally forgotten after each iteration?

It already knows python and javascript and markdown and yaml extremely well, so it requires zero tokens to teach it those languages, and it doesn't need to be taught a brand-new language it's never seen before from the ground up on every prompt.

You are treating token count as the bottleneck, when the real bottleneck is comprehension fidelity.

Context window management is a real problem, and designing for generation is a good instinct, but you need to design for what LLMs are already good at, not design a new syntax they have to learn.

jaggederest's opposite approach (full English words, locque) is actually more aligned with how LLMs work -- they're trained on English and understand English-like constructs deeply.

noosphr's comment is devastating: "Short symbols cause collisions with other tokens in the LLMs vocabulary." The @ in @ GET /users/:id activates Python decorator associations, shell patterns, email patterns, and more. The semantic noise may outweigh the token savings.
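
One way to eyeball that, assuming tiktoken again - this just prints how each string gets split, so you can see which pieces around the @ get reused across contexts (what it doesn't show is the learned associations, which are the real concern):

  # Sketch: inspect how a BPE encoding splits '@' in different contexts.
  import tiktoken

  enc = tiktoken.get_encoding("cl100k_base")
  for s in ["@app.route('/users/<id>')",   # Python decorator
            "user@example.com",            # email address
            "@ GET /users/:id {"]:         # GlyphLang route
      pieces = [enc.decode([t]) for t in enc.encode(s)]
      print(repr(s), "->", pieces)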

Perl's obsessive fetish for compact syntax, sigils, punctuation, performative TMTOWTDI one-liners, to the point of looking like line noise, is why it's so terribly designed and no longer relevant or interesting for LLM comprehension and generation.

I think the ideal syntax for LLM language understanding and generation is markdown and yaml, with some python, javascript, and preferably typescript thrown in.

As much as I have always preferred json to yaml, yaml is inarguably better for LLMs. It beats json because it avoids entropy collapse, has less syntax, and leaves more tokens and energy for solving problems instead of parsing and generating syntax! Plus, it has comments, which are a game changer for comprehension, in both directions.

https://x.com/__sunil_kumar_/status/1916926342882594948

>sunil kumar: Changing my model's tool calling interface from JSON to YAML had surprising side effects.

>Entropy collapse is one of the biggest issues with GRPO. I've learned that small changes to one's environment can have massive impacts on performance. Surprisingly, changing from JSON to YAML massively improved generation entropy stability, yielding much stronger performance.

>Forcing a small model to generate properly structured JSON massively constrains the model's ability to search and reason.

YAML Jazz:

https://github.com/SimHacker/moollm/blob/main/skills/yaml-ja...

YAML Jazz: Why Comments Beat Compression

The GlyphLang approach treats token count as THE bottleneck. Wrong. Comprehension fidelity is the bottleneck.

The LLM already knows YAML from training. Zero tokens to teach it. Your novel syntax costs millions of tokens per context window in docs, examples, and corrections.

Why YAML beats JSON for LLMs:

Sunil Kumar (Groundlight AI) switched from JSON to YAML for tool calling and found it "massively improved generation entropy stability."

  "Forcing a small model to generate properly structured JSON 
   massively constrains the model's ability to search and reason."

JSON pain:

  Strict bracket matching {}[]
  Mandatory commas everywhere  
  Quote escaping \"
  NO COMMENTS ALLOWED
  Rigid syntax = entropy collapse

YAML wins:

  Indentation IS structure
  Minimal delimiters
  Comments preserved
  Flexible = entropy preserved

The killer feature: comments are data.

  timeout: 30  # generous because API is flaky on Mondays
  retries: 3   # based on observed failure patterns

The LLM reads those comments. Acts on them. JSON strips this context entirely.
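
A concrete version of that, as a sketch assuming PyYAML: parse the YAML, dump it back out as JSON, and the comments are simply gone - the only place they exist is the raw YAML text the model reads.

  # Sketch (assumes PyYAML): the parsed values survive, the comments don't.
  import json
  import yaml

  yaml_src = (
      "timeout: 30  # generous because API is flaky on Mondays\n"
      "retries: 3   # based on observed failure patterns\n"
  )

  data = yaml.safe_load(yaml_src)   # {'timeout': 30, 'retries': 3}
  print(json.dumps(data))           # {"timeout": 30, "retries": 3} - no comments possible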

On symbol collision: noosphr nails it. Short symbols like @ activate Python decorators, shell patterns, email patterns simultaneously. The semantic noise may exceed the token savings.

Perl's syntax fetish is why it's irrelevant for LLM generation. Dense punctuation is anti-optimized for how transformers tokenize and reason.

The ideal LLM syntax: markdown, yaml, typescript. Languages it already knows cold.

13. momojo+df[view] [source] 2026-01-11 02:19:57
>>goose0+(OP)
Great work!

> In practice, that means more logic fits in context, and sessions stretch longer before hitting limits. The AI maintains a broader view of your codebase throughout.

This is one of those 'intuitions' that I've also had. However, I haven't found any convincing evidence for or against it so far.

In a similar vein, this is why `reflex`[0] intrigues me. IMO their value prop is "LLMs love Python, so let's write entire apps in Python". But again, I haven't seen any hard numbers.

Anyone seen any hard numbers to back this?

[0] https://github.com/reflex-dev/reflex
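
For anyone who hasn't looked at it, here's a minimal Reflex counter sketched from memory of their docs (treat the exact API as approximate and check the repo for the current version) - the point being that state, events, and UI all live in plain Python:

  # Rough sketch of a Reflex app; details may differ across versions.
  import reflex as rx

  class State(rx.State):
      count: int = 0

      def increment(self):
          self.count += 1

  def index():
      return rx.vstack(
          rx.heading(State.count),
          rx.button("Increment", on_click=State.increment),
      )

  app = rx.App()
  app.add_page(index)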

14. DonHop+xj[view] [source] 2026-01-11 03:12:20
>>goose0+(OP)
Instead of making up new languages, just clean up code in old programming languages so it doesn't smell so bad! ;)

Sniffable Python: useful for Anthropic skill sister scripts, and in general.

https://github.com/SimHacker/moollm/tree/main/skills/sniffab...

22. goose0+3U3[view] [source] [discussion] 2026-01-12 09:42:13
>>p0w3n3+7g1
Looks like my tokenization review method was incorrect - honestly a little embarrassing on my part. I think it would have been a lot longer before I discovered it otherwise, so thanks for the comment!

I did just go through and run the equivalent code samples in the GlyphLang repo (vs. the sample code I posted, which I'm assuming you ran) through tiktoken and found slightly lower percentages, but still not insignificant: on average 35% fewer tokens than Python and 56% fewer than Java. I've updated the README with the corrected figures and methodology if you want to check: https://github.com/GlyphLang/GlyphLang/blob/main/README.md#a...
