The only output from the WASM is to draw to screen. There is no chance of an RCE or data exfiltration.
As an aside, I originally thought this was going to generate a new font "style" that matched the text. So for example, "once upon a time" would look like a storybook style font or if you wrote something computer science-related, it would look like a tech manual font. I wonder if that's possible.
I was wondering why this was never used for a simpler autocorrect, but I guess that's why.
Also, perhaps someone more educated on LLMs could tell me: this wouldn't always be consistent, right? Like "once upon a time _____" wouldn't always output the same thing, yes? If so, even copying and pasting on your own system using the correct font could change the content.
Has there already been a proposal to add scripting functionality to Unicode itself? Seems to me we're not very far from that anymore...
In that case could you ship a live demo of this that's a web page with the font embedded in the page as a web font, such that Chrome and Firefox users can try it out without installing anything else?
I guess that’s the closest you get to copying.
There's very little code in the world that I wouldn't want to run in a robust sandbox. Low-level OS components that manage that sandbox are about it.
You could also use this to make animated fonts. An excuse to hook up a diffusion model next?
Oh, this can't be used for nefarious purposes. What could POSSIBLY go wrong?!
What is the end game here?
It is kind of like a "fractal" attack surface, with increasing surface the "deeper" one looks into it. It is nightmarish from that perspective ...
Last I checked there were about 4-10 TTF bugs discovered and actively exploited per year. I think I heard those stats in 2018 or so. This has been a well known and very commonly exploited attack vector for at least 20 years.
Ideally, I'd like not to execute any kind of arbitrary code when doing something as mundane as rendering a font. If that's not possible, then the code could be restricted to something less than Turing complete, e.g. formula evaluation (i.e. lambda calculus) without arbitrary recursion (a toy sketch of that idea follows below).
The problem is that even sandboxed code is unpredictable in terms of memory and runtime cost and can only be statically analyzed to a limited extent (halting problem and all).
Additionally, once it's there, people will bring in libraries, frameworks and sprawling dependency trees, which will further increase the computing cost and unpredictability of it.
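As a toy sketch of the "formula evaluation without arbitrary recursion" idea above (the whitelist approach and names are illustrative, not a concrete proposal): with no names, calls, or loops, evaluation cost is bounded by the size of the formula.

    import ast
    import operator

    # Only plain arithmetic is allowed: no names, calls, loops, or recursion,
    # so evaluation always terminates in time proportional to the formula's size.
    _OPS = {
        ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv,
        ast.USub: operator.neg,
    }

    def eval_formula(src: str) -> float:
        def walk(node):
            if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
                return node.value
            if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
                return _OPS[type(node.op)](walk(node.left), walk(node.right))
            if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
                return _OPS[type(node.op)](walk(node.operand))
            raise ValueError(f"disallowed construct: {type(node).__name__}")
        return walk(ast.parse(src, mode="eval").body)

    print(eval_formula("(2 + 3) * 4.5"))  # 22.5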
... except that it can happen in non-browser contexts.
Even for browsers, it took 20+ years to arrive at a combination of ugly hacks and standard practices where developers who make no mistakes in following a million arcane rules can mostly avoid the massive day-one security problems caused by JavaScript (and its interaction with other misfeatures like cookies and various cross-site nonsense). During all of which time the "Web platform" types were beavering away giving it more access to more things.
The Worldwide Web technology stack is a pile of ill-thought-out disasters (or, for early, core architectural decisions, not-thought-out-at-all disasters), all vaguely contained with horrendous hackery. This adds to the pile.
> The only output from the WASM is to draw to screen.
Which can be used to deceive the user in all kinds of well-understood ways.
> There is no chance of a RCE, or data exfiltration.
Assuming there are no bugs in the giant mass of code that a font can now exercise.
I used to write software security standards for a living. Finding out that you could embed WASM in fonts would have created maybe two weeks of work for me, figuring out the implications and deciding what, if anything, could be done about them. Based on, I don't know, a hundred similar cases, I believe I probably would have found some practical issues. I might or might not have been able to come up with any protections that the people writing code downstream of me could (a) understand and (b) feasibly implement.
Assuming I'd found any requirements-worthy response, it probably would have meant much, much more work than that for the people who at least theoretically had to implement it, and for the people who had to check their compliance. At one company.
So somebody can make their kerning pretty in some obscure corner case.
Edit: the OP uses this exact use case, Urdu typesetting, to justify WASM in Harfbuzz (video around 6:00); seems like Urdu has really become the poster child for typographic complexity these days
It's not a bug, it's a feature: DRM. Your content can now be consumed, but cannot be copied or modified, all without external tools, as long as you embed that TTF somehow.
Which kind of reminds me of the PDF invoices I got from my electricity provider. They looked and printed perfectly fine, but used a weird codepoint mapping which resulted in complete garbage when trying to copy any text from them. Fun times, especially when pasting the account number into a banking app.
Whether this is good or bad, I have no opinion on. It is "just" another layer of complexity and attack surface at this point. We have programmable shaders, rowhammer, speculative execution bugs, data timing side channels, kernel level BPF scripting, prompt injection and much more. Throwing WASM based font rendering into the mix is just balancing more on top of the pile. After some years in the IT security area, I think there are so many easier ways to compromise systems than these arcane approaches. Grab the data you need from a public AWS bucket or social engineer your access, far easier and cheaper.
For what it's worth, I think embedded WASM is a better idea than rolling your own ecosystem for scripting capabilities.
[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1248876
[2] I know, there are so many edge cases. I put this in the same do not touch bucket as time and names.
[3] https://scripts.sil.org/cms/scripts/page.php?id=cmplxrndexam...
[1] https://www.destroyallsoftware.com/talks/the-birth-and-death...
https://www.adultswim.com/videos/off-the-air
or on YouTube https://www.youtube.com/playlist?list=PLQl8zBB7bPvLWfGCVicg_...
Maybe you meant adding it to OpenType?
"Once upon a time!!!!!!!!!!!!!!!!!!!!!!!!!!!!!SEED42!!!!!??!!!??!"
and 3) actually just allow you to override the suggestions by typing whatever letters you want on your own, to be used in future inferences. At that point it'd be a fairly generic auto-complete kind of thing.
But that's due to the possibility of model configuration changes on the service end, and not relevant here.
It'd be lovely if someone embedded the font in a website form to save us all the trouble of demoing it
EDIT: Nevermind. Using the exact commits you linked gives another error (undefined reference to wasm_externref_ref2obj). I give up.
As it is, if you go back into a string of !!!!!!!!!! that has been turned into ‘upon a time’ and try to delete the ‘a’, you’ll just be deleting an !, and the string will turn into ‘once upon a tim’.
If you could just keyboard mash to pass entropy to the token sampler, deleting a specific character would alter the generation from that point onwards.
Right?
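A hypothetical sketch of that seeding idea (this scheme is made up, not how llama.ttf actually works): hash the run of trailing '!' characters into the sampler's seed, so inserting or deleting a single character reshuffles everything generated after that point.

    import hashlib
    import random

    # Hypothetical: hash the run of trailing '!' characters into the sampler's
    # RNG seed, so adding or deleting one '!' changes every token generated
    # after that point.
    def seed_from_mash(text: str) -> random.Random:
        mash = text[len(text.rstrip("!")):]  # just the trailing '!' run
        seed = int.from_bytes(hashlib.sha256(mash.encode()).digest()[:8], "big")
        return random.Random(seed)

    rng = seed_from_mash("Once upon a time!!!!!!!!")
    print(rng.random())  # changes as soon as one '!' is added or removed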
To me, it's a great reminder that the line between well-sandboxed turing-complete execution environments and messy implementations of decoders for "purely declarative" data formats can be quite blurry.
Said differently, I'd probably trust Harfbuzz/WASM more than the average obscure codec implementation in ffmpeg.
Also, there's a ZMachine interpreter (text adventure player) written in PostScript which can play Zork and some libre games such as Calypso with just GhostScript, the PostScript interpreter most software uses to render PostScript files.
Imagine that you download a .odt/.docx/.pdf form with an embedded font in LibreOffice in 2025. You start to type some text... and the font starts to saturate FPU ports (e.g. div/sqrt) in a specific pattern. Meanwhile, some tab in the browser measures CPU load or port saturation by doing some simple action, and captures every character you typed.
But things like this might be possible (for now): https://gwern.net/dropcap
The 280GB you saw is the Llama3-70B model which is basically ChatGPT level (if not better).
Just look at the 4 most recent videos. Maybe start with "Harder Drive: Hard drives we didn't want or need" where he tries to make hard drives out of things that shouldn't be hard drives. This includes pinging the entire internet, Tetris, and Covid-19 tests. But in truth the absurdity is a deep dive into the nature of how data can be stored and encoded. I think it should encourage people to pursue knowledge for the sake of knowledge and show how there are frequently deep insights into seemingly dumb questions, as long as you dig deep enough.
We do these things not because they are hard, but because they are harder drives!

At least most if not all ffmpeg decoders and demuxers are fuzzed all the time and any found issue is addressed.
In that case being able to show arbitrary other text would definitely be a hindrance because the scanning software typically looks at the data stored in the database. However, I think you don't need a Turing machine to exploit this — you could have a single ligature in a well-crafted font produce a full paragraph of text.
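Something along these lines, sketched with fontTools (the glyph names, feature tag and file names are illustrative, and the font would need to contain all of the output glyphs):

    # Sketch with fontTools: a single GSUB "multiple substitution" maps one input
    # glyph to a whole sequence of output glyphs, no Turing machine required.
    # Assumes the font already contains glyphs named X, p, a, i, d, n, f, u, l, space.
    from fontTools.ttLib import TTFont
    from fontTools.feaLib.builder import addOpenTypeFeaturesFromString

    font = TTFont("my_font.ttf")
    features = """
    feature calt {
        # When the user types "X", the shaper emits "paid in full".
        sub X by p a i d space i n space f u l l;
    } calt;
    """
    addOpenTypeFeaturesFromString(font, features)
    font.save("sneaky.ttf")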
Perhaps there's an alternative vector where someone's premade font on a site that doesn't allow font uploading can be exploited to make arbitrary calculations given certain character strings. Maybe bitcoin mining, if you could find a way to phone home with the result
Which is open source (MIT-licensed); the source code is here: https://github.com/microsoft/PowerToys/tree/main/src/modules...
It is written in C#, and uses the Windows.Media.Ocr UWP API to do the actual OCR part: https://learn.microsoft.com/en-us/uwp/api/windows.media.ocr?... – so if your app runs on Windows it can potentially call the same API and get OCR for free
Apple provides OCR through the VisionKit ImageAnalyzer API – https://developer.apple.com/documentation/visionkit/imageana... – albeit that is only officially supported to call from Swift (although apparently you can expose it to Objective-C if you write a "proxy Swift framework" – a custom Swift framework that wraps the original and adds @objc everywhere – I assume such a proxy framework could be autogenerated using reflection, but I'm not sure if anyone has written a tool that actually does that). There is also the older VNRecognizeTextRequest API which is supported from Objective-C, but its OCR quality is inferior.
I'm not sure what the best answer for Linux or Android is. I guess https://github.com/tesseract-ocr/tesseract ?
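If Tesseract is the answer, a minimal sketch via the pytesseract bindings might look like this (assuming the tesseract binary, pytesseract and Pillow are installed):

    # Minimal sketch using the pytesseract bindings; assumes the tesseract
    # binary plus the pytesseract and Pillow packages are installed.
    from PIL import Image
    import pytesseract

    text = pytesseract.image_to_string(Image.open("screenshot.png"))
    print(text)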
Of course, back in the 1990s Java and Flash were supposed to be sandboxed. So who knows?
This isn't used as much today with modern large resolutions, where we can get decent image quality from just rasterizing the font outline with anti-aliasing.
This example, however, is using WASM embedded in TTF fonts, which is not the same as TTF hinting bytecode.
Or even straight lines in a table. The straight lines from a table boundary get hacked into pieces. You'd think one line would be the ideal presentation for a line, but who are you to judge PDF?
iirc browsers fuzz the precise timing of calls for exactly this reason already?
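Roughly the idea, as a hypothetical sketch (not any browser's actual implementation): quantize the timer and add jitter so fine-grained measurements stop being useful.

    import random
    import time

    def coarse_now(resolution_ms: float = 0.1, jitter_ms: float = 0.1) -> float:
        # Quantize the clock to a coarse grid and add a little random jitter,
        # so a script can't resolve the fine-grained differences a
        # port-contention side channel would need.
        t_ms = time.perf_counter() * 1000.0
        return round(t_ms / resolution_ms) * resolution_ms + random.uniform(0.0, jitter_ms)

    print(coarse_now())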
> Is there scientific proof of above claim such as "WASM sandboxing is pretty good!" ?
I'm not aware of quantitative studies, but just from a design perspective, the surface that a WASM runtime presents seems intrinsically easier to defend than that of, say, the full Unix userspace that ffmpeg instances usually run in.
Anecdotally, many high-profile iOS and Android vulnerabilities originated in some more or less obscure codec implementation.
Would be cool if you could turn up/down the LLM’s temperature by pressing different keys other than just !!!!
Say, pressing keyboard numbers 0-9.
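Something like this hypothetical mapping would do it: each digit key picks a temperature, which rescales the logits before sampling (the names and the 0.1-1.9 range are made up).

    import math
    import random

    def sample_with_temperature(logits, temperature):
        # Lower temperature -> sharper (more deterministic); higher -> flatter (more random).
        scaled = [l / max(temperature, 1e-6) for l in logits]
        m = max(scaled)
        weights = [math.exp(s - m) for s in scaled]
        return random.choices(range(len(logits)), weights=weights)[0]

    # Hypothetical mapping: key "0" is near-greedy, key "9" is very random.
    key_to_temp = {str(d): 0.1 + 0.2 * d for d in range(10)}
    print(sample_with_temperature([2.0, 1.0, 0.5], key_to_temp["3"]))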
That sounds like an awful idea, too. I think a font file should describe the font's form, but it should not describe how it is gonna be rendered. That should be up to the rendering engine of the device that is going to display the font (printer driver, monitor driver...). But I guess this idea is from a time when people were still using bitmap fonts.
Having said that, the "arbitrary code" found in TrueType is not really arbitrary either - it's not supposed to be able to do anything except change the appearance of the font. From a security standpoint, there's no theoretical difference between a WAV and a TTF font - neither can hurt your machine if the loader is bug-free. Practically speaking though, a font renderer that needs to implement a sort of virtual machine is more complex, and therefore more likely to have exploitable bugs, than a WAV renderer that simply needs to swap a few bytes around and shove them at a DAC.
> Usage: Just download llama.ttf (60 MB download, since it's based on the 15M parameter TinyStories-based model demoed above) and use it like you would any other font.
If this font format is successful, then given enough time, it will become legacy. People won't be as vigilant about it, and they won't understand the internals as well. This is why TIFF-based exploits became so common 20-30 years after TIFF's heyday.
If you rasterize the Bezier curve outline of a ttf font at that resolution, you will have very crooked characters without anti aliasing and very blurry with AA.
At the same time the same font files needed to look good on print paper with a very different DPI setting.
It's a compromise between bitmap and outline fonts. Not ideal but it delivered good results on display and on paper at the time.
The hinting engine is not (?) used that much any more with large resolutions where we can comfortably just rasterize the outline with some AA and have good results.
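To make the grid-fitting problem concrete, a toy example (the numbers are illustrative): without hinting, identical stems come out at different pixel widths depending on where they fall on the grid.

    # Toy illustration (numbers made up): a 1.4-pixel-wide stem rounds to 1 or 2
    # pixels depending on where it happens to land on the pixel grid. Hinting
    # instructions nudge stems onto the grid so every stem renders the same width.
    def rasterized_stem_width(left_edge: float, stem_width: float = 1.4) -> int:
        return round(left_edge + stem_width) - round(left_edge)

    for x in (0.0, 0.3, 0.6):
        print(f"stem at x={x}: {rasterized_stem_width(x)} px")
    # stem at x=0.0: 1 px
    # stem at x=0.3: 2 px   <- same glyph, different stem width
    # stem at x=0.6: 1 px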
LaTeX subsumed most of the human authoring uses of PS where it was used in academia.
Security-wise, Turing completeness doesn't matter[note]. All that really matters is that the implementation of the format is complex. H264 is not Turing complete, but it is complex, and thus a frequent source of vulnerabilities. Conversely, you could probably put a toy Brainfuck interpreter in ring 0 and, with moderate care, be confident that no malicious Brainfuck code can take over your system (a toy sketch follows the note below).
[note] It matters a little bit if you consider it a "security" problem that you lose any guarantees of how long a file might take to load. A malicious file could infinite loop, and thus deny service. But then again, this isn't restricted to Turing complete formats - a zip bomb can also deny service this way.
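For what it's worth, here is roughly what that toy-interpreter idea looks like (a minimal sketch, not hardened code): the language is Turing complete, yet the whole interpreter fits in a screenful, and a step budget turns "might loop forever" into "stops after N steps", addressing the denial-of-service point in the note.

    # A sketch (not hardened code): a tiny Brainfuck interpreter with a step budget.
    def run_bf(code: str, steps: int = 100_000) -> str:
        tape, ptr, pc, out = [0] * 30_000, 0, 0, []
        stack, jumps = [], {}
        for i, c in enumerate(code):          # pre-match brackets
            if c == "[":
                stack.append(i)
            elif c == "]":
                j = stack.pop()
                jumps[i], jumps[j] = j, i
        while pc < len(code) and steps > 0:
            c, steps = code[pc], steps - 1
            if c == ">":   ptr = (ptr + 1) % len(tape)
            elif c == "<": ptr = (ptr - 1) % len(tape)
            elif c == "+": tape[ptr] = (tape[ptr] + 1) % 256
            elif c == "-": tape[ptr] = (tape[ptr] - 1) % 256
            elif c == ".": out.append(chr(tape[ptr]))
            elif c == "[" and tape[ptr] == 0: pc = jumps[pc]
            elif c == "]" and tape[ptr] != 0: pc = jumps[pc]
            pc += 1
        return "".join(out)

    print(run_bf("++++++++[>++++++++<-]>+."))  # prints "A"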