The state of binary compatibility on Linux and how to address it

>>generi+(OP)
As an end user I often patch the glibc version incompatibility away with https://github.com/corsix/polyfill-glibc

    $ ./polyfill-glibc --target-glibc=2.17 /path/to/my-program

This often surfaces new version incompatibilities in other libs. But as the article says, those can usually be statically linked.
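To see which glibc version a binary actually asks for before patching it, something like the following might help. It is a minimal sketch, not part of polyfill-glibc: it assumes you have already captured the text output of `objdump -T` on the binary, and the `sample` excerpt and the function name `required_glibc` are made up for illustration.

```python
from __future__ import annotations

import re

# Matches glibc symbol-version tags such as GLIBC_2.17 or GLIBC_2.2.5.
# GLIBC_PRIVATE is deliberately not matched (no digits after the underscore).
_GLIBC_TAG = re.compile(r"GLIBC_(\d+(?:\.\d+)+)")


def required_glibc(objdump_output: str) -> tuple[int, ...] | None:
    """Return the highest GLIBC_x.y version referenced, or None if there is none."""
    versions = [
        tuple(int(part) for part in match.group(1).split("."))
        for match in _GLIBC_TAG.finditer(objdump_output)
    ]
    return max(versions, default=None)


# Hypothetical excerpt in the style of `objdump -T` output.
sample = """
0000000000000000      DF *UND*  0000000000000000 (GLIBC_2.2.5) puts
0000000000000000      DF *UND*  0000000000000000 (GLIBC_2.34)  __libc_start_main
0000000000000000      DF *UND*  0000000000000000 (GLIBC_2.17)  clock_gettime
"""

print(required_glibc(sample))  # -> (2, 34)
```

If the result is newer than the glibc on the target machine, that is the gap a `--target-glibc=...` run would have to bridge.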

>>sylwar+p1
One of the features Zig provides is the ability to target any glibc version. See https://github.com/ziglang/glibc-abi-tool/ for more details on how this is solved.

>>Jeaye+zb
The configuration of DNS resolution on Linux is quite complicated [1]. Musl just ignores all that. You can build a distro that works with musl, but a static musl binary dropped into an arbitrary Linux system won't necessarily work correctly.

[1]: >>43451861
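For a feel of what "all that" configuration looks like, here is a small sketch that pulls the lookup order out of the `hosts:` line of an nsswitch.conf-style file. On glibc each source named there corresponds to a `libnss_<name>.so.2` module loaded at runtime, which is the part a static musl binary never consults. The sample file content and the function name `hosts_sources` are illustrative assumptions.

```python
import re


def hosts_sources(nsswitch_text: str) -> list:
    """Return the sources listed on the `hosts:` line, in lookup order."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if line.startswith("hosts:"):
            rest = line[len("hosts:"):]
            # Remove bracketed action items such as [NOTFOUND=return].
            rest = re.sub(r"\[[^\]]*\]", " ", rest)
            return rest.split()
    return []


# Hypothetical file content; real systems differ widely.
sample = """
# /etc/nsswitch.conf (example)
passwd: files systemd
hosts:  files mdns4_minimal [NOTFOUND=return] dns myhostname
"""

print(hosts_sources(sample))  # -> ['files', 'mdns4_minimal', 'dns', 'myhostname']
```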

>>generi+(OP)
There is no distinction between system and program libraries in Linux. We used to pretend there was one before usrmigration, but that was never good to take seriously.

The distro as packager model ensures that everything is mixed together in the filesystem and is actively hostile to external packaging. Vendoring dependencies or static linking improves compatibility by choosing known working versions, but decreases incentive and ability for downstream (or users) to upgrade those dependencies.

The libc stuff in this article is mostly glibc-specific, and you'd have fewer issues targeting musl. Mixing static linking and dlopen doesn't make much sense, as said here[1] which is an interesting thread. Even dns resolution on glibc implies dynamic linking due to nsswitch.

Solutions like Snap, Flatpak, and AppImage work to contain the problem by reusing the same abstractions internally rather than introducing anything that directly addresses the issue. We won't have a clean solution until we collectively abandon the FHS for a decentralized filesystem layout where adding an application (not just a program binary) is as easy as extracting a package into a folder and integrates with the rest of the system. I've worked on this off and on for a while, but being so opinionated makes everything an uphill battle while accepting the current reality is easy.

[1] https://musl.openwall.narkive.com/lW4KCyXd/static-linking-an...

>>BwackN+2e
> Even dns resolution on glibc implies dynamic linking due to nsswitch.

Because, as far as I’ve heard, it borrowed that wholesale from Sun, who desperately needed an application to show off their new dynamic linking toy. There’s no reason they couldn’t’ve done a godsdamned daemon (that potentially dynamically loaded plugins) instead, and in fact making some sort of NSS compatibility shim that does work that way (either by linking the daemon with Glibc, or more ambitiously by reimplementing the NSS module APIs on top of a different libc) has been on my potential project list for years. (Long enough that Musl apparently did a different, less-powerful NSS shim in the meantime?)

The same applies to PAM word for word.

> Mixing static linking and dlopen doesn't make much sense, as said [in an oft-cited thread on the musl mailing list].

It’s a meh argument, I think.

It’s true that there’s something of a problem where two copies of a libc can’t coexist in a process, and that entails the problem of pulling in the whole libc that’s mentioned in the thread, but that to me seems more due to a poorly drawn abstraction boundary than anything else. Witness Windows, which has little to no problem with multiple libcs in a process; you may say that’s because most of the difficult-to-share stuff is in KERNEL32 instead, and I’d say that was exactly my point.

The host app would need to pull in a full copy of the dynamic loader? Well duh, but also (again) meh. The dynamic loader is not a trivial program, but it isn’t a huge program, either, especially if we cut down SysV/GNU’s (terrible) dynamic-linking ABI a bit and also only support dlopen()ing ELFs (elves?) that have no DT_NEEDED deps (having presumably been “statically” linked themselves).

So that thread, to me, feels like it has the same fundamental problem as Drepper’s standard rant[1] against static linking in general: it mixes up the problems arising from one libc’s particular implementation with problems inherent to the task of being a libc. (Drepper’s has much more of an attitude problem, of course.)

As for why you’d actually want to dlopen from a static executable, there’s one killer app: exokernels, loading (parts of) system-provided drivers into your process for speed. You might think this an academic fever dream, except that is how talking to the GPU works. Because of that, there’s basically no way to make a statically linked Linux GUI app that makes adequate use of a modern computer’s resources. (Even on a laptop with integrated graphics, using the CPU to shuttle pixels around is patently stupid and wasteful—by which I don’t mean you should never do it, just that there should be an alternative to doing it.)

Stretching the definitions a little, the in-proc part of a GPU driver is a very very smart RPC shim, and that’s not the only useful kind: medium-smart RPC shims like KERNEL32 and dumb ones like COM proxy DLLs and the Linux kernel’s VDSO are useful to dynamically load too.

And then there are plugins for stuff that doesn’t really want to pass through a bytestream interface (at all or efficiently), like media format support plugins (avoided by ffmpeg through linking in every media format ever), audio processing plugins, and so on.

Note that all of these intentionally have a very narrow waist[2] of an interface, and when done right they don’t even require both sides to share a malloc implementation. (Not a problem on Windows where there’s malloc at home^W^W^W a shared malloc in KERNEL32; the flip side is the malloc in KERNEL32 sucks ass and they’re stuck with it.) Hell, some of them hardly require wiring together arbitrary symbols and would be OK receiving and returning well-known structs of function pointers in an init function called after dlopen.

[1] https://www.akkadia.org/drepper/no_static_linking.html

[2] https://www.oilshell.org/blog/2022/02/diagrams.html
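The "well-known struct of function pointers returned from an init function" idea can be sketched with ctypes. This is only an illustration under assumptions: `PluginApi`, `plugin_init` and the single `process` callback are hypothetical, and the "plugin" here is a Python function standing in for what a real host would obtain by loading a shared object and calling its exported init symbol.

```python
import ctypes

# Signature of the one callback in this hypothetical interface: int process(int).
PROCESS_FN = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int)


class PluginApi(ctypes.Structure):
    """The whole host/plugin contract: a version number and function pointers."""
    _fields_ = [
        ("abi_version", ctypes.c_uint32),
        ("process", PROCESS_FN),
    ]


# Keep a module-level reference so the callback outlives the struct that uses it.
_double = PROCESS_FN(lambda sample: sample * 2)


def plugin_init() -> PluginApi:
    """Stands in for the init function a real plugin would export."""
    return PluginApi(1, _double)


api = plugin_init()
if api.abi_version == 1:      # the host checks the version before using the table
    print(api.process(21))    # -> 42
```

Nothing crosses the boundary except plain integers and function pointers, so neither side needs to know which allocator or libc the other uses.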

>>generi+(OP)
> shipping software on Linux

That's a surprisingly hard nut to crack when containers won't work for your use case. We found https://github.com/silitics/rugix to work well in that situation.

>>moron4+pf
How should a microkernel run (WASI) WASM runtimes?

Docker can run WASM runtimes, but I don't think podman or nerdctl can yet.

From >>38779803 :

  docker run \
    --runtime=io.containerd.wasmedge.v1 \
    --platform=wasi/wasm \
    secondstate/rust-example-hello

From >>41306658 :

> ostree native containers are bootable host images that can also be built and signed with a SLSA provenance attestation; https://coreos.github.io/rpm-ostree/container/ :

  rpm-ostree rebase ostree-image-signed:registry:<oci image>
  rpm-ostree rebase ostree-image-signed:docker://<oci image>

Native containers run on the host and can host normal containers if a container engine is installed. Compared to an electron runtime, IDK how minimal a native container with systemd and podman, and WASM runtimes, and portable GUI rendering libraries could be.

CoreOS, which was for creating minimal host images that host containers, is now Fedora Atomic, which is now Fedora Atomic Desktops and rpm-ostree: Silverblue, Kinoite, Sericea; and Bazzite and Secureblue.

Secureblue has a hardened_malloc implementation.

From https://jangafx.com/insights/linux-binary-compatibility :

> To handle this correctly, each libc version would need a way to enumerate files across all other libc instances, including dynamically loaded ones, ensuring that every file is visited exactly once without forming cycles. This enumeration must also be thread-safe. Additionally, while enumeration is in progress, another libc could be dynamically loaded (e.g., via dlopen) on a separate thread, or a new file could be opened (e.g., a global constructor in a dynamically loaded library calling fopen).
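As a toy model of the requirement quoted above (visit every file exactly once, survive cycles, tolerate concurrent loads), consider the sketch below. It assumes a single process-wide lock and a way for instances to find each other, which is exactly what independent libc copies do not have; `LibcInstance`, `peers` and `enumerate_files` are hypothetical names.

```python
import threading


class LibcInstance:
    """Toy stand-in for one libc copy in a process, with its own open files."""

    def __init__(self, name, files=()):
        self.name = name
        self.files = list(files)   # file descriptors this copy knows about
        self.peers = []            # other instances reachable from this one


# Hypothetical process-wide lock; real libc copies share no such thing.
_registry_lock = threading.Lock()


def enumerate_files(root):
    """Return every file of every reachable instance, each exactly once."""
    seen_instances, seen_files, order = set(), set(), []
    with _registry_lock:           # holds off concurrent registration while walking
        stack = [root]
        while stack:
            inst = stack.pop()
            if id(inst) in seen_instances:
                continue           # already visited: a cycle or duplicate edge
            seen_instances.add(id(inst))
            for fd in inst.files:
                if fd not in seen_files:
                    seen_files.add(fd)
                    order.append(fd)
            stack.extend(inst.peers)
    return order


a = LibcInstance("static-libc", [3, 4])
b = LibcInstance("plugin-libc", [4, 5])
a.peers.append(b)
b.peers.append(a)                  # deliberate cycle
print(enumerate_files(a))          # -> [3, 4, 5]
```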

FWIU, ROP (Return-Oriented Programming) and gadget-finding approaches have implementations of things like dynamic header discovery of static and dynamic libraries at runtime, to compile more at runtime. That isn't safe, though: nothing reverifies what's mutated after loading the PE into process space, after NX tagging or not, before and after secure enclaves and LD_PRELOAD (which some Go binaries don't respect, for example).

Can a microkernel do eBPF?

What about a RISC machine for WASM and WASI?

"Customasm – An assembler for custom, user-defined instruction sets" (2024) >>42717357

Maybe that would shrink some of these flatpaks which ship their own Electron runtimes instead of like the Gnome and KDE shared runtimes.

Python's manylinux project specifies a number of libc versions that manylinux packages portably target.

Manylinux requires a tool called auditwheel for Linux, delocate for macOS, and delvewheel for Windows;

Auditwheel > Overview: https://github.com/pypa/auditwheel#overview :

> auditwheel is a command line tool to facilitate the creation of Python wheel packages for Linux (containing pre-compiled binary extensions) that are compatible with a wide variety of Linux distributions, consistent with the PEP 600 manylinux_x_y, PEP 513 manylinux1, PEP 571 manylinux2010 and PEP 599 manylinux2014 platform tags.

> auditwheel show: shows external shared libraries that the wheel depends on (beyond the libraries included in the manylinux policies), and checks the extension modules for the use of versioned symbols that exceed the manylinux ABI.

> auditwheel repair: copies these external shared libraries into the wheel itself, and automatically modifies the appropriate RPATH entries such that these libraries will be picked up at runtime. This accomplishes a similar result as if the libraries had been statically linked without requiring changes to the build system. Packagers are advised that bundling, like static linking, may implicate copyright concerns

github/choosealicense.com: https://github.com/github/choosealicense.com

From >>42347468 :

> A manylinux_x_y wheel requires glibc>=x.y. A musllinux_x_y wheel requires musl libc>=x.y; per PEP 600
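The rule in that last quote is simple enough to sketch. This is an illustration only, not how pip or the `packaging` library implements it: it handles just the PEP 600 style `manylinux_x_y_arch` and `musllinux_x_y_arch` tags, ignores legacy aliases like manylinux2014, and does not check the architecture. The function name `wheel_supported` is made up.

```python
import re

# family _ major _ minor _ arch, e.g. manylinux_2_17_x86_64
_TAG = re.compile(r"^(manylinux|musllinux)_(\d+)_(\d+)_(\w+)$")


def wheel_supported(platform_tag, libc, libc_version):
    """True if a wheel with this tag may run on a system with the given libc.

    libc is "glibc" or "musl"; libc_version is a (major, minor) tuple.
    """
    match = _TAG.match(platform_tag)
    if match is None:
        return False               # legacy or unknown tag: not handled here
    family, major, minor, _arch = match.groups()
    wanted = "glibc" if family == "manylinux" else "musl"
    return libc == wanted and libc_version >= (int(major), int(minor))


print(wheel_supported("manylinux_2_17_x86_64", "glibc", (2, 31)))  # -> True
print(wheel_supported("manylinux_2_34_x86_64", "glibc", (2, 31)))  # -> False
print(wheel_supported("musllinux_1_1_x86_64", "glibc", (2, 31)))   # -> False
```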

>>westur+ok
Return oriented programming: https://en.wikipedia.org/wiki/Return-oriented_programming

/? awesome return oriented programming site:github.com https://www.google.com/search?q=awesome+return+oriented+prog...

This can probably find multiple versions of libc at runtime, too: https://github.com/0vercl0k/rp :

> rp++ is a fast C++ ROP gadget finder for PE/ELF/Mach-O x86/x64/ARM/ARM64 binaries.

>>Ashame+Yc
I am saying that compiler toolchains on Linux should never ever under any circumstances ever rely on anything on the system for compiling. Compiling based on the system global version of glibc is stupid, bad, wrong, and Linus should be ashamed for letting it happen.

It should be trivial for Windows to cross-compile for Linux for any distro and for any ancient version of glibc.

It is not trivial.

Here is a post describing the mountain range of bullshit that Zig had to move to enable trivial cross-compile and backwards targeting. https://andrewkelley.me/post/zig-cc-powerful-drop-in-replace...

Linux is far and away the worst offender out of Linux, Mac, and Windows. By leaps and bounds.

>>manana+6j
> The same applies to PAM word for word.

That's one of the reasons that OpenBSD is rather compelling. BSDAuth doesn't open arbitrary libraries to execute code, it forks and execs binaries so it doesn't pollute your program's namespace in unpredictable ways.

> It's true that there's something of a problem where two copies of a libc can't coexist in a process...

That's the meat of this article. It goes beyond complaining about a relatable issue and talks about the work and research they've done to see how it can be mitigated. I think it's a neat exercise to wonder how you could restructure a libc to allow multi-libc compatibility, but question why anyone would even want to statically link to libc in a program that dlopen's other libraries. If you're worried about a stable ABI with your libc, but acknowledge that other libraries you use link to a potentially different and incompatible libc thus making the problem even more complicated, you should probably go the BSDAuth route instead of introducing both additional complexity and incompatibility with existing systems. I think almost everything should be suitable for static linking and that Drepper's clarification is much more interesting than the rant. Polluting the global lib directory with a bunch of your private dependencies should be frowned upon and hides the real scale of applications. Installing an application shouldn't make the rest of your system harder to understand, especially when it doesn't do any special integration. When you have to dynamically link anyway:

> As for why you’d actually want to dlopen from a static executable, there’s one killer app: exokernels, loading (parts of) system-provided drivers into your process for speed.

If you're dealing with system resources like GPU drivers, those should be opaque implementations loaded by intermediaries like libglvnd. [1] This comes to mind as even more reason why dynamic dependencies of even static binaries are terrible. The resolution works, but it would be better if no zlib symbols would leak from mesa at all (using --exclude-libs and linking statically) so a compiled dependency cannot break the program that depends on it. So yes, I agree that dynamic dependencies of static libraries should be static themselves (though enforcing that is questionable), but I don't agree that the libc should be considered part of that problem and statically linked as well. That leads us to:

> ... when done right they don't even require both sides to share a malloc implementation

Better API design for libraries can eliminate a lot of these issues, but enforcing that is a much harder problem in the current landscape where both sides are casually expected to share a malloc implementation -- hence the complication described in the article. "How can we force everything that exists into a better paradigm" is a lot less practical of a question than "what are the fewest changes we'd need to ensure this would work with just a recompile". I agree with the idea of a "narrow waist of an interface", but it's not useful in practice until people agree where the boundary should be and you can force everyone to abide by it.

[1] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28...

>>generi+(OP)
> GLIBC is an example of a "system library" that cannot be bundled with your application because it includes the dynamic linker itself. This linker is responsible for loading other libraries, some of which may also depend on GLIBC—but not always.

Running WordPerfect on modern Linux is done by shipping both of those components:

https://github.com/taviso/wpunix

>>eviden+Jd
Debian archives all of our binaries (and source) here:

https://snapshot.debian.org/

Some things built on top of that:

https://manpages.debian.org/man/debsnap https://manpages.debian.org/man/debbisect https://wiki.debian.org/BisectDebian https://metasnap.debian.net/ https://reproduce.debian.net/

>>myk900+lh
For Ubuntu they would only target LTS releases, most likely.

On EL it's easier, now you would just support 2 or 3 of EL7, EL8, and EL9.

As an example of something I use, Xfdtd only officially supports one version of Ubuntu and 2 versions of EL https://www.remcom.com/system-requirements#xfdtd-system-requ...

In practice, it wasn't too hard to get it running on EL9 or Fedora either...

>>foobla+St
The PEP-600 [0] Rationale section touches on this a bit. The basic problem is that there are things beyond glibc that would be nice to use from the environment for a number of reasons (security updates, avoiding clashes between multiple wheels that depend on the same lib, etc.), but since most libs outside of glibc and libstdc++ don't really have an ABI policy and the distros don't necessarily have a policy on what libraries are guaranteed to be present you sort of have to guess and hope for the best. While the initial list in PEP-513 [1] was a pretty good guess, one of the libraries chosen (libcrypt.so.1) got dropped in Fedora 30 and replaced with an ABI incompatible version. Crypto libraries are an example of something that's actually important to keep up to date so I find this rather unfortunate.

[0] https://peps.python.org/pep-0600/

[1] https://peps.python.org/pep-0513/
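The "guess and hope for the best" part can at least be turned into a runtime probe. A rough sketch, assuming it is acceptable to actually load the library into the current process (which runs its constructors); the sonames in the loop are only examples and the results depend entirely on the machine.

```python
import ctypes


def can_load(soname: str) -> bool:
    """Report whether the dynamic loader can find and load the given soname."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        return False
    return True


# Example sonames only; what is present varies by distro and release.
for soname in ("libcrypt.so.1", "libcrypt.so.2", "libz.so.1"):
    print(soname, can_load(soname))
```

A wheel or application could use a check like this to fall back to a bundled copy when the system library it hoped for is missing.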

>>inftec+OT
That's the sane way to tackle it. If you're the vendor, just target the top N (whatever value of N you can cope with).

I don't mean disrespect towards people running Alpine (hi), Arch, or Gentoo, but you wouldn't be running these distros if you aren't ready to handle their quirks.

TFA mostly talks about binary compat. Even if you can get away with statically linking everything, you still have to cope with the mess that is userspace fragmentation: <https://tailscale.com/blog/sisyphean-dns-client-linux>

So yeah, supporting the top N gets you approximately sqrt(N/(N+1))% of the way. (Assuming desktop Linux market share is about 1%.)

>>ecef9-+g51
> Wine is linux only stable abi

https://blog.hiler.eu/win32-the-only-stable-abi/

>>flohof+q11
Just to add some more context. Zig cc is a wrapper around clang. It can handle cross compiling to specific glibc versions. See https://andrewkelley.me/post/zig-cc-powerful-drop-in-replace... I imagine it would help with the glibc problems they are talking about. Glibc tries to provide a backwards compatible ABI.

>>guappa+U71
The 2006-engine version of Half-Life 2: Episode 1 runs on Windows 10/11 with no configuration [1], outside of getting Steam to download it. I recall installing The Elder Scrolls IV: Oblivion on a Windows 11 machine, which just needed DirectX 9.0c to run.

[1] https://steamcommunity.com/sharedfiles/filedetails/?id=28643...

>>generi+(OP)
This is a really great article about binary compatibility!

I disagree with their idea for fixing it by splitting up glibc. I think it's a bad idea because it doesn't actually fix the problems that lead to compat breakage, and it's bad because it's harder than it seems.

They cite these compat bugs as part of their reasoning for why glibc should be split up:

- https://sourceware.org/bugzilla/show_bug.cgi?id=29456

- https://sourceware.org/bugzilla/show_bug.cgi?id=32653

- https://sourceware.org/bugzilla/show_bug.cgi?id=32786

I don't see how a single one of these would be fixed by splitting up glibc. If their proposed libdl or libthread were updated and had one of these regressions, it would cause just as much of a bug as if a monolithic libc updates with one of these regressions.

So, splitting up glibc wouldn't fix the issue.

Also, splitting up glibc would be super nasty because of how the threading, loading, and syscall parts of libc are coupled (some syscalls are implemented with deep threading awareness, like the setxid calls, threads need to know about the loader and vice-versa, and other issues).

I think the problem here is how releases are cut. In an ideal world, glibc devs would have caught all three of those bugs before shipping 2.41. Big corpos like Microsoft manage that by having a binary compatibility team that runs All The Apps on every new version of the OS. I'm guessing that glibc doesn't have (as much of) that kind of process.

>>generi+(OP)
Related video about releasing games on linux, i.e. dlopen() all the things https://www.youtube.com/watch?v=MeMPCSqQ-34

>>dpasse+AZ
Should a microkernel implement eBPF and WASM, or, for the same reasons that justify a microkernel, should eBPF and most other things be confined, relegated, or segregated in userspace, in terms of microkernel goals like separation of concerns, least privilege, and then performance?

Linux containers have process isolation features that userspace sandboxes like bubblewrap and runtimes don't.

Flatpaks bypass selinux and apparmor policies and run unconfined (on DAC but not MAC systems) because the path to the executable in the flatpaks differs from the system policy for */s?bin/* and so wouldn't be relabeled with the necessary extended filesystem attributes even on `restorecon /` (which runs on reboot if /.autorelabel exists).

Thus, e.g. Firefox from a signed package in a container on the host, and Firefox from a package on the host are more process-isolated than Firefox in a Flatpak or from a curl'ed statically-linked binary because one couldn't figure out the build system.

Container-selinux, Kata containers, and GVisor further secure containers without requiring the RAM necessary for full VM virtualization with Xen or Qemu; and that is possible because of container interface standards.

Linux machines run ELF binaries, which could include WASM instructions

/? ELF binary WASM : https://www.google.com/search?q=elf+binary+wasm :

mewz-project/wasker https://github.com/mewz-project/wasker :

> What's new with Wasker is, Wasker generates an OS-independent ELF file where WASI calls from Wasm applications remain unresolved.

> This unresolved feature allows Wasker's output ELF file to be linked with WASI implementations provided by various operating systems, enabling each OS to execute Wasm applications.

> Wasker empowers your favorite OS to serve as a Wasm runtime!

Why shouldn't we container2wasm everything? Because (rootless) Linux containers better isolate the workload than any current WASM runtime in userspace.

>>terinj+A84
That's an incredibly old version. I think you are going to be out of luck, and I wonder what you need from it.

That said, a couple of things to try:

Email Dan Helfman <witten@torsion.org> and ask if they have a copy lying around in backups anywhere. They aren't a Debian member any more but incredibly are still posting to the Debian BTS occasionally, as upstream developer of borgmatic.

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1056364#10 https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1096005#10

Contact the Icculus folks, the CVS server mentioned in debian/copyright of the oldest version of openal on Debian snapshot points to the Icculus server. Won't get you the debian/ directory, but could possibly get you some sort of CVS access.

http://cvs.lokigames.com/ https://icculus.org/
