Nothing? Neither Go nor the OS requires file names to be UTF-8, I believe.
You can do something like WTF-8 (not a misspelling, alas) to make it bidirectional. Rust does this under the hood but doesn’t expose the internal representation.
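For anyone curious what the WTF-8 trick looks like, here is a minimal sketch in Go, not Rust's actual implementation. The helper name `encodeWTF8` is made up for illustration. It assumes the input is a slice of 16-bit units such as Windows hands back: well-formed surrogate pairs become ordinary 4-byte UTF-8, and a lone surrogate is written in the 3-byte form that strict UTF-8 forbids, so nothing is lost.

```go
package main

import (
	"fmt"
	"unicode/utf16"
	"unicode/utf8"
)

// encodeWTF8 is a hypothetical helper: it converts possibly ill-formed
// UTF-16 into WTF-8 bytes. Valid input yields plain UTF-8.
func encodeWTF8(units []uint16) []byte {
	out := make([]byte, 0, len(units)*3)
	for i := 0; i < len(units); i++ {
		u := rune(units[i])
		if !utf16.IsSurrogate(u) {
			out = utf8.AppendRune(out, u)
			continue
		}
		// A high surrogate followed by a low surrogate forms one code point.
		if i+1 < len(units) {
			if r := utf16.DecodeRune(u, rune(units[i+1])); r != utf8.RuneError {
				out = utf8.AppendRune(out, r)
				i++
				continue
			}
		}
		// Lone surrogate: emit the generalized 3-byte encoding by hand,
		// because utf8.AppendRune would substitute U+FFFD here.
		out = append(out, 0xE0|byte(u>>12), 0x80|byte(u>>6)&0x3F, 0x80|byte(u)&0x3F)
	}
	return out
}

func main() {
	// 'a', an unpaired high surrogate, 'b'
	fmt.Printf("% x\n", encodeWTF8([]uint16{'a', 0xD800, 'b'})) // 61 ed a0 80 62
}
```

Decoding is the mirror image: treat any 3-byte sequence in the ED A0..BF range as a single surrogate unit instead of rejecting it.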
In general, Windows filenames are Unicode and you can always express those filenames by using the -W APIs (like CreateFileW()).
It breaks, which is weird because you can create a string that isn't valid UTF-8 (e.g. "\xbd\xb2\x3d\xbc\x20\xe2\x8c\x98") and print it out with no trouble; you just can't pass it to e.g. `os.Create` or `os.Open`.
(Bash and a variety of other utils will also complain about it not being valid UTF-8; neovim won't save a file under that name; etc.)
$ cat main.go
package main

import (
	"log"
	"os"
)

func main() {
	f, err := os.Create("\xbd\xb2\x3d\xbc\x20\xe2\x8c\x98")
	if err != nil {
		log.Fatalf("create: %v", err)
	}
	_ = f
}
$ go run .
$ ls -1
''$'\275\262''='$'\274'' ⌘'
go.mod
main.go
Yes, that was my assumption when bash et al. also had problems with it.
On Linux, they’re 8-bit almost-arbitrary strings like you noted, and usually UTF-8. So they always have a convenient 8-bit encoding (i.e., leave them alone). If you hated yourself and wanted to convert them to UTF-16, however, you’d have the same problem Windows does but in reverse.
The upshot is that since the values aren’t always UTF-16, there’s no canonical way to convert them to single byte strings such that valid UTF-16 gets turned into valid UTF-8 but the rest can still be roundtripped. That’s what bastardized encodings like WTF-8 solve. The Rust Path API is the best take on this I’ve seen that doesn’t choke on bad Unicode.
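To make the round-trip problem concrete, here is a small Go sketch using only the standard `unicode/utf16` package. The function name `roundTrips` is made up for this example. It converts 16-bit units to a Go string and back, and reports whether the original units survived.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// roundTrips is a hypothetical helper: it reports whether a sequence of
// 16-bit units survives conversion to a UTF-8 Go string and back.
func roundTrips(units []uint16) bool {
	// utf16.Decode replaces every unpaired surrogate with U+FFFD.
	s := string(utf16.Decode(units))
	back := utf16.Encode([]rune(s))
	if len(back) != len(units) {
		return false
	}
	for i := range units {
		if back[i] != units[i] {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(roundTrips([]uint16{0xD83D, 0xDE00})) // valid pair: true
	fmt.Println(roundTrips([]uint16{0xD83D}))         // lone surrogate: false
}
```

Once the lone surrogate has been replaced with U+FFFD, the original name cannot be recovered, which is the gap an encoding like WTF-8 is meant to close.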
If you stuff random binary data into a string, Go just steams along, as described in this post.
Over the decades I have lost data to tools skipping non-UTF-8 filenames. I should not be blamed for having files that were named before UTF-8 existed.
Windows doing something similar wouldn't surprise me at all. I believe NTFS internally stores filenames as UTF-16, so enforcing UTF-8 at the API boundary sounds likely.