zlacker

[return to "Go is still not good"]
1. 0x000x+5o[view] [source] 2025-08-22 12:50:29
>>ustad+(OP)
> If you stuff random binary data into a string, Go just steams along, as described in this post.

> Over the decades I have lost data to tools skipping non-UTF-8 filenames. I should not be blamed for having files that were named before UTF-8 existed.

Umm.. why blame Go for that?

2. thomas+us[view] [source] 2025-08-22 13:12:57
>>0x000x+5o
Author here.

What I intended to say is that silently treating invalid UTF-8 as if it were fine (it could be valid ISO 8859-1), or the other way around, with no error handling, has lost me data in the past.

Compare this to Rust, where a path name has a different type (`Path`/`OsStr`) than a mere string. If you need to treat it like a string and don't care that it may be "a bit wrong" (say, because it's only shown to the user), you can call `.to_string_lossy()`. But it's much harder to accidentally skip handling that case when an exact name match does matter.

When exactness matters, `.to_str()` returns `Option<&str>`, so the caller is forced to deal with the situation that the file name may not be UTF-8.

Being sloppy with file name encodings is how data is lost. Go is sloppy with strings of all kinds, file names included.

3. 0x000x+iP[view] [source] 2025-08-22 15:18:03
>>thomas+us
Thanks for your reply. I understand that encoding the character set in the type system is more explicit and can help find bugs.

But forcing all strings to be UTF-8 does not magically help with the issue you described. In practice I've often seen the opposite: now you have to write two code paths, one for UTF-8 and one for everything else, and the second one is ignored in practice because it is annoying to write. For example, I built the web server project in your other submission (very cool!) and gave it a tar file that has a non-UTF-8 name. There is no special handling: I simply get "error: invalid UTF-8 was detected in one or more arguments" and the application exits. It just refuses to work with non-UTF-8 files at all -- is this less sloppy?

Forcing UTF-8 does not "fix" compatibility in strange edge cases; it just breaks them all. The best approach is to treat data as opaque bytes unless there is a good reason not to. Which is what Go does, so I think it is unfair to blame Go for this particular reason instead of the backup applications.
