Even for English the exact tweaking of line breaking and hyphenation is a problem that requires manual intervention from time to time. In mathematics research papers it’s not uncommon to see symbols that haven’t yet made it into Unicode. Look at the state of text on the web and you’ll encounter all these problems; even Google Docs gave in and now renders to a canvas.
PDF’s Unicode handling is indeed a big mess but it does have the ability to associate any glyph with an arbitrary Unicode string, for text extraction purposes, so there’s nothing to stop the program that generates the PDF from mapping the fi ligature glyph to the to-character string “fi”.
Yes, if you don't embed your fonts (or at least their metrics) in the PDF layout is less deterministic and will shift from font to font. The point is that we can embed modern fonts in PDF, the virtual printer has upgraded support for that, but for all sorts of reasons some of the tools that build PDF are still acting like they are "printing" to the lowest common denominator and using legacy EBCDIC ligature encodings in 2024. (Fun fact: Unicode's embeddings of the ligatures are somewhat closer to some classic EBCDIC report printing code pages than Extended ASCII's equivalents because there was a larger backwards compatibility gulf there.)