This is why pretty much all software that extracts structured data from PDFs throws away the text and just OCRs the page. Too many tricks with layouts and fonts.
>>mbb70+(OP)
I'm always surprised how "generate PDF from Word" turns one word into 10 different print points, each with just a single letter.
Or even straight lines in a table. The straight lines from a table boundary get hacked into pieces. You'd think one line would be the ideal representation for a line, but who are you to judge PDF?
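For what it's worth, stitching the chopped-up table lines back together is not much code once you have the pieces. A rough sketch in Python, assuming the segments have already been pulled out of the PDF as plain `(x0, x1, y)` tuples for horizontal rules. That tuple format, the function name `merge_horizontal_segments` and the `tol` parameter are all made up for illustration and don't come from any particular PDF library:

```python
def merge_horizontal_segments(segments, tol=0.5):
    """Merge touching or overlapping horizontal segments lying on the same y.

    segments: iterable of (x0, x1, y) tuples (hypothetical format).
    tol: maximum gap, in page units, still treated as "the same line".
    """
    rows = {}
    for x0, x1, y in segments:
        # Bucket by y so pieces of one rule land together despite tiny drift.
        # Normalize each piece so that x0 <= x1.
        rows.setdefault(round(y / tol), []).append((min(x0, x1), max(x0, x1), y))

    merged = []
    for key in sorted(rows):
        pieces = sorted(rows[key])  # left to right within the row
        cur_x0, cur_x1, cur_y = pieces[0]
        for x0, x1, y in pieces[1:]:
            if x0 <= cur_x1 + tol:
                # Piece touches or overlaps the current run: extend it.
                cur_x1 = max(cur_x1, x1)
            else:
                # Real gap: close the current run and start a new one.
                merged.append((cur_x0, cur_x1, cur_y))
                cur_x0, cur_x1, cur_y = x0, x1, y
        merged.append((cur_x0, cur_x1, cur_y))
    return merged


pieces = [(0, 10, 100), (10, 25, 100), (25.2, 40, 100), (60, 80, 100), (0, 40, 200)]
result = merge_horizontal_segments(pieces)
```

Here the three fragments at y=100 collapse into one segment from 0 to 40, while the piece starting at x=60 stays separate because the gap is larger than `tol`. Bucketing by y is crude (two pieces straddling a bucket edge won't merge), so treat it as a starting point rather than something robust.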