zlacker

1. mbb70+(OP) 2024-06-23 17:57:10
This is why pretty much all software that extracts structured data from PDFs throws away the text and just OCRs the page. Too many tricks with layouts and fonts.
replies(1): >>knallf+Vx1
2. knallf+Vx1 2024-06-24 12:25:01
>>mbb70+(OP)
I'm always surprised how "generate PDF from Word" turns one word into 10 separate print points, each holding just a single letter.

Or even straight lines in a table. The straight lines from a table boundary get hacked into pieces. You'd think one line would be the ideal representation for a line, but who are you to judge PDF?
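
For what it's worth, stitching those chopped-up table lines back together is not much code once you have the pieces. Here is a rough sketch, assuming some extraction tool has already handed you horizontal segments as (y, x_start, x_end) tuples in page units. The function name `merge_horizontal_segments`, the tuple layout, and the 0.5 tolerance are all made up for illustration and do not come from any particular PDF library.

```python
from typing import List, Tuple

# Hypothetical representation: one horizontal line piece as (y, x_start, x_end).
Segment = Tuple[float, float, float]


def merge_horizontal_segments(segments: List[Segment], tol: float = 0.5) -> List[Segment]:
    """Join horizontal pieces on the same row that touch or overlap.

    Rows are bucketed by round(y / tol), so two pieces whose y values
    straddle a bucket boundary will not be merged; good enough for a sketch.
    """
    # Normalise each piece so x_start <= x_end, then sort by row and x_start.
    rows = sorted(
        (round(y / tol), min(x0, x1), max(x0, x1), y) for y, x0, x1 in segments
    )

    merged: List[list] = []  # each entry is [row_key, y, x_start, x_end]
    for key, x0, x1, y in rows:
        same_row = bool(merged) and merged[-1][0] == key
        if same_row and x0 <= merged[-1][3] + tol:
            # Piece starts where the previous one ended (within tol): extend it.
            merged[-1][3] = max(merged[-1][3], x1)
        else:
            merged.append([key, y, x0, x1])

    return [(y, x0, x1) for _, y, x0, x1 in merged]


pieces = [(100.0, 0.0, 10.0), (100.0, 10.0, 25.0), (100.0, 25.0, 40.0), (200.0, 0.0, 40.0)]
print(merge_horizontal_segments(pieces))
```

The same sort-then-extend idea should carry over to single-letter text runs on a shared baseline, though real documents add kerning and font changes that this sketch ignores.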
