[return to "The Dubai Debt Trap"]
1. ur-wha+dh[view] [source] 2022-02-18 13:26:48
>>Geeket+(OP)
https://archive.is/GKvGU
◧◩
2. flerch+5q[view] [source] 2022-02-18 14:20:33
>>ur-wha+dh
Disable JavaScript and the entire article loads on The Economist's site.
◧◩◪
3. Scound+sD[view] [source] 2022-02-18 15:18:03
>>flerch+5q
Lynx is the best reader for The Economist.
◧◩◪◨
4. titano+bS[view] [source] 2022-02-18 16:22:58
>>Scound+sD
I don't love reading long articles in fixed-width fonts.
◧◩◪◨⬒
5. dredmo+Do1[view] [source] 2022-02-18 19:00:36
>>titano+bS
Then pipe the output to a PS/PDF generator.

For most modern Web publishing, this is mostly a matter of finding and extracting the <article> block, as well as metadata (title, byline, dateline).

html-xml-utils is quite useful for this.

I wrote a WaPo extractor that reduced page size by about 95%, stripped the nags and paywalls, etc. The endpoint was HTML, but it could just as easily have generated PDF or ePub if I'd wanted.
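
For illustration, the <article> extraction described above can be sketched roughly as follows. This is not the WaPo extractor mentioned here, and it uses Python's standard-library html.parser rather than html-xml-utils. The names ArticleExtractor and extract_article are hypothetical, and it assumes the page has a <title>, optionally a <meta name="author">, and body text in <p> elements inside an <article> block.

```python
from html.parser import HTMLParser

# Content of these elements is never reader-visible text.
SKIP_TAGS = {"script", "style"}


class ArticleExtractor(HTMLParser):
    """Hypothetical helper: collects title, byline, and <p> text inside <article>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title = ""
        self.byline = ""
        self.paragraphs = []
        self._in_title = False
        self._article_depth = 0   # > 0 while inside an <article> element
        self._skip_depth = 0      # > 0 while inside <script>/<style>
        self._buf = None          # text pieces of the current <p>, or None

    def _flush(self):
        # Close the current paragraph, collapsing runs of whitespace.
        if self._buf is not None:
            text = " ".join("".join(self._buf).split())
            if text:
                self.paragraphs.append(text)
        self._buf = None

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            a = dict(attrs)
            if a.get("name") == "author":
                self.byline = a.get("content") or ""
        elif tag == "article":
            self._article_depth += 1
        elif self._article_depth and tag in SKIP_TAGS:
            self._skip_depth += 1
        elif self._article_depth and tag == "p":
            self._flush()         # tolerate an unclosed previous <p>
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "article":
            self._flush()
            self._article_depth = max(0, self._article_depth - 1)
        elif tag in SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._article_depth and not self._skip_depth and self._buf is not None:
            self._buf.append(data)


def extract_article(html):
    """Return title, byline, and article paragraphs from an HTML string."""
    parser = ArticleExtractor()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "byline": parser.byline,
        "paragraphs": parser.paragraphs,
    }
```

Everything outside the <article> block (navigation, nag banners, scripts) is simply never collected, which is where most of the size reduction would come from. The resulting dict could then be written out as minimal HTML and handed to whatever PS/PDF or ePub generator you prefer.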
