zlacker

8 comments
1. flerch+(OP)[view] [source] 2022-02-18 14:20:33
Disable JavaScript and the entire article loads on The Economist.
replies(1): >>Scound+nd
2. Scound+nd[view] [source] 2022-02-18 15:18:03
>>flerch+(OP)
Lynx is the best reader for The Economist.
replies(4): >>rahimn+Wf >>titano+6s >>networ+SC >>bduers+c61
◧◩
3. rahimn+Wf[view] [source] [discussion] 2022-02-18 15:28:36
>>Scound+nd
Back in 2016, The Economist used to block access from lynx. You'd get an error like this (unless you spoofed the user agent to be something other than lynx):

Error 403 You are banned from this site. Please contact via a different client configuration if you believe that this is a mistake.

You are banned from this site. Please contact via a different client configuration if you believe that this is a mistake.

  Guru Meditation:

   XID: 84740260
     __________________________________________________________________

   Varnish cache server
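
A minimal sketch of the user-agent workaround mentioned above, assuming the block was keyed on the User-Agent header alone. The helper names (build_request, fetch) and the default agent string are made up for illustration, and whether any given site still behaves this way is not something this snippet can tell you.

```python
import urllib.request

# Hypothetical default: any agent string the server does not reject.
DEFAULT_UA = "Mozilla/5.0 (X11; Linux x86_64)"


def build_request(url, user_agent=DEFAULT_UA):
    """Return a Request that sends a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=DEFAULT_UA):
    """Fetch a page as text, presenting the given User-Agent."""
    with urllib.request.urlopen(build_request(url, user_agent), timeout=30) as resp:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Calling fetch("https://example.com/") would return the page body as a string; the request is only sent when fetch is called.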
replies(1): >>Scound+3j
◧◩◪
4. Scound+3j[view] [source] [discussion] 2022-02-18 15:41:10
>>rahimn+Wf
Tbh, I was probably running some clone like bobcat.
◧◩
5. titano+6s[view] [source] [discussion] 2022-02-18 16:22:58
>>Scound+nd
I don't love reading long articles in fixed-width fonts.
replies(1): >>dredmo+yY
◧◩
6. networ+SC[view] [source] [discussion] 2022-02-18 17:14:04
>>Scound+nd
w3m is fine too.
◧◩◪
7. dredmo+yY[view] [source] [discussion] 2022-02-18 19:00:36
>>titano+6s
Then pipe the output to a PS/PDF generator.

For most modern Web publishing, this is largely a matter of finding and extracting the <article> block, as well as the metadata (title, byline, dateline).

html-xml-utils is quite useful for this.

I'd created a WaPo extractor that reduced page size by about 95% and stripped the nags, paywalls, etc. The endpoint was HTML, but it could just as easily have generated PDF or ePub if I'd wanted.
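
For illustration only, here is a rough sketch of the "find the <article> block" step, using nothing but Python's standard library. It is not the WaPo extractor described above; the class and function names and the sample markup are invented, and it assumes the page wraps its body text in a single <article> element, which real sites may not do.

```python
from html.parser import HTMLParser

BLOCK_TAGS = {"p", "h1", "h2", "h3", "h4", "li", "blockquote"}
SKIP_TAGS = {"script", "style"}  # content dropped even inside <article>


class ArticleExtractor(HTMLParser):
    """Collect the <title> text and the text blocks inside <article>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title = ""
        self.blocks = []
        self._in_title = False
        self._article_depth = 0  # > 0 while inside <article>
        self._skip_depth = 0     # > 0 while inside <script>/<style>
        self._buf = []

    def _flush(self):
        # Collapse whitespace and store the pending text as one block.
        text = " ".join("".join(self._buf).split())
        if text:
            self.blocks.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "article":
            self._article_depth += 1
        elif self._article_depth:
            if tag in SKIP_TAGS:
                self._skip_depth += 1
            elif tag in BLOCK_TAGS:
                self._flush()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "article":
            if self._article_depth:
                self._article_depth -= 1
                self._flush()
        elif self._article_depth:
            if tag in SKIP_TAGS and self._skip_depth:
                self._skip_depth -= 1
            elif tag in BLOCK_TAGS:
                self._flush()

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._article_depth and not self._skip_depth:
            self._buf.append(data)


def extract_article(html):
    """Return (title, list of text blocks found inside <article>)."""
    parser = ArticleExtractor()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.blocks


# Invented sample page: nav, script and footer should all be dropped.
SAMPLE = """<html><head><title>Example headline</title></head>
<body><nav>Subscribe now!</nav>
<article><h1>Example headline</h1>
<p>First paragraph.</p><script>nag()</script>
<p>Second <em>paragraph</em>.</p></article>
<footer>Footer junk</footer></body></html>"""

if __name__ == "__main__":
    title, blocks = extract_article(SAMPLE)
    print(title)
    for block in blocks:
        print(block)
```

The returned blocks are plain text, so they can be written back out as minimal HTML or handed to whatever PS/PDF or ePub generator you prefer. Byline and dateline extraction is left out because the markup for those varies from site to site.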

replies(1): >>titano+i0k
◧◩
8. bduers+c61[view] [source] [discussion] 2022-02-18 19:42:56
>>Scound+nd
Outline works well too:

https://outline.com/jtdYRj

◧◩◪◨
9. titano+i0k[view] [source] [discussion] 2022-02-25 01:57:28
>>dredmo+yY
I applaud people who take advantage of the fact that the internet is still largely machine-readable and hackable.

I am much lazier, but I use "reader mode" to similar effect.
