Site list: theatlantic.com, newyorker.com, archive.org, smithsonianmag.com, qz.com, nationalgeographic.com, aeon.co, openculture.com, theconversation.com, might.net, theparisreview.org, vanityfair.com, ted.com, popularmechanics.com, laphamsquarterly.org, buzzfeed.com, fivethirtyeight.com, outsideonline.com, thehustle.co, newrepublic.com, foreignpolicy.com, harpers.org, esquire.com, longreads.com, newstatesman.com, lettersofnote.com, gq.com, thewalrus.ca, cjr.org, strongtowns.org, historytoday.com, variety.com, hyperallergic.com, 1843magazine.com, collectorsweekly.com, theamericanscholar.org, nplusonemag.com, bigthink.com, brainpickings.org, thenation.com, theoutline.com, theinformation.com, washingtonmonthly.com, macleans.ca, redherring.com, thenewatlantis.com, prospectmagazine.co.uk, quoteinvestigator.com, theawl.com, airspacemag.com, calvertjournal.com, canada.com, mensjournal.com, torontolife.com, thecorrespondent.com, thecritic.co.uk, britishmuseum.org, nationalgeographic.co.uk, publishersweekly.com, autoweek.com, folksonomy.org, laweekly.com, menshealth.com, rijksmuseum.nl, metmuseum.org, prospect-magazine.co.uk, wunderground.com, agweek.com, banksy.co.uk, banksyfilm.com, minnesotamonthly.com, openlettersmonthly.com
(Again, by order of frequency in front-page stories.)
This and other percentages are calculated with 35% of stories unclassified, that is, coming from sites I've not explicitly tagged. Based on some random sampling of that pool, those are most often blogs or corporate sites. My classification for news, science/academic, and programming sites is generally more comprehensive, as I'm able to leverage regex matches: "edu" and "ac" for academic, GitHub and GitLab domains for programming, and station call-letter patterns such as [KW][A-Z][A-Z][A-Z] (for the US) for many general news sites.
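As a rough illustration of the regex approach described above, here is a minimal Python sketch. The exact patterns, the rule order, the category labels, and the `classify` function name are all assumptions made for the example; they are not the expressions actually used for the tallies here, and the call-letter rule in particular will produce false positives on any four-letter domain starting with K or W.

```python
import re

# Hypothetical rules, checked in order; the first match wins.
RULES = [
    # "edu" or "ac" as a domain label, optionally followed by a
    # two-letter country code (e.g. mit.edu, ox.ac.uk).
    ("academic", re.compile(r"\.(edu|ac)(\.[a-z]{2})?$")),
    # GitHub and GitLab domains, including subdomains.
    ("programming", re.compile(r"(^|\.)(github|gitlab)\.(com|io)$")),
    # US station call letters: K or W plus three letters,
    # i.e. the [KW][A-Z][A-Z][A-Z] pattern, as the whole site name.
    ("news", re.compile(r"^[kw][a-z]{3}\.(com|org)$")),
]

def classify(domain):
    """Return a category label for a domain, or 'unclassified'."""
    d = domain.strip().lower()
    for label, pattern in RULES:
        if pattern.search(d):
            return label
    return "unclassified"
```

Anything that falls through every rule lands in the "unclassified" pool, which is why pattern-friendly categories end up with better coverage than ones that need site-by-site tagging.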