Hey HN! I went to an ATProto meetup last week, and as a burnt-out semi-academic who hates academic publishing, I thought there might be a cool opportunity to build on Octopus (https://www.octopus.ac/), so I got a bit excited over the weekend and built Octosphere.
Hopefully some of you find it interesting! Blog post here: https://andreasthinks.me/posts/octosphere/octosphere.html
If not, I'm on the same handle over there and can put you in touch with them. Or hit up Boris; he knows everyone and is happy to make connections.
There's also a full day at the upcoming conference on ATProto and science-related things. I think they communicate on Discourse more (?)
https://discourse.atprotocol.community/t/about-the-atproto-s...
That'll get us connected off HN
I think Cosmik is the group I was thinking of; they've also put out some initial PoCs like yours.
Something like Rotten Tomatoes could be useful: have a list of "verified" users (critic score) in a separate voting column from anonymous users (audience score).
This would often prove useful in highly controversial situations for parsing common narratives.
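A rough sketch of what that two-column scoring could look like (the `Vote` shape and the verification flag are just assumptions for illustration, not any real system):

```python
from dataclasses import dataclass

@dataclass
class Vote:
    user: str
    verified: bool   # e.g. checked against an ORCID or institution list (assumption)
    positive: bool

def scores(votes):
    """Return (critic_score, audience_score) as fractions of positive votes."""
    def frac(group):
        return sum(v.positive for v in group) / len(group) if group else None
    critics = [v for v in votes if v.verified]
    audience = [v for v in votes if not v.verified]
    return frac(critics), frac(audience)

votes = [Vote("a", True, True), Vote("b", True, False),
         Vote("c", False, True), Vote("d", False, True)]
print(scores(votes))  # (0.5, 1.0)
```

The point is just that the two populations are aggregated separately, so a brigade of anonymous accounts can't move the "critic" number.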
You need content addressing and cryptographic signatures for that.
Theirs? (Personally, I think not.)
So thank you for bringing it up, it showcases well that a distributed system is not automatically a good distributed system, and why you want encryption, cryptographic fingerprints and cryptographic provenance tracking.
Three to five experts specifically asked to review a paper in a controlled environment, versus thousands of random scientists or members of the public (who might be motivated by financial, malicious or other reasons), is probably still the better option. Larger, technically impressive multi-disciplinary papers with 20+ authors are basically impossible to review as individuals; you want a few experts on the main methods to review it together, in harmony, with oversight from a reputable vendor/publisher. Such papers are also increasingly common in any biotech/hard-tech field.
I don't think that's a fair characterization. Most AP implementations famously don't have privacy features: that was by design (and therefore no surprise to us tech folks), but I remember it was quite the scandal when users found out Mastodon instance admins could read users' private messages. A later "scandal" involved participation in the EUNOMIA research project about "provenance tracking" in federated networks [1], which, to be fair to conspiracy theorists, does sound like an academic front for NSA-style firehose R&D.
That being said, Bluesky is much harder to self-host and is therefore not decentralized in practice. [2] See also the Blacksky development notes. However, Bluesky does bring a very interesting piece to the puzzle which AP carefully ignored despite years of research in AP-adjacent protocols (such as Hubzilla): account portability.
All in all, I'm still siding with the ActivityPub ecosystem because I think it's much more ethical and friendly in all regards, and I'm really sad so many so-called journalists, researchers and leftists jumped ship to Bluesky just because the attraction of "Twitter reborn" (with the same startup-nation vibes) was too strong. At least in my circles, I did not meet a single person who said the choice of Bluesky was about UX or features.
But now, I'm slowly warming up to the ATmosphere having a vibrant development community, much more so than AP. And to be fair, ATProto is worse than AP from a centralization standpoint, but at least it's not as bad and complex as the Matrix protocol, which brought zero value over AP/XMPP but made implementations 100x more complex and resource-intensive.
To me something git-like with a peer review UI (a la pull requests) seems far more natural for distributed academic publications than a social media protocol though.
ActivityPub, based on my understanding, really doesn't work like that - while you can OAuth with your Mastodon account, the expectation is that you'll be handling identity and the back-end bits yourself, and then sharing events across the network (happy to be corrected).
Part of what kicked this off is seeing ATProto's new devrel person at a meetup and finding their vision pretty compelling.
But yes, ActivityPub is more "robust" and decentralised (hence also jankier)
Email “works” in the same sense that fax machines worked for decades: it’s everywhere, it’s hard to dislodge, and everyone has already built workflows around it.
There is no intrinsic content identity, no native provenance, no cryptographic binding between “this message” and “this author”. All of that has to be bolted on - inconsistently, optionally, and usually not at all.
And even ignoring the cryptography angle: email predates “content as a first-class addressable object”. Attachments are in-band, so the sender pushes bytes and the receiver (plus intermediaries) must accept/store/scan/forward them up front. That’s why providers enforce tight size limits and aggressive filtering: the receiver is defending itself against other people’s pushes.
For any kind of information dissemination like email or scientific publishing you want the opposite shape: push lightweight metadata (who/what/when/signature + content hashes), and let clients pull heavy blobs (datasets, binaries, notebooks) from storage the publishing author is willing to pay for and serve. Content addressing gives integrity + dedup for free. Paying ~$1 per DOI for what is essentially a UUID is ridiculous by comparison.
That decoupling (metadata vs blobs) is the missing primitive in email-era designs.
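A minimal sketch of that decoupling (the store and function names here are illustrative assumptions, not any real protocol):

```python
import hashlib

AUTHOR_STORE = {}  # stands in for storage the author pays to serve

def publish(author: str, blob: bytes) -> dict:
    """Host the heavy bytes; return only the small record that gets pushed."""
    digest = hashlib.sha256(blob).hexdigest()
    AUTHOR_STORE[digest] = blob
    return {"author": author, "sha256": digest, "size": len(blob)}

def fetch_and_verify(record: dict) -> bytes:
    """Client pulls the blob on demand and checks it against the hash."""
    blob = AUTHOR_STORE[record["sha256"]]
    # integrity check: no trust in the host needed, only in the hash
    assert hashlib.sha256(blob).hexdigest() == record["sha256"]
    return blob

record = publish("alice", b"dataset bytes ...")
assert fetch_and_verify(record) == b"dataset bytes ..."
```

Subscribers receive a few hundred bytes of metadata regardless of how large the dataset is, and nobody in the middle has to store or scan the blob.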
All of that makes email a bad template for a substrate of verifiable, long-lived, referenceable knowledge. Let's not forget that the context of this thread isn’t “is decentralized routing possible?”, it’s “decentralized scientific publishing” - which is not about decentralized routing, but decentralized truth.
Email absolutely is decentralized, but decentralization by itself isn’t enough. Scientific publishing needs decentralized verification.
What makes systems like content-addressed storage (e.g., IPFS/IPLD) powerful isn’t just that they don’t rely on a central server - it’s that you can uniquely and unambiguously reference the exact content you care about with cryptographic guarantees. That means:
- You can validate that what you fetched is exactly what was published or referenced, with no ambiguity or need to trust a third party.
- You can build layered protocols on top (e.g., versioning, merkle trees, audit logs) where history and provenance are verifiable.
- You don’t have to rely on opaque identifiers that can be reissued, duplicated, or reinterpreted by intermediaries.
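As a toy illustration of the "verifiable history" point above - a hash-chained version log where any edit to past records breaks every later link (the field names and record shape are assumptions, not any real format):

```python
import hashlib, json

def record_hash(record: dict) -> str:
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()

def append_version(history: list, content: bytes) -> list:
    """Each new record embeds the hash of its predecessor."""
    prev = record_hash(history[-1]) if history else None
    history.append({"prev": prev,
                    "content_sha256": hashlib.sha256(content).hexdigest()})
    return history

def verify_chain(history: list) -> bool:
    return all(history[i]["prev"] == record_hash(history[i - 1])
               for i in range(1, len(history)))

h = append_version(append_version([], b"v1"), b"v2")
assert verify_chain(h)
h[0]["content_sha256"] = "tampered"   # rewrite history...
assert not verify_chain(h)            # ...and the chain no longer verifies
```

Merkle trees generalize the same trick so you can verify one entry without fetching the whole history.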
For systems that don’t rely on cryptographic primitives, like email or the current infrastructure using DOIs and ORCIDs as identifiers:
- There is no strong content identity - messages can be altered in transit.
- There is no native provenance - you can’t universally prove who authored something without added layers.
- There’s no simple way to compose these into a tamper-evident graph of scientific artifacts with rigorous references.
A truly decentralized scholarly publishing stack needs content identity and provenance. DOIs and ORCIDs help with discovery and indexing, but they are institutional namespaces, not cryptographically bound representations of content. Without content addressing and signatures, you’re mostly just trading one central authority for another.
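To make the "content addressing and signatures" point concrete, here's a stand-in sketch of binding an author key to exact bytes. Python's standard library has no asymmetric signing, so HMAC with a shared key substitutes for a real scheme like Ed25519; the key handling and record shape are illustrative assumptions:

```python
import hashlib, hmac

AUTHOR_KEY = b"alice-secret-key"   # in a real system: the author's private key

def sign_publication(content: bytes) -> dict:
    """Bind an identity key to the hash of the exact published bytes."""
    digest = hashlib.sha256(content).hexdigest()
    tag = hmac.new(AUTHOR_KEY, digest.encode(), hashlib.sha256).hexdigest()
    return {"sha256": digest, "sig": tag}

def verify_publication(content: bytes, record: dict) -> bool:
    """Anyone holding the key material can check: same bytes, same author."""
    digest = hashlib.sha256(content).hexdigest()
    expected = hmac.new(AUTHOR_KEY, digest.encode(), hashlib.sha256).hexdigest()
    return digest == record["sha256"] and hmac.compare_digest(expected, record["sig"])

rec = sign_publication(b"the exact bytes of the paper")
assert verify_publication(b"the exact bytes of the paper", rec)
assert not verify_publication(b"altered bytes", rec)
```

With an asymmetric scheme the verifier only needs the author's public key - no registry, no gatekeeper - which is exactly what a DOI-plus-ORCID lookup can't give you.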
It’s also worth being explicit about what “institutional namespace” means in practice here.
A DOI does not identify content. It identifies a record in a registry (ultimately operated under the DOI Foundation via registration agencies). The mapping from a DOI to a URL and ultimately to the actual bytes is mutable, policy-driven, and revocable. If the publisher disappears, changes access rules, or updates what they consider the “version of record”, the DOI doesn’t tell you what an author originally published or referenced - it tells you what the institution currently points to.
ORCID works similarly: a centrally governed identifier system with a single root of authority. Accounts can be merged, corrected, suspended, or modified according to organisational policy. There is no cryptographic binding between an ORCID, a specific work, and the exact bytes of that work that an independent third party can verify without trusting the ORCID registry.
None of this is malicious - these systems were designed for coordination and attribution, not for cryptographic verifiability. But it does mean they are gatekeepers in the precise sense that matters for decentralization:
Even if lookup/resolution is distributed, the authority to decide what an identifier refers to, whether it remains valid, and how conflicts are resolved is concentrated in a small number of organizations. If those organizations change policy, disappear, or disagree with you, the identifier loses its meaning - regardless of how many mirrors or resolvers exist.
If the system you build can’t answer “Is this byte-for-byte the thing the author actually referenced or published?” without trusting a gatekeeper, then it’s centralized in every meaningful sense that matters to reproducibility and verifiability.
Decentralised lookup without decentralised authority is just centralisation with better caching.
I think ML (and really all other fields) are the same. Skimming a paper never really leaves you certain of how rigorous it is.
I agree that a naive "just add voting" "review" mechanism would not suffice to replace journals. However there's no requirement that the review algorithm be so naive. Looked at differently, what is a journal except for a complicated algorithm for performing reviews?
> I am afraid the need for these publishers will still be there and they will just exist regardless, and it will still be preferred by academics.
Agreed. I doubt publishers are going away any time soon (if ever) regardless of how technically excellent any proposed replacement might be. I still think it's worthwhile to pursue alternatives though.
ATProtocol made a decision, based on the other protocols, to put more emphasis on user experience. If you want to build a new social media fabric for everyone, they have to want to use it. AP / Nostr have UX that will never appeal to the masses.
I build in the ATmosphere because I want to effect change. AP was hostile, Nostr is for crypto bros. The dev community is one of ATProto's strongest pieces and attractors.
One way I like to think about how the protocol is different: they made a giant event system for the public content and then let anyone plug in anywhere they want.
Peer review goes beyond the formal process, into the court of real life. Social media is one place people talk about new research and share their evaluations and insights, and good work gets used and cited more.
Arxiv has been invaluable in starting to change the process, but we need more.
Another issue when moving to a decentralized tool: I think it should apply some sort of gate-keeping so that only academics or verified scientists can contribute reviews, but then you also need a way to prevent bias/friend/self-citation network effects among the academic reviewers, which means you would need to keep close track of them. Not sure how to handle that.