HTML is a digital format, but it was designed as a generic format for all document types, not just papers, so it carries a lot of extras that a paper format doesn't need.
Research papers mostly share the same structure, so for them we can go further and separate content from rendering.
For example, if you later want to connect a paper to an AI, do you want to send it <div class="abstract"> ... ?
Or run some nasty heuristic to extract the abstract, like document.getElementsByClassName("abstract")[0] ?
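
To make the contrast concrete, here is a rough sketch of both approaches. The `paper` object and its field names (`title`, `abstract`, `sections`) are hypothetical, invented for illustration and not an existing format. The regex stands in for the class-name lookup so the snippet runs outside a browser.

```javascript
// Hypothetical structured paper: content only, no rendering details.
// The field names here are assumptions, not an existing standard.
const paper = {
  title: "An Example Paper",
  abstract: "We separate content from rendering.",
  sections: [{ heading: "Introduction", body: "..." }],
};

// Heuristic: only works if the author happened to use this exact markup.
function abstractFromHtml(html) {
  const match = html.match(/<div class="abstract">([\s\S]*?)<\/div>/);
  return match ? match[1].trim() : null;
}

// Structured: the abstract is a named field, so there is nothing to guess.
function abstractFromPaper(p) {
  return typeof p.abstract === "string" ? p.abstract : null;
}

const html = '<div class="abstract"> We separate content from rendering. </div>';
// Same content, different markup: the heuristic silently finds nothing.
const renamed = '<section id="summary"> We separate content from rendering. </section>';

console.log(abstractFromHtml(html));
console.log(abstractFromHtml(renamed));
console.log(abstractFromPaper(paper));
```

The heuristic breaks as soon as a publisher picks different markup, while the structured version doesn't care how the paper is eventually rendered.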