The actual paper content format should be kept separate from its rendering.
I.e. it should contain abstract, sections, equations, figures, citations etc., but it shouldn't carry font sizes, layout, or other presentation details.
Viewer platforms should then be able to style the content differently.
They are converting to HTML to make the content more accessible. Accessibility in this context means a11y: in effect, "more accessible" equates to "more compatible with screen readers".
While PDF documents can be made accessible, it is way easier to do it in HTML, where browsers build an actual accessibility tree from the DOM and expose it to screen readers through the platform accessibility APIs.
>it should contain abstract, sections, equations, figures, citations etc.
So <article>, <section>, <math>, <figure>, <cite>, etc.
HTML was originally designed primarily as a language for semantically describing scientific documents. [1]
"HTML documents represent a media-independent description of interactive content. HTML documents might be rendered to a screen, or through a speech synthesizer, or on a braille display. To influence exactly how such rendering takes place, authors can use a styling language such as CSS." [2]
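To make the "media-independent" point concrete, here is a rough sketch in Python using only the standard library's html.parser. The paper fragment, the role names, and the two renderers (render_visual, render_speech) are all made up for illustration; real browsers and screen readers do far more than this. It only shows one piece of semantic markup, with no fonts or layout in it, feeding two different presentations.

```python
from html.parser import HTMLParser

# Hypothetical paper fragment: semantic elements only, no fonts or layout.
PAPER = """
<article>
  <h1>A Sample Paper</h1>
  <section>
    <h2>Abstract</h2>
    <p>We study a toy problem.</p>
  </section>
  <section>
    <h2>Results</h2>
    <figure>
      <img src="plot.png" alt="Error falls as n grows">
      <figcaption>Figure 1: Error versus n.</figcaption>
    </figure>
  </section>
</article>
"""


class Outline(HTMLParser):
    """Collect (role, text) pairs from a handful of semantic elements."""

    # Illustrative mapping from tag names to roles; not a standard vocabulary.
    ROLES = {"h1": "title", "h2": "heading", "p": "paragraph", "figcaption": "caption"}

    def __init__(self):
        super().__init__()
        self.items = []
        self._role = None  # role of the element currently being read, if any
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in self.ROLES:
            self._role = self.ROLES[tag]
            self._buf = []
        elif tag == "img":
            # The alt text is the non-visual description of the image.
            self.items.append(("image", dict(attrs).get("alt", "")))

    def handle_data(self, data):
        if self._role is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if self._role is not None and self.ROLES.get(tag) == self._role:
            self.items.append((self._role, "".join(self._buf).strip()))
            self._role = None


def parse(html):
    parser = Outline()
    parser.feed(html)
    parser.close()
    return parser.items


def render_visual(items):
    """One possible 'screen' presentation: plain-text layout."""
    lines = []
    for role, text in items:
        if role == "title":
            lines.append(text.upper())
        elif role == "heading":
            lines.append("== " + text + " ==")
        elif role == "image":
            lines.append("[image: " + text + "]")
        else:
            lines.append(text)
    return "\n".join(lines)


def render_speech(items):
    """One possible 'speech' presentation: each item announced with its role."""
    spoken = {"title": "Title", "heading": "Heading", "image": "Image", "caption": "Caption"}
    return [f"{spoken[role]}, {text}" if role in spoken else text for role, text in items]


if __name__ == "__main__":
    items = parse(PAPER)
    print(render_visual(items))
    print(render_speech(items))
```

Both renderers read the same list of (role, text) pairs, so changing how a heading looks or sounds never touches the paper itself. That is the separation asked for upthread.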
1: https://html.spec.whatwg.org/multipage/introduction.html#bac...
2: https://html.spec.whatwg.org/multipage/introduction.html#:~:...