<!—- Don't count <hr> this! -—> but do count <hr> this -->
and <!-- <!-- Ignore <hr> this --> but do count <hr> this —->
Now your regex has to include balanced comment markers. Solve that.
You need a context-free grammar to correctly parse HTML with its quoting rules, and escaping, and embedded scripts and CDATA, etc. etc. etc. I don't think any common regex libraries are as powerful as CFGs.
Basically, you can get pretty far with regexes, but it's provably (like in a rigorous compsci kinda way) impossible to correctly parse all valid HTML with only regular expressions.
<!doctype html>
A<!—- Don't count <hr> this! -—> but do count <hr> that -->Z
gets fixed and rendered as
<!DOCTYPE html>
<html><head></head><body>A<!--—- Don't count <hr--> this! -—> but do count <hr> that -->Z</body></html>
Another surprise is that
<!doctype html>
A<!—- Don't count this! -— but do count that -->Z
gets rewritten to
<!DOCTYPE html>
<html><head></head><body>A<!--—- Don't count this! -— but do count that ---->Z</body></html>
Note the insertion of an extra `--` (two hyphen-minus characters).
This is what MDN (https://developer.mozilla.org/en-US/docs/Web/HTML/Guides/Com...) has to say:
Comments start with the string `<!--` and end with the string `-->`, generally with text in between. This text cannot start with the string `>` or `->`, cannot contain the strings `-->` or `--!>`, nor end with the string `<!-`, though `<!` is allowed. [...] The above is true for XML comments as well. In addition, in XML, such as in SVG or MathML markup, a comment cannot contain the character sequence `--`.
This means you can recognize HTML comments with (one branch of) a regex: start wherever you see `<!--` and consume everything up to the first of the listed terminators. No nesting required.
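To make that concrete, here is a minimal Python sketch of such a branch, applied to the "count the `<hr>` tags" examples from upthread. It is my reading of the MDN text, not a spec-complete tokenizer: `COMMENT_RE`, `HR_RE` and `count_hr_outside_comments` are names I made up, and it deliberately ignores unterminated comments, `<script>`, CDATA and bogus comments like `<!foo>`.

```python
import re

# Branch 1: "<!-->" and "<!--->" end the comment immediately
#           (the comment text may not start with ">" or "->").
# Branch 2: otherwise consume lazily up to the first "-->" or "--!>".
COMMENT_RE = re.compile(r"<!--(?:-?>|.*?--!?>)", re.DOTALL)

# Hypothetical helper pattern: <hr>, <hr/>, <hr class="x">, any case.
HR_RE = re.compile(r"<hr\b[^>]*>", re.IGNORECASE)


def count_hr_outside_comments(html: str) -> int:
    """Drop comments first, then count the <hr> tags that remain."""
    without_comments = COMMENT_RE.sub("", html)
    return len(HR_RE.findall(without_comments))


if __name__ == "__main__":
    sample = "A<!-- Don't count <hr> this! --> but do count <hr> that -->Z"
    print(count_hr_outside_comments(sample))
```

The comment ends at the first `-->`, so the inner `<!--` in the "nested" example is just comment text and the trailing `-->` is ordinary character data.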
Be it said that I find the precise rules too convoluted for what they do. Especially XML's prohibition on `--` in comments is ridiculous taken on its own. First you tell me that a comment ends with three characters `-->`, and then you tell me I can't use the specific substring `--`, either? And why can't I use `--!>`?
An interesting bit here is that AFAIK the `<!` syntax was used in SGML as one of the alternatives to write a 'lone tag', so instead of `<hr></hr>` or `<hr/>` (XHTML) or `<hr>` (HTML) you could write `<!hr>` to denote a tag with no content. We should have kept this IMO.
*EDIT* In the quoted HTML source you see things like `-—` (hyphen-minus, em-dash). This is how the Vivaldi DevTools render it; my text editor and the HN comment system did not alter these characters. I have no idea whether Chrome's rendering engine internally uses these em-dashes or whether it's just a quirk in the DevTools text output.