zlacker

1. philis+(OP)[view] [source] 2026-02-03 16:08:37
The whole argument hinges on one word in your post: arbitrary.

I parse HTML that I produce myself, in a context where I fully control the output. It works fine, but parsing other people’s HTML is a lesson in humility. I’ve done that too, but only as a one-time thing: I parsed a snapshot from a specific point in time and refused to revisit it afterward.

replies(1): >>umanwi+w3
2. umanwi+w3[view] [source] 2026-02-03 16:23:05
>>philis+(OP)
It also hinges on another word: parsing. There are things other than parsing that you might want to do. For example, if you want to count the number of `<hr>` tags in an HTML document, that doesn't require parsing it, and can indeed be done with regex.
replies(1): >>kstrau+zf
3. kstrau+zf[view] [source] [discussion] 2026-02-03 17:11:10
>>umanwi+w3
No, you can’t. You can have an unescaped <hr> inside a script tag, for example. The best you can do is a simple string search for “<hr>” and hope it’s returning what you think it’s returning. Regexps are not powerful enough to determine whether any particular instance of “<hr>” is actually an HTML tag.

Like, it’s not a matter of cleverness, either. You can’t code around it. It’s simply not possible.
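
To make the disagreement concrete, here is a minimal sketch in Python comparing a regex count with a tokenizer-based count on a document that hides `<hr>` inside a script string and a comment. The sample document, the function names, and the `HrCounter` class are all made up for illustration. The standard library's `html.parser` is used as a stand-in for a real HTML parser; it is tolerant but not a full HTML5 implementation, so treat the result as indicative rather than authoritative.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample: two real <hr> elements, plus one inside a
# script string and one inside a comment.
DOC = """<html><body>
<p>one</p><hr>
<script>var s = "<hr>";</script>
<!-- <hr> -->
<hr>
</body></html>"""


def count_hr_regex(html: str) -> int:
    """Count every textual occurrence that looks like an <hr> tag."""
    return len(re.findall(r"<hr\b[^>]*>", html, flags=re.IGNORECASE))


class HrCounter(HTMLParser):
    """Counts <hr> start tags as seen by the tokenizer."""

    def __init__(self) -> None:
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # Script contents and comments are not reported as start tags.
        if tag == "hr":
            self.count += 1


def count_hr_parsed(html: str) -> int:
    counter = HrCounter()
    counter.feed(html)
    counter.close()
    return counter.count


if __name__ == "__main__":
    print("regex: ", count_hr_regex(DOC))
    print("parser:", count_hr_parsed(DOC))
```

On this sample the regex also counts the occurrences in the script and the comment, while the tokenizer reports only the two actual elements. Whether that gap matters depends on whose HTML you are reading, which is the point made at the top of the thread.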
