zlacker

[return to "What's up with all those equals signs anyway?"]
1. ruhith+Vh[view] [source] 2026-02-03 11:56:40
>>todsac+(OP)
The real punchline is that this is a perfect example of "just enough knowledge to be dangerous." Whoever processed these emails knew enough to know emails aren't plain text, but not enough to know that quoted-printable decoding isn't something you hand-roll with find-and-replace. It's the same class of bug as manually parsing HTML with regex, it works right up until it doesn't, and then you get congressional evidence full of mystery equals signs.
◧◩
2. lvncel+Ko[view] [source] 2026-02-03 12:40:16
>>ruhith+Vh
> It's the same class of bug as manually parsing HTML with regex, it works right up until it doesn't

I'm sure you already know this one, but for anyone else reading this I can share my favourite StackOverflow answer of all time: https://stackoverflow.com/a/1732454

◧◩◪
3. umanwi+Z01[view] [source] 2026-02-03 15:59:20
>>lvncel+Ko
Funny how differently people can perceive things. That's my least favorite SO answer of all time, and I cringe every time I see it.

It's a very bad answer. First of all, processing HTML with regex can be perfectly acceptable depending on what you're trying to do. Yes, this doesn't include full-blown "parsing" of arbitrary HTML, but there are plenty of ways in which you might want to process or transform HTML that either don't require producing a parse tree, don't require perfect accuracy, or are operating on HTML whose structure is constrained and known in advance. Second, it doesn't even attempt to explain to OP why parsing arbitrary HTML with regex is impossible or poorly-advised.

The OP didn't want his post to be taken over by someone hamming it up with an attempt at creative writing. He wanted a useful answer. Yes, this answer is "quirky" and "whimsical" and "fun" but I read those as euphemisms for "trying to conscript unwilling victims into your personal sense of nerd-humor".

◧◩◪◨
4. philis+k31[view] [source] 2026-02-03 16:08:37
>>umanwi+Z01
The whole argument hinges on one word in your post: arbitrary.

I parse my own HTML I produce directly in a context where I fully control the output. It works fine, but parsing other people’s HTML is a lesson in humility. I’ve also done that, but I did it as a one time thing. I parsed a specific point in time, refusing to change that at any point.

◧◩◪◨⬒
5. umanwi+Q61[view] [source] 2026-02-03 16:23:05
>>philis+k31
It also hinges on another word: parsing. There are things other than parsing that you might want to do. For example, if you want to count the number of `<hr>` tags in an HTML document, that doesn't require parsing it, and can indeed be done with regex.
◧◩◪◨⬒⬓
6. kstrau+Ti1[view] [source] 2026-02-03 17:11:10
>>umanwi+Q61
No you can’t. You can have an unescaped <hr> inside a script tag, for example. The best you can do is a simple string search for “<hr>” and hope it’s returning what you think it might be returning. Regexps are not powerful enough to determine whether any particular instance of “<hr>” is actually an HTML tag.

Like, it’s not a matter of cleverness, either. You can’t code around it. It’s simply not possible.

[go to top]