
1. fho+(OP) 2023-07-02 11:17:26
Writing style is also a lot more individual than people recognize.

Iirc word histograms almost uniquely identify authors. Of course this is on larger amounts of text, but I guess you could identify users across separate platforms this way.

E.g. I tend to use ellipses (...) to separate thoughts in online conversation a lot. But I try not to do that on reddit, where I try to stay somewhat anonymous.

Still, I assume that it would be possible to correlate my reddit and HN accounts just by comparing the word histograms (i.e. which words I use and how often).
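
A minimal sketch of that kind of comparison, assuming plain lowercase word counts and cosine similarity as the measure. The function names `word_histogram` and `cosine_similarity` are made up for illustration, and real stylometry work typically uses more careful features than this:

```python
import math
import re
from collections import Counter


def word_histogram(text):
    """Count how often each lowercase word occurs in the text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(hist_a, hist_b):
    """Cosine similarity of two word-count histograms, from 0.0 to 1.0."""
    shared = set(hist_a) & set(hist_b)
    dot = sum(hist_a[w] * hist_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in hist_a.values()))
    norm_b = math.sqrt(sum(c * c for c in hist_b.values()))
    if norm_a == 0 or norm_b == 0:
        # An empty text has no usable histogram.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical usage: compare text collected from two accounts.
reddit_posts = "I guess you could do that, but I guess it depends."
hn_posts = "I guess it depends on the amount of text you have."
score = cosine_similarity(word_histogram(reddit_posts), word_histogram(hn_posts))
print(f"similarity: {score:.3f}")
```

A score near 1.0 means the two texts use words in similar proportions. On short samples like these the number means very little, which is the "larger amounts of text" caveat above.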

replies(1): >>ineeda+cO1
2. ineeda+cO1[view] [source] 2023-07-03 00:31:22
>>fho+(OP)
Yes, some of my academic work in comp-ling (massively outdated by today’s advances) used things like the perplexity scores of different Shakespeare plays to examine the controversial claim that some of the work attributed to him was actually done by Marlowe.
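
For readers unfamiliar with perplexity, here is a rough sketch of the idea, not the method used in that work: train a word model on one author's text and measure how "surprised" it is by another text. It assumes a unigram model with add-one smoothing, which is far simpler than anything an attribution study would rely on, and the names `tokenize` and `unigram_perplexity` are hypothetical:

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z']+", text.lower())


def unigram_perplexity(train_text, test_text):
    """Perplexity of test_text under a unigram model fit on train_text.

    Uses add-one smoothing over the combined vocabulary so that words
    unseen in training still get a nonzero probability (an assumption
    made to keep the sketch short).
    """
    train = tokenize(train_text)
    test = tokenize(test_text)
    if not test:
        raise ValueError("test_text contains no words")
    counts = Counter(train)
    vocab = set(train) | set(test)
    total = len(train) + len(vocab)  # smoothed denominator
    log_prob = sum(math.log((counts[w] + 1) / total) for w in test)
    return math.exp(-log_prob / len(test))
```

Lower perplexity means the test text looks more like the training text. Comparing a disputed play's perplexity under a model of each candidate author is the general shape of the argument.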

(As a complete aside, that program of study also included Forensic Linguistics, which was truly fascinating. And of course the work of Claude Shannon and information theory, though not in any great depth.)
