zlacker

[parent] [thread] 3 comments
1. meowfa+(OP)[view] [source] 2022-05-23 23:16:09
Not elitist at all; I highly appreciate this post. I know the basics of ML but otherwise am clueless when it comes to the true depths of this field and it's interesting to hear this perspective.
replies(1): >>benree+a2
2. benree+a2[view] [source] 2022-05-23 23:35:01
>>meowfa+(OP)
I used a lot of jargon, lingo, and inside baseball in that post; it was intended for people with a deep background.

But if you’re interested, I’m happy to attempt answers to anything that was jargon: by virtue of HN my answers will be peer-reviewed in real time, and with only modest luck a true expert might chime in.

replies(1): >>blindi+aa
3. blindi+aa[view] [source] [discussion] 2022-05-24 00:41:25
>>benree+a2
Is there a handy list of generally recognized AI advancements, and their owners, that you would recommend reviewing? Or perhaps, seminal papers published? I'm only tangentially familiar with the field but would be curious to learn about the clash of the Titans playing out. Thanks!
replies(1): >>benree+ge
4. benree+ge[view] [source] [discussion] 2022-05-24 01:19:09
>>blindi+aa
That’s too big a question to even attempt an answer in an HN comment, but to try to answer a realistic subset of it: “Attention Is All You Need” (2017) is the paper most germane to my remark, and probably to the thread. The modeling style it introduced often gets called a “transformer”.

The TLDR is that people had been trying for ages to capture long-distance relationships (in the input or output, not the black box) in a way that was amenable to traditional neural-network training techniques. It is non-obvious how to do that because your basic NN takes an input without a distance metric. Put more plainly: it can know all the words in a sentence but struggles with what order they are in without some help.
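To make the word-order problem concrete, here is a minimal sketch, assuming a toy model that just sums word vectors. The `EMBED` table and `bag_of_words` helper are made up for illustration and are not from the paper; the point is only that a plain sum cannot tell two orderings of the same words apart.

```python
# Hypothetical toy embeddings: each word maps to a made-up 2-number vector.
EMBED = {"dog": [1.0, 0.0], "bites": [0.0, 1.0], "man": [1.0, 1.0]}


def bag_of_words(sentence):
    """Sum the word vectors; a sum ignores word order entirely."""
    total = [0.0, 0.0]
    for word in sentence.split():
        vec = EMBED[word]
        total = [t + v for t, v in zip(total, vec)]
    return total


a = bag_of_words("dog bites man")
b = bag_of_words("man bites dog")
print(a == b)  # the two sentences are indistinguishable to this model
```

Both sentences collapse to the same vector, which is the "knows the words but not the order" situation described above.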

The state of the art for a while was something called an LSTM, and those gadgets are still useful sometimes, but they have mostly been obsoleted by this attention/transformer business.

That paper had a number of cool things in it but two stand out:

- by blinding an NN to some parts of the input (“masking”) you can incentivize/compel it to look at (“attend to”) others. That’s a gross oversimplification, but it gets the gist of it I think. People have come up with very clever ways to boost up this or that part of the input in a context-dependent way.

- by playing with some trigonometry you can get a unique shape that can be expressed as a sum on top of something else, which gives the model its “bearings”, so to speak, as to “where” it is in the input: such-and-such a word is closer to the beginning of a paragraph, that sort of thing. People have also gotten very clever about how to do this, but the idea is the same: how do I tell a neural network that there’s structure in what would otherwise be a pile of numbers?
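
The two ideas above can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the paper's implementation: `positional_encoding` follows the sinusoidal formula from the paper (sine on even dimensions, cosine on odd ones), while `add_position`, `masked_attention_weights`, and the `visible` flag list are names invented here. The mask function assumes at least one position is visible.

```python
import math


def positional_encoding(position, d_model):
    """Sinusoidal encoding: sin on even indices, cos on odd ones."""
    pe = []
    for i in range(d_model):
        # Dimensions 2k and 2k+1 share one wavelength.
        angle = position / (10000 ** ((i - i % 2) / d_model))
        pe.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return pe


def add_position(embedding, position):
    """The 'sum': token vector plus position vector, element by element."""
    pe = positional_encoding(position, len(embedding))
    return [e + p for e, p in zip(embedding, pe)]


def masked_attention_weights(query, keys, visible):
    """Softmax over scaled dot products; hidden positions get weight 0."""
    scale = math.sqrt(len(query))
    scores = []
    for key, ok in zip(keys, visible):
        dot = sum(q * k for q, k in zip(query, key))
        # A score of -inf becomes exactly 0 after the softmax.
        scores.append(dot / scale if ok else float("-inf"))
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


same_word = [0.5, 0.5, 0.5, 0.5]  # made-up embedding for one word
first = add_position(same_word, 0)
later = add_position(same_word, 7)
weights = masked_attention_weights(
    [1.0, 0.0],
    [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]],
    [True, True, False],
)
```

The same word ends up with different vectors at positions 0 and 7, so the model has something to tell them apart, and the third key receives zero attention weight because it is masked out.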
