zlacker

[parent] [thread] 9 comments
1. gohrt+(OP)[view] [source] 2016-01-26 00:39:02
It is now several decades later.

Do TensorFlow/CNN builders use random initial configurations, or custom-designed structures?

replies(6): >>argona+C >>Housha+N6 >>discar+98 >>ehudla+fh >>raverb+1l >>nabla9+tw
2. argona+C[view] [source] 2016-01-26 00:47:20
>>gohrt+(OP)
Usually random, drawn from a certain distribution (e.g. Gaussian with std. deviation 0.001, or with a std. deviation that depends on the number of input/output units (Xavier initialization)).

For some tasks, you may wish to initialize using a network that was already trained on a different dataset, if you have reason to believe the new training task is similar to the previous task.
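
For concreteness, a minimal numpy sketch of the two schemes described above; the layer sizes and the 0.001 figure are purely illustrative, not taken from any particular model:

    import numpy as np

    # Hypothetical layer: 784 inputs, 256 outputs.
    fan_in, fan_out = 784, 256

    # Fixed small std. deviation (e.g. 0.001):
    W_gaussian = np.random.normal(loc=0.0, scale=0.001, size=(fan_in, fan_out))

    # Xavier/Glorot initialization: the scale depends on the number of
    # input and output units, keeping activation variance roughly constant.
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    W_xavier = np.random.uniform(low=-limit, high=limit, size=(fan_in, fan_out))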

3. Housha+N6[view] [source] 2016-01-26 02:54:44
>>gohrt+(OP)
NN weights need to start out random because otherwise two weights with exactly the same value can get "stuck" and be unable to differentiate: they receive identical gradient updates. Backpropagation relies on random starting patterns that roughly match, so that it can fine-tune them.

But the weights are often initialized to be really close to zero.
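
To see why identical weights get stuck, here is a toy numpy sketch (a tiny 3-2-1 network with made-up numbers, not anything from the thread): under a constant initialization the two hidden units receive identical gradients at every step, so gradient descent can never tell them apart, while a small random initialization breaks the tie.

    import numpy as np

    def grads(W1, w2, x, y):
        z = W1.T @ x                        # hidden pre-activations, shape (2,)
        h = 1.0 / (1.0 + np.exp(-z))        # sigmoid activations
        y_hat = w2 @ h                      # scalar output
        d_out = y_hat - y                   # d(loss)/d(y_hat) for 0.5*(y_hat - y)^2
        d_z = d_out * w2 * h * (1 - h)      # backprop through the sigmoid
        return np.outer(x, d_z), d_out * h  # dL/dW1, dL/dw2

    x, y = np.array([0.5, -1.0, 2.0]), 1.0

    # Constant init: both hidden units start identical ...
    W1 = np.full((3, 2), 0.1)
    w2 = np.full(2, 0.1)
    dW1, _ = grads(W1, w2, x, y)
    print(np.allclose(dW1[:, 0], dW1[:, 1]))  # True: the units never diverge

    # ... whereas a small random init breaks the symmetry.
    W1 = np.random.normal(0.0, 0.01, size=(3, 2))
    dW1, _ = grads(W1, w2, x, y)
    print(np.allclose(dW1[:, 0], dW1[:, 1]))  # False (almost surely)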

replies(1): >>brianp+W9
4. discar+98[view] [source] 2016-01-26 03:26:34
>>gohrt+(OP)
If you start with the same weights, then the neurons with similar connections will learn the same things. Random initialization is what gets them started in different directions.
5. brianp+W9[view] [source] [discussion] 2016-01-26 04:05:07
>>Housha+N6
Given the era though, Sussman may have actually been working with a neural net that's not the typical hidden-layer variety. "Randomly wired" could be a statement about the topology of the network, not about the weights.
replies(1): >>argona+5m
6. ehudla+fh[view] [source] 2016-01-26 07:00:50
>>gohrt+(OP)
The key is what learning procedure is used. It is not clear from the story whether the nets were learning, and if so, how.
7. raverb+1l[view] [source] 2016-01-26 08:46:20
>>gohrt+(OP)
Random weights, but the spatial organization of inputs follows the input geometry.
8. argona+5m[view] [source] [discussion] 2016-01-26 09:09:32
>>brianp+W9
There is no evidence he was actually working with a neural net.

https://web.archive.org/web/20120717041345/http://sch57.msk....

replies(1): >>dTal+UK1
9. nabla9+tw[view] [source] 2016-01-26 13:01:05
>>gohrt+(OP)
There is another deeper meaning in this koan.

It's related to the No Free Lunch Theorems, which basically say that if an algorithm performs well on a certain class of learning, search, or optimization problems, it necessarily pays for that with degraded performance on the set of all remaining problems.

In other words, you always need bias to learn meaningfully. The more (of the right kind of) bias you have, the faster you can learn the subject at hand, and the slower you learn everything else. In neural networks the bias is not just the weights: there is bias in the choice of random distribution for the network weights (uniform, Gaussian, etc.), bias in the network topology, and bias in the learning algorithm, activation function, and so on.

Convolutional neural networks are a good example: they have very strong bias baked into them, and it works really well.
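
To put a rough number on that baked-in bias, a back-of-the-envelope sketch in plain Python (all layer sizes are hypothetical): the locality and weight-sharing assumptions of a 3x3 convolution are exactly the bias, and they cut the weight count by many orders of magnitude compared to a fully connected layer over the same image.

    # Hypothetical 224x224 RGB input mapped to 64 output channels.
    h, w, c_in, c_out, k = 224, 224, 3, 64, 3

    # Fully connected: every output unit sees every input pixel.
    fc_params = (h * w * c_in) * (h * w * c_out)

    # Convolution: local 3x3 receptive fields, weights shared across positions.
    conv_params = k * k * c_in * c_out

    print(f"fully connected: {fc_params:,} weights")   # roughly 4.8e11
    print(f"convolutional:   {conv_params:,} weights")  # 1,728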

10. dTal+UK1[view] [source] [discussion] 2016-01-27 02:58:06
>>argona+5m
I had no idea this existed; it's brilliant!