zlacker

[parent] [thread] 5 comments
1. yarg+(OP)[view] [source] 2023-05-16 23:05:50
There's no chance that we've peaked in a bang-for-buck sense - we still haven't adequately investigated sparse networks.

Relevantish: https://arxiv.org/abs/2301.00774

The fact that we can reach those levels of sparseness with pruning also indicates that we're not doing a very good job of generating the initial network conditions.

Being able to come up with trainable initial settings for sparse networks across different topologies is hard, but given that we've had a degree of success with pre-trained networks, pre-training and pre-pruning might also allow for sparse networks with minimally compromised learning capabilities.

If it's possible to pre-train composable network modules, it might also be feasible to define trainable sparse networks with significantly relaxed topological constraints.
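
For concreteness, by pruning I mean something like plain layer-wise magnitude pruning, roughly the sketch below (illustrative PyTorch only; the SparseGPT paper linked above does something much smarter than simple thresholding):

    import torch

    @torch.no_grad()
    def magnitude_prune(model, sparsity=0.9):
        # Zero out the smallest-magnitude weights in each weight matrix and
        # return the masks so they can be re-applied after optimizer steps.
        masks = {}
        for name, p in model.named_parameters():
            if p.dim() < 2:                      # skip biases and norm params
                continue
            k = int(sparsity * p.numel())
            if k == 0:
                continue
            threshold = p.abs().flatten().kthvalue(k).values
            mask = (p.abs() > threshold).to(p.dtype)
            p.mul_(mask)                         # apply the mask in place
            masks[name] = mask
        return masks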

replies(3): >>alexel+r2 >>stephc+a7 >>cma+X7
2. alexel+r2[view] [source] 2023-05-16 23:19:04
>>yarg+(OP)
I don’t think you really disagree with GP? I think the argument is we peaked on “throw GPUs at it”?

We have all kinds of advancements to make training cheaper, models computationally cheaper, smaller, etc.

Once that happens/happened, it benefits OAI to throw up walls via legislation.

replies(2): >>Neverm+D8 >>yarg+Ue1
3. stephc+a7[view] [source] 2023-05-16 23:49:34
>>yarg+(OP)
The efficiency of training has very likely not reached its peak, or anywhere near it. We are still inefficient. But the bottleneck might be elsewhere: in the data we feed these models.

Maybe we haven't peaked yet, but the case can be made that the supply of data isn't infinite…

4. cma+X7[view] [source] 2023-05-16 23:55:15
>>yarg+(OP)
50% sparsity is almost certainly already being used, given that current Nvidia hardware accelerates it both at training time (usable dynamically through RigL, "Rigging the Lottery: Making All Tickets Winners", https://arxiv.org/pdf/1911.11134.pdf, which also addresses your point about initial conditions being locked in) and at inference time.
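
For reference, the 2:4 pattern that hardware accelerates just keeps the two largest-magnitude weights in every contiguous group of four, roughly like this (illustrative PyTorch sketch, not the RigL mask update itself):

    import torch

    @torch.no_grad()
    def two_to_four_mask(weight):
        # Keep the 2 largest-magnitude entries in each group of 4 (exactly 50% sparsity).
        # Assumes weight.numel() is a multiple of 4.
        groups = weight.reshape(-1, 4)
        keep = groups.abs().topk(2, dim=1).indices
        mask = torch.zeros_like(groups).scatter_(1, keep, 1.0)
        return mask.reshape(weight.shape)

    # usage: weight.mul_(two_to_four_mask(weight))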
5. Neverm+D8[view] [source] [discussion] 2023-05-17 00:00:02
>>alexel+r2
No way has training hit any kind of peak in cost, compute, or training-data efficiency.

Big tech advances, like the models of the last year or so, don't happen without a long tail of significant improvements based on fine tuning, at a minimum.

The number of advances being announced by disparate groups, even individuals, also indicates improvements are going to continue at a fast pace.

6. yarg+Ue1[view] [source] [discussion] 2023-05-17 11:13:14
>>alexel+r2
Yeah, it's a little bit RTFC to be honest.