OBD Notes (doesn't seem to work)
Notes on Optimal Brain Damage
Interests:
- Complexity measures: Vapnik-Chervonenkis dimension, and a time-honored (albeit inexact) measure of complexity: simply the number of non-zero free parameters.
- Measures of network complexity appear in both the statistical inference literature and the NN literature.
- How can the authors claim that an automatic network-minimization procedure can act as an interactive tool to suggest better architectures?
- One of the main points of this paper is to move beyond the approximation that magnitude equals saliency.
- I don’t think it works now.
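A minimal sketch of the "magnitude vs. saliency" distinction. The weights and diagonal Hessian entries below are made-up values for illustration; OBD's saliency for weight k is s_k = h_kk * w_k^2 / 2, while the magnitude criterion just ranks by |w_k|:

```python
import numpy as np

# Made-up weights and diagonal Hessian entries h_kk of the loss at a
# local minimum (values chosen only to illustrate the disagreement).
w = np.array([0.05, 0.50, -0.30, 1.20])
h = np.array([40.0, 0.01, 2.0, 0.10])  # curvature per weight

# OBD saliency: second-order estimate of the loss increase if weight k
# is deleted, s_k = h_kk * w_k^2 / 2.
saliency = 0.5 * h * w**2

# Pruning order (least important first) under each criterion.
prune_by_magnitude = np.argsort(np.abs(w))
prune_by_saliency = np.argsort(saliency)

print(prune_by_magnitude)  # [0 2 1 3]
print(prune_by_saliency)   # [1 0 3 2]
```

Here the magnitude criterion would delete weight 0 first (smallest |w|), but OBD deletes weight 1 first: it is larger in magnitude yet sits in a much flatter direction of the loss.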
Key findings:
- It uses weight decay, which is not well suited for sparsity or mixed precision.
- OBD omits the cross terms of the Hessian (diagonal approximation), so it cannot find redundant patterns, and it is no simpler than the magnitude-based approach.
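A small numeric demo of what dropping the cross terms costs. Under a quadratic model of the loss around a minimum, the exact cost of zeroing a set of weights is ΔE = ½ δw^T H δw. The Hessian below is made up, with a strong off-diagonal entry so the two weights are nearly redundant copies:

```python
import numpy as np

# Made-up full Hessian with a strong off-diagonal term: the two weights
# are nearly redundant, so deleting both together is almost free.
H = np.array([[2.0, -1.9],
              [-1.9, 2.0]])
w = np.array([1.0, 1.0])

def delta_E(deleted):
    """Exact quadratic loss increase for zeroing the given weight indices."""
    dw = np.zeros_like(w)
    dw[list(deleted)] = -w[list(deleted)]
    return 0.5 * dw @ H @ dw

# OBD's diagonal/additive estimate: sum of individual deletion costs.
individual = delta_E([0]) + delta_E([1])  # 1.0 + 1.0 = 2.0
# True joint cost, including the cross term.
joint = delta_E([0, 1])                   # 0.1
```

The additive estimate (2.0) says both weights are important, while the true joint cost (0.1) shows the pair is redundant and could be deleted together cheaply; this is exactly the redundancy pattern the diagonal approximation cannot see.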
Questions:
- What is group lasso, and how does it compare with weight decay? Why is weight decay said to be unsuitable for sparsity? Can weight decay be used for mixed precision?
- Does Optimal Brain Damage really work? We should experiment or survey further. Several modern methods exist:
- Magnitude-based pruning (simpler, often similarly effective)
- Gradual pruning during training
- Lottery ticket hypothesis approaches
- More sophisticated second-order methods, e.g. based on Fisher information
- What is Fisher information?
- OBD assumes that "the ΔE caused by deleting several parameters is the sum of the ΔE's caused by deleting each parameter individually." Does this assumption actually hold?
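On the group lasso vs. weight decay question: a minimal sketch of the two penalties on a made-up weight matrix (each row standing for one neuron's incoming weights). The key difference is that the group lasso penalty is non-smooth at zero for a whole group, so it can drive entire rows exactly to zero (structured sparsity), whereas weight decay only shrinks weights proportionally:

```python
import numpy as np

# Hypothetical weight matrix; each row is one neuron's incoming weights.
W = np.array([[0.5, -0.3, 0.2],
              [0.0,  0.01, 0.0],
              [1.0,  0.8, -0.9]])

# Weight decay (L2): sum of squares. Its gradient shrinks every weight
# proportionally, so weights approach zero but rarely reach it exactly.
l2_penalty = np.sum(W**2)

# Group lasso: sum of per-group L2 norms (here, one group per row).
# The non-smoothness at zero lets optimization zero out whole rows,
# i.e. remove entire neurons.
group_lasso_penalty = np.sum(np.linalg.norm(W, axis=1))
```

This is why weight decay alone tends to produce many small-but-nonzero weights rather than true sparsity, which matches the note above that it is ill-suited for sparsity goals.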
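On the Fisher information question: the (empirical, diagonal) Fisher information is the average of squared per-example gradients, and it is often used as a cheap stand-in for the Hessian diagonal in saliency scores of the same form as OBD's. A minimal sketch on a made-up linear-regression model (all data and the "trained" weights are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear model y = x @ w with squared-error loss; data are made up.
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, 0.0, -2.0]) + 0.1 * rng.normal(size=100)
w = np.array([1.0, 0.0, -2.0])  # pretend we trained to (near) the optimum

# Per-example gradient of the squared-error loss wrt w:
# g_i = (x_i @ w - y_i) * x_i
residual = X @ w - y
per_example_grads = residual[:, None] * X

# Empirical diagonal Fisher information: mean of squared per-example
# gradients, usable as F_kk in a saliency score s_k = F_kk * w_k^2 / 2.
fisher_diag = np.mean(per_example_grads**2, axis=0)
```

Unlike the true Hessian, this only needs first-order gradients that training already computes, which is why Fisher-based pruning scores are a popular modern replacement for OBD's diagonal Hessian.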