OBD Notes (doesn't seem to work)
Notes on Optimal Brain Damage
Interests:
- Complexity measures: Vapnik-Chervonenkis dimension, and a time-honored (albeit inexact) measure of complexity: simply the number of non-zero free parameters.
- Measures of network complexity appear in both the statistical inference literature and the NN literature.
- How can the authors claim that an automatic network-minimization procedure can act as an interactive tool to suggest better architectures?
- One of the main points of this paper is to move beyond the approximation that magnitude equals saliency.
- I don’t think it works now.
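A minimal sketch of the "magnitude vs. saliency" distinction. The weights and diagonal Hessian entries below are made-up values for illustration; OBD's saliency for weight k is s_k = h_kk * w_k^2 / 2, while the magnitude criterion just ranks by |w_k|:

```python
import numpy as np

# Made-up weights and diagonal Hessian entries h_kk of the loss at a
# local minimum (values chosen only to illustrate the disagreement).
w = np.array([0.05, 0.50, -0.30, 1.20])
h = np.array([40.0, 0.01, 2.0, 0.10])  # curvature per weight

# OBD saliency: second-order estimate of the loss increase if weight k
# is deleted, s_k = h_kk * w_k^2 / 2.
saliency = 0.5 * h * w**2

# Pruning order (least important first) under each criterion.
prune_by_magnitude = np.argsort(np.abs(w))
prune_by_saliency = np.argsort(saliency)

print(prune_by_magnitude)  # [0 2 1 3]
print(prune_by_saliency)   # [1 0 3 2]
```

Here the magnitude criterion would delete weight 0 first (smallest |w|), but OBD deletes weight 1 first: it is larger in magnitude yet sits in a much flatter direction of the loss.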
Key findings:
- It uses weight decay, which is not well suited for sparsity or mixed precision.
- OBD omits the cross terms of the Hessian (diagonal approximation), so it cannot find redundant patterns, and it is no simpler than the magnitude-based approach.
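A small numeric demo of what dropping the cross terms costs. Under a quadratic model of the loss around a minimum, the exact cost of zeroing a set of weights is ΔE = ½ δw^T H δw. The Hessian below is made up, with a strong off-diagonal entry so the two weights are nearly redundant copies:

```python
import numpy as np

# Made-up full Hessian with a strong off-diagonal term: the two weights
# are nearly redundant, so deleting both together is almost free.
H = np.array([[2.0, -1.9],
              [-1.9, 2.0]])
w = np.array([1.0, 1.0])

def delta_E(deleted):
    """Exact quadratic loss increase for zeroing the given weight indices."""
    dw = np.zeros_like(w)
    dw[list(deleted)] = -w[list(deleted)]
    return 0.5 * dw @ H @ dw

# OBD's diagonal/additive estimate: sum of individual deletion costs.
individual = delta_E([0]) + delta_E([1])  # 1.0 + 1.0 = 2.0
# True joint cost, including the cross term.
joint = delta_E([0, 1])                   # 0.1
```

The additive estimate (2.0) says both weights are important, while the true joint cost (0.1) shows the pair is redundant and could be deleted together cheaply; this is exactly the redundancy pattern the diagonal approximation cannot see.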
Questions:
- What is group lasso, and how does it compare with weight decay? Why is weight decay said to be unsuitable for sparsity? Can weight decay be used for mixed precision?
- Does Optimal Brain Damage really work? We should experiment or survey further. Several modern methods exist:
- Magnitude-based pruning (simpler, often similarly effective)
- Gradual pruning during training
- Lottery ticket hypothesis approaches
- More sophisticated second-order methods, e.g. based on Fisher information
- What is Fisher information?
- OBD assumes that "the ΔE caused by deleting several parameters is the sum of the ΔE's caused by deleting each parameter individually." Does this assumption actually hold?
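On the group lasso vs. weight decay question: a minimal sketch of the two penalties on a made-up weight matrix (each row standing for one neuron's incoming weights). The key difference is that the group lasso penalty is non-smooth at zero for a whole group, so it can drive entire rows exactly to zero (structured sparsity), whereas weight decay only shrinks weights proportionally:

```python
import numpy as np

# Hypothetical weight matrix; each row is one neuron's incoming weights.
W = np.array([[0.5, -0.3, 0.2],
              [0.0,  0.01, 0.0],
              [1.0,  0.8, -0.9]])

# Weight decay (L2): sum of squares. Its gradient shrinks every weight
# proportionally, so weights approach zero but rarely reach it exactly.
l2_penalty = np.sum(W**2)

# Group lasso: sum of per-group L2 norms (here, one group per row).
# The non-smoothness at zero lets optimization zero out whole rows,
# i.e. remove entire neurons.
group_lasso_penalty = np.sum(np.linalg.norm(W, axis=1))
```

This is why weight decay alone tends to produce many small-but-nonzero weights rather than true sparsity, which matches the note above that it is ill-suited for sparsity goals.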
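On the Fisher information question: the (empirical, diagonal) Fisher information is the average of squared per-example gradients, and it is often used as a cheap stand-in for the Hessian diagonal in saliency scores of the same form as OBD's. A minimal sketch on a made-up linear-regression model (all data and the "trained" weights are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear model y = x @ w with squared-error loss; data are made up.
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, 0.0, -2.0]) + 0.1 * rng.normal(size=100)
w = np.array([1.0, 0.0, -2.0])  # pretend we trained to (near) the optimum

# Per-example gradient of the squared-error loss wrt w:
# g_i = (x_i @ w - y_i) * x_i
residual = X @ w - y
per_example_grads = residual[:, None] * X

# Empirical diagonal Fisher information: mean of squared per-example
# gradients, usable as F_kk in a saliency score s_k = F_kk * w_k^2 / 2.
fisher_diag = np.mean(per_example_grads**2, axis=0)
```

Unlike the true Hessian, this only needs first-order gradients that training already computes, which is why Fisher-based pruning scores are a popular modern replacement for OBD's diagonal Hessian.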