OBD Note (not work)

Notes on Optimal Brain Damage

Interests:

  • Complexity measures: Vapnik-Chervonenkis dimensionality, or a time-honored (albeit inexact) measure of complexity: simply the number of non-zero free parameters.
  • Various measures of network complexity have been proposed in both the statistical inference literature and the NN literature.
  • How can the authors claim that OBD works both as an automatic network minimization procedure and as an interactive tool to suggest better architectures?
  • One of the main points of this paper is to move beyond the approximation that magnitude equals saliency.
  • I don’t think it works now.
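
To make the "magnitude equals saliency" point concrete, here is a minimal sketch on toy numbers. The saliency formula s_k = h_kk * u_k^2 / 2 is the one OBD uses under its diagonal, quadratic approximation; the weight and Hessian values and the function name `obd_saliency` are made up for illustration and do not come from the paper.

```python
import numpy as np

def obd_saliency(weights, hessian_diag):
    """OBD saliency s_k = h_kk * u_k**2 / 2 (diagonal, quadratic approximation)."""
    return 0.5 * hessian_diag * weights ** 2

# Hypothetical toy values: weight 0 is large but lies in a flat direction,
# weight 2 is small but lies in a sharply curved direction.
weights = np.array([2.0, 1.0, 0.5])
hessian_diag = np.array([0.01, 1.0, 10.0])

saliency = obd_saliency(weights, hessian_diag)
magnitude_order = np.argsort(np.abs(weights))  # prune smallest |u_k| first
saliency_order = np.argsort(saliency)          # prune smallest s_k first

print("saliency:", saliency)
print("magnitude pruning order:", magnitude_order)
print("OBD pruning order:", saliency_order)
```

With these values the two criteria give opposite pruning orders: magnitude pruning removes the small weight first, while OBD removes the large weight first because its curvature is tiny. Whether such cases matter in a real network is exactly the open question in these notes.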

Key findings:

  • Use weight decay (non-proportional) for sparsity or mixed precision.
  • It omits the cross terms, so it cannot detect redundant patterns, and it is not simpler than the magnitude-based approach.
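
For reference, the cross terms in question come from the second-order Taylor expansion of the loss E around the trained parameters u. The notation below (g_i for gradient components, h_ij for Hessian entries, s_k for saliency) follows the OBD paper as I recall it, so treat it as a sketch rather than a verbatim quote.

```latex
% Taylor expansion of the loss change for a perturbation \delta U
\delta E = \sum_i g_i \,\delta u_i
         + \frac{1}{2} \sum_i h_{ii} \,\delta u_i^2
         + \frac{1}{2} \sum_{i \neq j} h_{ij} \,\delta u_i \,\delta u_j
         + O(\|\delta U\|^3)

% OBD drops the first term (trained network at a local minimum, g_i = 0),
% the third term (diagonal approximation), and the last term (quadratic
% approximation), leaving
\delta E \approx \frac{1}{2} \sum_i h_{ii} \,\delta u_i^2

% Deleting parameter u_k means \delta u_k = -u_k, which gives the saliency
s_k = \frac{1}{2} h_{kk} \, u_k^2
```

The third term is the only place where interactions between two parameters appear, which is why dropping it removes any ability to see that two weights are redundant with each other.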

Questions:

  • What is group lasso, and how does it compare with weight decay? Why is weight decay said to be unsuitable for sparsity? Could weight decay be used for mixed precision instead?
  • Does Optimal Brain Damage really work? We need to try it or research it further. There are several modern methods:
    • Magnitude-based pruning (simpler, often similarly effective)
    • Gradual pruning during training
    • Lottery ticket hypothesis approaches
    • More sophisticated second-order methods, such as those based on Fisher information
  • What is Fisher information?
  • Among the assumptions of OBD, it says: “delta E caused by deleting several parameters is the sum of the delta E’s caused by deleting each parameter individually.” Does this assumption really hold?
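
One quick way to probe the additivity assumption is a toy quadratic model at a local minimum, where delta E = 0.5 * du^T H du exactly. The sketch below compares the sum of single-parameter deletions with the joint deletion for two hypothetical 2x2 Hessians: one diagonal, one with a strong off-diagonal coupling. The matrices, the weights, and the helper names `delta_e` and `additivity_gap` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def delta_e(hessian, weights, deleted):
    """Loss increase under a quadratic model at a minimum (zero gradient).

    Deleting parameter k means setting it to zero, i.e. du_k = -u_k.
    """
    du = np.zeros_like(weights)
    du[deleted] = -weights[deleted]
    return 0.5 * du @ hessian @ du

def additivity_gap(hessian, weights):
    """Return (sum of individual delta E's, delta E of deleting all at once)."""
    indices = list(range(len(weights)))
    individual = sum(delta_e(hessian, weights, [k]) for k in indices)
    joint = delta_e(hessian, weights, indices)
    return individual, joint

weights = np.array([1.0, 1.0])
h_diagonal = np.array([[1.0, 0.0], [0.0, 1.0]])  # no cross terms
h_coupled = np.array([[1.0, 0.9], [0.9, 1.0]])   # strongly coupled weights

sum_diag, joint_diag = additivity_gap(h_diagonal, weights)
sum_coupled, joint_coupled = additivity_gap(h_coupled, weights)

print("diagonal Hessian: sum =", sum_diag, "joint =", joint_diag)
print("coupled Hessian:  sum =", sum_coupled, "joint =", joint_coupled)
```

In this toy setup the assumption is exact for the diagonal Hessian, while for the coupled Hessian the joint deletion costs roughly twice what the sum of individual deletions predicts. How large the off-diagonal entries are in a real trained network would need to be measured before drawing conclusions.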

claude talk

-claude public -OBD does not work today.