Study Research Note

Good reading

  • LLM quantization world (Zhihu). It includes:
    • an introduction to fixed-point quantization
    • current studies focus on per-channel, per-token, and per-group fixed-point quantization
    • per-group means quantization is applied to a series of contiguous elements (see the sketch after this list)
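A minimal sketch of per-group fixed-point quantization, assuming symmetric int8 with one scale per group of contiguous elements; `quantize_per_group`, the group size, and the reshaping are my illustrative choices, not taken from any particular paper.

```python
import numpy as np

def quantize_per_group(w, group_size=128, n_bits=8):
    # Symmetric per-group fixed-point quantization: each group of `group_size`
    # contiguous elements along the last axis shares one scale, unlike
    # per-channel (one scale per row) or per-tensor (one global scale).
    orig_shape = w.shape
    w = w.reshape(-1, group_size)                 # split into contiguous groups
    qmax = 2 ** (n_bits - 1) - 1                  # e.g. 127 for int8
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)      # guard against all-zero groups
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q.reshape(orig_shape), scale           # keep the scales for dequantization

# usage: a weight matrix whose row length is a multiple of the group size
w = np.random.randn(4, 256).astype(np.float32)
q, s = quantize_per_group(w, group_size=128)
w_hat = (q.reshape(-1, 128) * s).reshape(w.shape)  # dequantize
print(np.abs(w - w_hat).max())                     # per-group reconstruction error
```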

Mixed-precision quantization papers

  • AWQ and SqueezeLLM (Zhihu)
    • AWQ uses a linear scale (pretty good); SqueezeLLM uses dynamic non-uniform quantization (not as good)
    • key observation: there are salient weights, and quantization can be performed with these salient weights in mind (see the sketch after this list)
  • OBS (Optimal Brain Surgeon), classic, need to read
  • GPTQ and OBQ
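A rough sketch of the salient-weight idea behind AWQ, under the assumption that salience is measured by per-input-channel activation magnitude: salient weight columns are scaled up before quantization and the inverse scale is folded into the activations. The helper names, the `alpha` exponent, and the per-output-channel quantizer are illustrative simplifications, not the paper's exact grouping or scale search.

```python
import numpy as np

def quantize_sym(w, n_bits=4):
    # simple symmetric per-output-channel quantization (illustrative, not AWQ's grouping)
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale  # return dequantized ("fake-quantized") weights

def awq_style(w, act_mag, alpha=0.5, n_bits=4):
    # Columns seeing large activations are treated as salient: scale those weight
    # columns up before quantization and fold 1/s into the activations so the
    # matmul result is unchanged. `alpha` is hand-picked here; AWQ searches for it.
    s = act_mag ** alpha
    s = s / s.mean()                          # keep scales centered around 1
    return quantize_sym(w * s, n_bits) / s    # dequantized weights, rescaled back

# usage: compare output reconstruction error with and without salient-weight scaling
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 256))
act_mag = np.abs(rng.normal(size=256)) + 0.1     # per-input-channel activation magnitude
x = rng.normal(size=(8, 256)) * act_mag          # activations follow that magnitude
err_plain = np.abs(x @ quantize_sym(w).T - x @ w.T).mean()
err_awq = np.abs(x @ awq_style(w, act_mag).T - x @ w.T).mean()
print(err_plain, err_awq)
```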

TODO list

TODO list (Obsidian): [[Read TODO]]

Questions:

  • What is group lasso, and how does it compare with weight decay? Why is weight decay said to be unsuitable for sparsity? Can weight decay therefore be used for mixed precision? (See the penalty comparison after this list.)
  • Does Optimal Brain Damage (OBD) really work? We need to experiment or read more. There are several more modern methods:
    • Magnitude-based pruning (simpler, often similarly effective)
    • Gradual pruning during training
    • Lottery ticket hypothesis approaches
    • More sophisticated second-order methods like Fisher Information
  • What is Fisher information?
  • One of OBD’s assumptions says: “the delta E caused by deleting several parameters is the sum of the delta E’s caused by deleting each parameter individually.” Does this assumption really hold?
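A quick comparison of the two penalties, as I understand them (standard textbook forms, not from any of the papers above): weight decay is a plain L2 penalty, while group lasso penalizes the L2 norm of each group of weights.

$$
\Omega_{\text{wd}}(w) = \frac{\lambda}{2}\sum_i w_i^2,
\qquad
\Omega_{\text{gl}}(w) = \lambda \sum_{g} \lVert w_g \rVert_2 ,
$$

where $w_g$ is the vector of weights in group $g$ (e.g. a channel or a quantization group). The non-smoothness of $\lVert w_g \rVert_2$ at zero is what can push whole groups exactly to zero; the smooth $w_i^2$ term only makes weights small, which is why weight decay alone is said to be unsuitable for sparsity.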

Key findings (now):

  • Weight decay is not appropriate for sparsity or mixed precision: it shrinks all weights proportionally instead of zeroing any of them out.
  • OBD omits the cross terms of the Hessian, so it lacks the ability to find redundancy patterns across parameters, and it is not simpler than the magnitude-based approach (see the expansion after this list).
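For reference, the standard second-order expansion that OBD starts from (the Fisher-information remark at the end is my own note): the change in loss from a weight perturbation $\delta w$ is

$$
\delta E \approx g^\top \delta w + \tfrac{1}{2}\,\delta w^\top H\, \delta w
= \sum_i g_i\,\delta w_i + \tfrac{1}{2}\sum_i h_{ii}\,\delta w_i^2 + \tfrac{1}{2}\sum_{i\neq j} h_{ij}\,\delta w_i\,\delta w_j .
$$

OBD assumes training has converged ($g \approx 0$) and drops the cross terms $h_{ij}$, $i \neq j$, which is exactly the “sum of individual delta E’s” assumption questioned above; the saliency of weight $i$ then reduces to $\tfrac{1}{2} h_{ii} w_i^2$. Fisher information, $F = \mathbb{E}\!\left[\nabla_w \log p_w(x)\,\nabla_w \log p_w(x)^\top\right]$, is often used as a cheaper positive-semidefinite stand-in for $H$ in the more modern second-order methods listed in the questions above.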

Links

OBD note (Obsidian): [[OBD Note (not work)]]