Study Research Note

Good reading

  • LLM quantization world (Zhihu). It includes:
    • an introduction to fixed-point quantization
    • current studies focus on per-channel, per-token, and per-group fixed-point quantization
    • per-group means quantization is applied to a series of contiguous elements (see the sketch after this list)
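A minimal sketch of per-group fixed-point quantization, assuming symmetric int8 with one scale per group of contiguous elements; `quantize_per_group`, the group size, and the reshaping are my illustrative choices, not taken from any particular paper.

```python
import numpy as np

def quantize_per_group(w, group_size=128, n_bits=8):
    # Symmetric per-group fixed-point quantization: each group of `group_size`
    # contiguous elements along the last axis shares one scale, unlike
    # per-channel (one scale per row) or per-tensor (one global scale).
    orig_shape = w.shape
    w = w.reshape(-1, group_size)                 # split into contiguous groups
    qmax = 2 ** (n_bits - 1) - 1                  # e.g. 127 for int8
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)      # guard against all-zero groups
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q.reshape(orig_shape), scale           # keep the scales for dequantization

# usage: a weight matrix whose row length is a multiple of the group size
w = np.random.randn(4, 256).astype(np.float32)
q, s = quantize_per_group(w, group_size=128)
w_hat = (q.reshape(-1, 128) * s).reshape(w.shape)  # dequantize
print(np.abs(w - w_hat).max())                     # per-group reconstruction error
```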

Mixed-precision quantization papers

  • AWQ and SqueezeLLM (Zhihu)
    • AWQ uses a linear scale (pretty good); SqueezeLLM uses dynamic non-uniform quantization (not as good)
    • key observation: there are salient weights, and quantization can be performed with these salient weights in mind (see the sketch after this list)
  • OBS (Optimal Brain Surgeon), classic, need to read
  • GPTQ and OBQ
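A rough sketch of the salient-weight idea behind AWQ, under the assumption that salience is measured by per-input-channel activation magnitude: salient weight columns are scaled up before quantization and the inverse scale is folded into the activations. The helper names, the `alpha` exponent, and the per-output-channel quantizer are illustrative simplifications, not the paper's exact grouping or scale search.

```python
import numpy as np

def quantize_sym(w, n_bits=4):
    # simple symmetric per-output-channel quantization (illustrative, not AWQ's grouping)
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale  # return dequantized ("fake-quantized") weights

def awq_style(w, act_mag, alpha=0.5, n_bits=4):
    # Columns seeing large activations are treated as salient: scale those weight
    # columns up before quantization and fold 1/s into the activations so the
    # matmul result is unchanged. `alpha` is hand-picked here; AWQ searches for it.
    s = act_mag ** alpha
    s = s / s.mean()                          # keep scales centered around 1
    return quantize_sym(w * s, n_bits) / s    # dequantized weights, rescaled back

# usage: compare output reconstruction error with and without salient-weight scaling
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 256))
act_mag = np.abs(rng.normal(size=256)) + 0.1     # per-input-channel activation magnitude
x = rng.normal(size=(8, 256)) * act_mag          # activations follow that magnitude
err_plain = np.abs(x @ quantize_sym(w).T - x @ w.T).mean()
err_awq = np.abs(x @ awq_style(w, act_mag).T - x @ w.T).mean()
print(err_plain, err_awq)
```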

TODO list

TODO list (Obsidian): [[Read TODO]]

Questions:

  • What is group lasso, and how does it compare with weight decay? Why is weight decay said to be unsuitable for sparsity? Can weight decay therefore be used for mixed precision? (See the penalty comparison after this list.)
  • Does Optimal Brain Damage (OBD) really work? We need to experiment or read more. There are several more modern methods:
    • Magnitude-based pruning (simpler, often similarly effective)
    • Gradual pruning during training
    • Lottery ticket hypothesis approaches
    • More sophisticated second-order methods like Fisher Information
  • What is Fisher information?
  • One of OBD’s assumptions says: “the delta E caused by deleting several parameters is the sum of the delta E’s caused by deleting each parameter individually.” Does this assumption really hold?
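A quick comparison of the two penalties, as I understand them (standard textbook forms, not from any of the papers above): weight decay is a plain L2 penalty, while group lasso penalizes the L2 norm of each group of weights.

$$
\Omega_{\text{wd}}(w) = \frac{\lambda}{2}\sum_i w_i^2,
\qquad
\Omega_{\text{gl}}(w) = \lambda \sum_{g} \lVert w_g \rVert_2 ,
$$

where $w_g$ is the vector of weights in group $g$ (e.g. a channel or a quantization group). The non-smoothness of $\lVert w_g \rVert_2$ at zero is what can push whole groups exactly to zero; the smooth $w_i^2$ term only makes weights small, which is why weight decay alone is said to be unsuitable for sparsity.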

Key findings (now):

  • Weight decay is not appropriate for sparsity or mixed precision: it shrinks all weights proportionally instead of zeroing any of them out.
  • OBD omits the cross terms of the Hessian, so it lacks the ability to find redundancy patterns across parameters, and it is not simpler than the magnitude-based approach (see the expansion after this list).
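For reference, the standard second-order expansion that OBD starts from (the Fisher-information remark at the end is my own note): the change in loss from a weight perturbation $\delta w$ is

$$
\delta E \approx g^\top \delta w + \tfrac{1}{2}\,\delta w^\top H\, \delta w
= \sum_i g_i\,\delta w_i + \tfrac{1}{2}\sum_i h_{ii}\,\delta w_i^2 + \tfrac{1}{2}\sum_{i\neq j} h_{ij}\,\delta w_i\,\delta w_j .
$$

OBD assumes training has converged ($g \approx 0$) and drops the cross terms $h_{ij}$, $i \neq j$, which is exactly the “sum of individual delta E’s” assumption questioned above; the saliency of weight $i$ then reduces to $\tfrac{1}{2} h_{ii} w_i^2$. Fisher information, $F = \mathbb{E}\!\left[\nabla_w \log p_w(x)\,\nabla_w \log p_w(x)^\top\right]$, is often used as a cheaper positive-semidefinite stand-in for $H$ in the more modern second-order methods listed in the questions above.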

Links

OBD note (Obsidian): [[OBD Note (not work)]]