Reading TODO

📚 Priority Reading Queue

1. Weight Noise Injection-Based MLPs With Group Lasso Penalty: Asymptotic Convergence and Application to Node Pruning

  • Authors: Wang J, Chang Q, Chang Q, Liu Y, Pal NR
  • Journal: IEEE Transactions on Cybernetics
  • Year: 2019
  • Volume/Issue: Vol. 49, No. 12, pp. 4346-4364
  • DOI: 10.1109/TCYB.2018.2864142
  • Key Focus:
    • Shows why standard L2 weight decay is inadequate for producing sparse solutions
    • Proposes a group lasso penalty as an alternative regularizer
    • Node-pruning applications for fault-tolerant MLPs
  • Status: ⏳ To Read
  • Notes: Key paper on why traditional weight decay fails to produce sparsity; see the penalty sketch after this entry
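
Quick reference before reading: a minimal numpy sketch (my own, not from the paper) contrasting plain L2 weight decay with a group lasso penalty that groups each hidden node's incoming weights. The shapes and per-node grouping are illustrative assumptions; the thing to verify against the paper is that the L2 term only shrinks weights, while the group term can drive a whole node's group to exactly zero, which is what enables node pruning.

```python
import numpy as np

# Hidden-layer weight matrix: one row per hidden node (its incoming weights).
# The 8x4 shape and the per-node grouping are illustrative assumptions.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))   # 8 hidden nodes, 4 inputs each
lam = 1e-3

# Classical L2 weight decay: element-wise, shrinks but never yields exact zeros.
l2_penalty = lam * np.sum(W ** 2)

# Group lasso: sum of Euclidean norms of each node's weight group.
# Non-differentiable wherever a group norm hits zero, which is exactly
# where structured sparsity appears.
group_lasso_penalty = lam * np.sum(np.linalg.norm(W, axis=1))

print(f"L2 penalty:          {l2_penalty:.6f}")
print(f"Group lasso penalty: {group_lasso_penalty:.6f}")
```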

2. A Novel Pruning Algorithm for Smoothing Feedforward Neural Networks Based on Group Lasso Method

  • Authors: Wang J, Xu C, Yang X, Zurada JM
  • Journal: IEEE Transactions on Neural Networks and Learning Systems
  • Year: 2018
  • Volume/Issue: Vol. 29, No. 5, pp. 2012-2024
  • DOI: 10.1109/TNNLS.2017.2748585
  • Key Focus:
    • Four new backpropagation variants built on the group lasso penalty
    • Smoothing functions to handle the non-differentiability of the group lasso term
    • Direct comparison with weight decay and weight elimination
  • Status: ⏳ To Read
  • Notes: Comprehensive comparison with traditional weight decay methods; see the smoothed-penalty sketch after this entry
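
The non-differentiability flagged above sits at the origin of each group norm. A common smoothing replaces sqrt(s) with sqrt(s + ε²); I expect the paper's smoothing functions to be in this spirit, but the exact form (and any ε schedule) should be checked against the text. A sketch:

```python
import numpy as np

def smoothed_group_lasso(W: np.ndarray, eps: float = 1e-4) -> float:
    """Differentiable surrogate for sum_g ||w_g||_2, grouping by rows of W.

    Replaces sqrt(s) with sqrt(s + eps**2) so the gradient exists even when
    an entire group has been driven to zero. The row grouping and eps value
    are illustrative choices, not taken from the paper.
    """
    return float(np.sum(np.sqrt(np.sum(W ** 2, axis=1) + eps ** 2)))

def smoothed_group_lasso_grad(W: np.ndarray, eps: float = 1e-4) -> np.ndarray:
    """Gradient of the smoothed penalty: w_g / sqrt(||w_g||^2 + eps^2) per row."""
    norms = np.sqrt(np.sum(W ** 2, axis=1, keepdims=True) + eps ** 2)
    return W / norms
```

As eps approaches zero the surrogate recovers the exact group lasso term; a larger eps gives smoother gradients at the cost of less exact sparsity.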

3. Group Sparse Regularization for Deep Neural Networks

  • Authors: Scardapane S, Comminiello D, Hussain A, Uncini A
  • Conference/Journal: arXiv preprint
  • Year: 2016
  • arXiv ID: 1607.00485
  • Key Focus:
    • Joint optimization of weights, neuron count, and feature selection
    • Group lasso penalty over groups of network connections
    • Extensive comparison with classical weight decay
  • Status: ⏳ To Read
  • Notes: Foundational paper on group-sparse regularization vs weight decay; see the training-loss sketch after this entry
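
How the penalty might plug into ordinary training, sketched in PyTorch. The grouping below (one group per input feature via the first layer's columns, one group per hidden neuron via the second layer's columns) follows my reading of the abstract and is an assumption; all sizes, names, and hyperparameters are illustrative, and the paper's exact group definitions and scaling factors should be checked.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny MLP; all sizes are arbitrary illustration values.
net = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
lam = 1e-3

def group_sparse_penalty(model: nn.Sequential) -> torch.Tensor:
    """Group lasso over outgoing-weight groups (my approximation of the
    paper's grouping): zeroing an input group removes that feature,
    zeroing a hidden group removes that neuron."""
    w1, w2 = model[0].weight, model[2].weight   # shapes (32, 10) and (1, 32)
    input_groups = torch.norm(w1, dim=0)        # one norm per input feature
    hidden_groups = torch.norm(w2, dim=0)       # one norm per hidden neuron
    return input_groups.sum() + hidden_groups.sum()

# One illustrative training step on random data.
x, y = torch.randn(64, 10), torch.randn(64, 1)
opt = torch.optim.SGD(net.parameters(), lr=0.01)
loss = nn.functional.mse_loss(net(x), y) + lam * group_sparse_penalty(net)
loss.backward()
opt.step()
```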

📝 Reading Notes Template

For each paper, capture:

  • Main Contribution: How does it extend/replace weight decay?
  • Methodology: What specific regularization technique is proposed?
  • Experimental Setup: What baselines are compared?
  • Key Results: Performance vs traditional weight decay
  • Theoretical Insights: Why does the proposed method work better?
  • Implementation Details: Any code or algorithmic specifics
  • Future Directions: What questions does this raise?

🔍 Key Questions to Address

  1. Fundamental: Why does traditional L2 weight decay fail to produce structured sparsity?
  2. Methodological: How do group-based penalties differ from element-wise penalties? (See the proximal-operator sketch after this list.)
  3. Practical: What are the computational trade-offs between methods?
  4. Theoretical: What convergence guarantees exist for these approaches?
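
A concrete illustration for questions 1 and 2 (standard proximal-operator math, my own example rather than anything from the three papers above): a proximal step on an L2 decay term only rescales a weight group, whereas the group lasso proximal step (block soft-thresholding) sets the entire group to exactly zero once its norm falls below the threshold. That is the mechanism behind structured sparsity.

```python
import numpy as np

def prox_l2_decay(w_g: np.ndarray, step: float, lam: float) -> np.ndarray:
    """Proximal step for (lam/2) * ||w||^2: pure rescaling, never exactly zero."""
    return w_g / (1.0 + step * lam)

def prox_group_lasso(w_g: np.ndarray, step: float, lam: float) -> np.ndarray:
    """Block soft-thresholding: zeros the whole group if ||w_g|| <= step * lam."""
    norm = np.linalg.norm(w_g)
    if norm <= step * lam:
        return np.zeros_like(w_g)
    return (1.0 - step * lam / norm) * w_g

w_g = np.array([0.03, -0.02, 0.01])              # a small weight group (made-up values)
print(prox_l2_decay(w_g, step=0.1, lam=1.0))     # shrunk, every entry still nonzero
print(prox_group_lasso(w_g, step=0.1, lam=1.0))  # the whole group is exactly zero
```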

✅ Completion Checklist

  • Paper 1: Weight Noise Injection-Based MLPs
  • Paper 2: Novel Pruning Algorithm for Smoothing Feedforward Neural Networks
  • Paper 3: Group Sparse Regularization for DNNs
  • Synthesis: Write summary comparing all three approaches
  • Implementation: Try reproducing key results from one paper

Last Updated: Add date when you start reading
Priority: High - Core understanding of weight decay limitations in structured sparsity