Week | Topics + Notes | Additional Readings, Notable Events
1. Aug 30
What is COLT? Characterizing learning models. Consistency Model. PAC Model.
[PC - Branislav] Consistency model:
- Monotone disjunctions are CM-learnable
- k-CNF is CM-learnable
- Separating hyperplanes are CM-learnable

- Pitt, L. and Valiant, L. G. 1988. Computational limitations on learning from examples. J. ACM 35, 4 (Oct. 1988), 965-984.
- [PC] Aldous, D. and Vazirani, U. 1995. A Markovian extension of Valiant's learning model. Inf. Comput. 117, 2 (Mar. 1995), 181-186. [ pdf ]
- Blum, A. and Rivest, R. L. 1989. Training a 3-node neural network is NP-complete. In Advances in Neural Information Processing Systems 1. Morgan Kaufmann Publishers, San Francisco, CA, 494-501.
- [PC] Feldman, V. 2009. Hardness of approximate two-level logic minimization and PAC learning with membership queries. J. Comput. Syst. Sci. 75, 1 (Jan. 2009), 13-26. (Also STOC'06) [ pdf ]
- Vitaly Feldman, Hardness of Proper Learning, The Encyclopedia of Algorithms, 2008.
- Vitaly Feldman, Statistical Query Learning, The Encyclopedia of Algorithms, 2008.

2. Sep 06
Sample complexity. Sample complexity for finite hypothesis spaces. VC-dimension. Sample complexity for infinite hypothesis spaces.
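As a point of reference (not part of the assigned readings), the standard bound for finite hypothesis spaces says that for a consistent learner,

```latex
m \;\ge\; \frac{1}{\epsilon}\left(\ln |H| + \ln \frac{1}{\delta}\right)
```

examples suffice: with probability at least 1 - δ, every h in H that is consistent with the sample has true error at most ε.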
[PC - Swapnoneel] Some hardness results:
- k-term DNF is not CM-learnable (i.e., it is NP-hard) for any k ≥ 2
- Intractability of learning 3-term DNF by 3-term DNF (see Rivest's lecture 5)

Mon, Sep 06 is Labor Day.
Thursday, Sep 09 is Rosh Hashanah.
- Anselm Blumer, Andrzej Ehrenfeucht, David Haussler, Manfred K. Warmuth: Occam's Razor. Inf. Process. Lett. 24(6): 377-380 (1987). [This is the original Occam's Razor paper]
- [PC] Ming Li, John Tromp, Paul M. B. Vitányi: Sharpening Occam's razor. Inf. Process. Lett. 85(5): 267-274 (2003).
- Hosking, J. R., Pednault, E. P., and Sudan, M. 1997. A statistical perspective on data mining. Future Gener. Comput. Syst. 13, 2-3 (Nov. 1997), 117-134.
- [PC] Misha Alekhnovich, Mark Braverman, Vitaly Feldman, Adam Klivans, Toniann Pitassi, The complexity of properly learning simple concept classes, Journal of Computer and System Sciences, 74(1), 2008 (also FOCS 2004). Two people can present this.

3. Sep 13
[PC - Steven] Some PAC-learning results:
- Learning k-decision lists (see Rivest's lecture 6)
- Learning 3-term DNF by 3-CNF (Rivest's lecture 5)


4. Sep 20
[PC - Steve Uurtamo] Three different proofs of Sauer's lemma. See also a blog post by Tim Gowers (that's the first proof). Induction gives the second proof, and the shifting technique gives the third. All proofs are short, and you'll learn nice combinatorial techniques from them.
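All three proofs establish the same statement; writing Π_H(m) for the growth function and d for the VC-dimension of H, Sauer's lemma reads

```latex
\Pi_H(m) \;\le\; \sum_{i=0}^{d} \binom{m}{i} \;\le\; \left(\frac{em}{d}\right)^{d} \quad \text{for } m \ge d,
```

so once the VC-dimension is finite, the growth function is polynomial rather than exponential in m.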

5. Sep 27
Dealing with Noise. Inconsistent Hypothesis Model. Empirical error and generalization error. Uniform convergence theorem.
[PC - Daniel Megalo (Tue, Sep 28)] Sample complexity lower bound: show that Ω(d/ε) is necessary, where d is the VC-dimension. (Rivest's lecture 10)

[PC] Venkatesan Guruswami, Prasad Raghavendra: Hardness of Learning Halfspaces with Noise. SIAM J. Comput. 39(2): 742-765 (2009). (Also FOCS 2006). [ pdf ]
6. Oct 04
Weak and Strong PAC-learning. Boosting & AdaBoost, training error bound.
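A minimal AdaBoost sketch may help before reading the papers below. It uses 1-D threshold "stumps" as weak learners; the data and all names here are hypothetical toy choices, not the course's notation.

```python
import math

def stump_predict(x, thresh, sign):
    """Predict `sign` above the threshold and `-sign` below it."""
    return sign if x > thresh else -sign

def adaboost(xs, ys, rounds):
    """xs: list of floats, ys: labels in {-1, +1}.
    Returns a list of (alpha, thresh, sign) weighted stumps."""
    n = len(xs)
    w = [1.0 / n] * n                          # uniform initial distribution
    thresholds = [t - 0.5 for t in sorted(set(xs))]
    ensemble = []
    for _ in range(rounds):
        # choose the stump with the smallest weighted training error
        err, thresh, sign = min(
            (sum(wi for wi, x, y in zip(w, xs, ys)
                 if stump_predict(x, t, s) != y), t, s)
            for t in thresholds for s in (+1, -1))
        if err <= 0 or err >= 0.5:             # weak-learning assumption violated
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, thresh, sign))
        # reweight: misclassified examples gain weight, then renormalize
        w = [wi * math.exp(-alpha * y * stump_predict(x, thresh, sign))
             for wi, x, y in zip(w, xs, ys)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def predict(ensemble, x):
    """Weighted majority vote of the chosen stumps."""
    return 1 if sum(a * stump_predict(x, t, s) for a, t, s in ensemble) >= 0 else -1
```

On labels forming an interval (e.g. -1, -1, +1, +1, +1, -1 over x = 0..5), no single stump is consistent, but three boosted stumps classify the sample perfectly, which is the point of the training-error bound.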
- [Schapire's course] Scribe notes 9
- [Blum's course] Lecture 0209
- Robert E. Schapire. The boosting approach to machine learning: An overview. In D. D. Denison, M. H. Hansen, C. Holmes, B. Mallick, B. Yu, editors, Nonlinear Estimation and Classification. Springer, 2003. [ pdf ]
[PC - Xiaoxing Yu] State and prove Theorem 8 (page 17) in the following paper:
Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119-139, 1997. [ Postscript ] (the original AdaBoost paper).
The theorem is very short, and it makes use of Theorem 1 in the following paper; thus, please prove both theorems:
Baum, E. B. and Haussler, D. 1989. What size net gives valid generalization? Neural Comput. 1, 1 (Mar. 1989), 151-160. [ pdf ]

- Ron Meir and Gunnar Rätsch. An introduction to boosting and leveraging. In Advanced Lectures on Machine Learning (LNAI 2600), 2003. [ pdf ]

7. Oct 11
Generalization error bounds: naive and margin-based
- [Schapire's course] Scribe notes 10, notes 11
- [Blum's course] Lecture 0211
- Robert E. Schapire, Yoav Freund, Peter Bartlett and Wee Sun Lee. Boosting the margin: A new explanation for the effectiveness of voting methods. The Annals of Statistics, 26(5):1651-1686, 1998. [ pdf ]
- [PC - Caiming] Friedman, J. H. "Greedy Function Approximation: A Gradient Boosting Machine." (Feb. 1999a). Caiming can take an entire lecture (1.5 hours) for this.

- [PC] Robert E. Schapire. The convergence rate of AdaBoost [open problem]. In The 23rd Conference on Learning Theory, 2010. [ pdf ]
- [PC] Thomas G. Dietterich and Ghulum Bakiri. Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research, 2:263-286, January 1995. (They showed how to convert binary classifiers into multiclass classifiers using error-correcting codes!) Two people can present this.
- [PC] Robert E. Schapire. Using output codes to boost multiclass learning problems. In Machine Learning: Proceedings of the Fourteenth International Conference, 1997. [ Postscript ]

8. Oct 18
- [PC - Praneeta] Empirical margin loss bound. Prove Theorem 1, page 129, of this paper.
- [PC - Yongding] Massart's Lemma and its corollary + Rademacher complexity of H is equal to the Rademacher complexity of co(H). Please prove three things:
  - Massart's Lemma & its corollary (pages 15-17 in Lecture 3 of Mehryar Mohri's class)
  - Rademacher complexity of the convex hull (page 23, Lecture 6 of Mehryar Mohri's class)
Both presentations are on Tuesday, Oct 19.


9. Oct 25


10. Nov 01
Support Vector Machines, the linearly separable case
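For orientation before the readings, the linearly separable (hard-margin) case is the standard primal problem

```latex
\min_{w,\,b}\;\; \frac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i\,(w \cdot x_i + b) \;\ge\; 1, \qquad i = 1,\dots,m,
```

whose solution maximizes the margin 2/||w|| between the two classes.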

- Chris Burges' SVM tutorial. [ pdf ]
- Excerpt from Vapnik's The Nature of Statistical Learning Theory.

11. Nov 08
SVM: the kernel trick
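A tiny sketch of the kernel trick (a hypothetical toy example, not from the readings): the degree-2 polynomial kernel K(x, z) = (x·z + 1)² equals the ordinary inner product of an explicit quadratic feature map φ, so any algorithm written purely in terms of inner products works in the expanded space without ever constructing it.

```python
import math

def dot(u, v):
    """Ordinary inner product."""
    return sum(a * b for a, b in zip(u, v))

def poly_kernel(x, z):
    """Degree-2 polynomial kernel: evaluated in the original space."""
    return (dot(x, z) + 1.0) ** 2

def phi(x):
    """Explicit feature map for that kernel on R^2:
    (x1^2, x2^2, sqrt(2) x1 x2, sqrt(2) x1, sqrt(2) x2, 1)."""
    x1, x2 = x
    s = math.sqrt(2.0)
    return [x1 * x1, x2 * x2, s * x1 * x2, s * x1, s * x2, 1.0]
```

Expanding (x·z + 1)² term by term shows it equals φ(x)·φ(z); the kernel evaluation costs O(dim) while the explicit map already needs 6 coordinates in R².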

- O. Bousquet, S. Boucheron, and G. Lugosi, Introduction to Statistical Learning Theory. [ pdf ]
12. Nov 15
Online learning. The mistake-bound model. Learning from expert advice. WMA & RWMA.
[PC] Perceptron algorithm & its analysis
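The Perceptron itself fits in a few lines; this is a minimal sketch with hypothetical toy data. On each mistake the algorithm adds y·x to the weights, and the classical analysis bounds the total number of mistakes by (R/γ)², where R bounds the example norms and γ is the margin of a perfect separator.

```python
def perceptron(samples, epochs=10):
    """samples: list of (x, y) pairs, x a tuple of floats, y in {-1, +1}."""
    dim = len(samples[0][0])
    w = [0.0] * dim
    b = 0.0
    mistakes = 0
    for _ in range(epochs):
        clean_pass = True
        for x, y in samples:
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]   # mistake-driven update
                b += y
                mistakes += 1
                clean_pass = False
        if clean_pass:      # a full pass with no mistakes: converged
            break
    return w, b, mistakes
```

On linearly separable data the mistake bound guarantees termination regardless of the order in which examples arrive.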
- Avrim Blum. "On-line algorithms in machine learning." In Dagstuhl Workshop on On-line Algorithms, June 1996. [ ps ]
- Shai Shalev-Shwartz and Yoram Singer, Tutorial on Theory and Applications of Online Learning, ICML 2008.
- A Mind Reader Game (you should definitely try to play this game!)
- Cesa-Bianchi, N., Freund, Y., Haussler, D., Helmbold, D. P., Schapire, R. E., and Warmuth, M. K. 1997. How to use expert advice. J. ACM 44, 3 (May 1997), 427-485.
- Frans M. J. Willems, Yuri M. Shtarkov, Tjalling J. Tjalkens: The context-tree weighting method: basic properties. IEEE Transactions on Information Theory 41(3): 653-664 (1995). [1996 Paper Award of the IEEE Information Theory Society]
- Littlestone, N. 1988. Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm. Mach. Learn. 2, 4 (Apr. 1988), 285-318. [The Winnow paper, pdf ]

13. Nov 22
Winnow
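A minimal Winnow sketch for learning a monotone disjunction over {0,1}^n (the toy data below is hypothetical; Littlestone's paper above is the source of the algorithm). Unlike the Perceptron's additive updates, Winnow updates multiplicatively, which gives mistake bounds logarithmic in n when the target depends on few variables.

```python
def winnow(samples, n, passes=10):
    """samples: list of (x, y), x in {0,1}^n, y in {0,1}."""
    theta = n / 2.0                  # standard threshold choice
    w = [1.0] * n
    for _ in range(passes):
        for x, y in samples:
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0
            if pred == 1 and y == 0:     # false positive: halve the active weights
                w = [wi / 2.0 if xi else wi for wi, xi in zip(w, x)]
            elif pred == 0 and y == 1:   # false negative: double the active weights
                w = [wi * 2.0 if xi else wi for wi, xi in zip(w, x)]
    return w
```

Only the coordinates with x_i = 1 are promoted or demoted, so irrelevant attributes never gain weight from examples that don't touch them.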

Wed, Nov 24 - Fri, Nov 26: Fall Recess
- Blum, A. and Y. Mansour (2007). Learning, Regret Minimization, and Equilibria. In Algorithmic Game Theory (eds. N. Nisan, T. Roughgarden, E. Tardos, and V. Vazirani), Cambridge University Press. [ pdf ]
- Yoav Freund and Robert E. Schapire, Adaptive Game Playing Using Multiplicative Weights, Games and Economic Behavior, 29:79-103, 1999. [ ps ]

14. Nov 29
Linear regression
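A small sketch contrasting the additive gradient-descent (GD) update with the multiplicative exponentiated-gradient (EG) update for on-line linear regression with squared loss, in the spirit of the Kivinen-Warmuth paper below (the learning rate and data here are hypothetical toy choices).

```python
import math

def gd_update(w, x, y, eta=0.1):
    """GD: w_i <- w_i - eta * (w.x - y) * x_i  (an additive step)."""
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [wi - eta * err * xi for wi, xi in zip(w, x)]

def eg_update(w, x, y, eta=0.1):
    """EG: w_i <- w_i * exp(-eta * (w.x - y) * x_i), then renormalize
    so the weights remain a probability vector."""
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    w = [wi * math.exp(-eta * err * xi) for wi, xi in zip(w, x)]
    z = sum(w)
    return [wi / z for wi in w]
```

Both rules drive the prediction toward the target, but GD takes steps in weight space while EG takes steps in the log of the weights; the paper's regret bounds show EG wins when the target weight vector is sparse and the inputs are dense.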
- Jyrki Kivinen and Manfred K. Warmuth. Exponentiated Gradient versus Gradient Descent for Linear Predictors. Information and Computation, 132(1):1-63, January 1997. [ pdf ]

15. Dec 06
Maximum entropy, maximum likelihood

Fri, Dec 10 is the last day of classes.