Week | Topics + Notes | Additional Readings, Notable Events
1. Aug 30
What is COLT? Characterizing learning models. Consistency Model. PAC Model.
[PC - Branislav] Consistency model:
- Monotone disjunctions are CM-learnable
- k-CNF is CM-learnable
- Separating hyperplanes are CM-learnable

- Pitt, L. and Valiant, L. G. 1988. Computational limitations on learning from examples. J. ACM 35, 4 (Oct. 1988), 965-984.
- [PC] Aldous, D. and Vazirani, U. 1995. A Markovian extension of Valiant's learning model. Inf. Comput. 117, 2 (Mar. 1995), 181-186. [ pdf ]
- Blum, A. and Rivest, R. L. 1989. Training a 3-node neural network is NP-complete. In Advances in Neural Information Processing Systems 1. Morgan Kaufmann Publishers, San Francisco, CA, 494-501.
- [PC] Feldman, V. 2009. Hardness of approximate two-level logic minimization and PAC learning with membership queries. J. Comput. Syst. Sci. 75, 1 (Jan. 2009), 13-26. (Also STOC'06) [ pdf ]
- Vitaly Feldman, Hardness of Proper Learning, The Encyclopedia of Algorithms, 2008.
- Vitaly Feldman, Statistical Query Learning, The Encyclopedia of Algorithms, 2008.

2. Sep 06
Sample complexity. Sample complexity for finite hypothesis spaces. VC-dimension. Sample complexity for infinite hypothesis spaces.
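As a point of reference (not part of the assigned readings), the standard bound for finite hypothesis spaces says that for a consistent learner,

```latex
m \;\ge\; \frac{1}{\epsilon}\left(\ln |H| + \ln \frac{1}{\delta}\right)
```

examples suffice: with probability at least 1 - δ, every h in H that is consistent with the sample has true error at most ε.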
[PC - Swapnoneel] Some hardness results:
- k-term DNF is not CM-learnable (i.e., it is NP-hard) for any k ≥ 2
- Intractability of learning 3-term DNF by 3-term DNF (see Rivest's lecture 5)

Mon, Sep 06 is Labor Day.
Thursday, Sep 09 is Rosh Hashanah.
- Anselm Blumer, Andrzej Ehrenfeucht, David Haussler, Manfred K. Warmuth: Occam's Razor. Inf. Process. Lett. 24(6): 377-380 (1987). [This is the original Occam's Razor paper]
- [PC] Ming Li, John Tromp, Paul M. B. Vitányi: Sharpening Occam's razor. Inf. Process. Lett. 85(5): 267-274 (2003).
- Hosking, J. R., Pednault, E. P., and Sudan, M. 1997. A statistical perspective on data mining. Future Gener. Comput. Syst. 13, 2-3 (Nov. 1997), 117-134.
- [PC] Misha Alekhnovich, Mark Braverman, Vitaly Feldman, Adam Klivans, Toniann Pitassi, The complexity of properly learning simple concept classes, Journal of Computer and System Sciences, 74(1), 2008 (also FOCS 2004). Two people can present this.

3. Sep 13
[PC - Steven] Some PAC-learning results:
- Learning k-decision lists (see Rivest's lecture 6)
- Learning 3-term DNF by 3-CNF (Rivest's lecture 5)


4. Sep 20
[PC - Steve Uurtamo] Three different proofs of Sauer's lemma. See also a blog post by Tim Gowers (that's the first proof). Induction gives the second proof, and the shifting technique gives the third. All proofs are short, and you'll learn nice combinatorial techniques from them.
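All three proofs establish the same statement; writing Π_H(m) for the growth function and d for the VC-dimension of H, Sauer's lemma reads

```latex
\Pi_H(m) \;\le\; \sum_{i=0}^{d} \binom{m}{i} \;\le\; \left(\frac{em}{d}\right)^{d} \quad \text{for } m \ge d,
```

so once the VC-dimension is finite, the growth function is polynomial rather than exponential in m.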

5. Sep 27
Dealing with Noise. Inconsistent Hypothesis Model. Empirical error and generalization error. Uniform convergence theorem.
[PC - Daniel Megalo (Tue, Sep 28)] Sample complexity lower bound: show that Ω(d/ε) is necessary, where d is the VC-dimension. (Rivest's lecture 10)

[PC] Venkatesan Guruswami, Prasad Raghavendra: Hardness of Learning Halfspaces with Noise. SIAM J. Comput. 39(2): 742-765 (2009). (Also FOCS 2006). [ pdf ]
6. Oct 04
Weak and Strong PAC-learning. Boosting & AdaBoost, training error bound.
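A minimal AdaBoost sketch may help before reading the papers below. It uses 1-D threshold "stumps" as weak learners; the data and all names here are hypothetical toy choices, not the course's notation.

```python
import math

def stump_predict(x, thresh, sign):
    """Predict `sign` above the threshold and `-sign` below it."""
    return sign if x > thresh else -sign

def adaboost(xs, ys, rounds):
    """xs: list of floats, ys: labels in {-1, +1}.
    Returns a list of (alpha, thresh, sign) weighted stumps."""
    n = len(xs)
    w = [1.0 / n] * n                          # uniform initial distribution
    thresholds = [t - 0.5 for t in sorted(set(xs))]
    ensemble = []
    for _ in range(rounds):
        # choose the stump with the smallest weighted training error
        err, thresh, sign = min(
            (sum(wi for wi, x, y in zip(w, xs, ys)
                 if stump_predict(x, t, s) != y), t, s)
            for t in thresholds for s in (+1, -1))
        if err <= 0 or err >= 0.5:             # weak-learning assumption violated
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, thresh, sign))
        # reweight: misclassified examples gain weight, then renormalize
        w = [wi * math.exp(-alpha * y * stump_predict(x, thresh, sign))
             for wi, x, y in zip(w, xs, ys)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def predict(ensemble, x):
    """Weighted majority vote of the chosen stumps."""
    return 1 if sum(a * stump_predict(x, t, s) for a, t, s in ensemble) >= 0 else -1
```

On labels forming an interval (e.g. -1, -1, +1, +1, +1, -1 over x = 0..5), no single stump is consistent, but three boosted stumps classify the sample perfectly, which is the point of the training-error bound.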
- [Schapire's course] Scribe notes 9
- [Blum's course] Lecture 0209
- Robert E. Schapire. The boosting approach to machine learning: An overview. In D. D. Denison, M. H. Hansen, C. Holmes, B. Mallick, B. Yu, editors, Nonlinear Estimation and Classification. Springer, 2003. [ pdf ]
[PC - Xiaoxing Yu] State and prove Theorem 8 (page 17) in the following paper:
Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119-139, 1997. [ Postscript ] (the original AdaBoost paper).
The theorem is very short, and it makes use of Theorem 1 in the following paper; thus, please prove both theorems:
Baum, E. B. and Haussler, D. 1989. What size net gives valid generalization? Neural Comput. 1, 1 (Mar. 1989), 151-160. [ pdf ]

- Ron Meir and Gunnar Rätsch. An introduction to boosting and leveraging. In Advanced Lectures on Machine Learning (LNAI 2600), 2003. [ pdf ]

7. Oct 11
Generalization error bounds: naive and margin-based
- [Schapire's course] Scribe notes 10, notes 11
- [Blum's course] Lecture 0211
- Robert E. Schapire, Yoav Freund, Peter Bartlett and Wee Sun Lee. Boosting the margin: A new explanation for the effectiveness of voting methods. The Annals of Statistics, 26(5):1651-1686, 1998. [ pdf ]
- [PC - Caiming] Friedman, J. H. "Greedy Function Approximation: A Gradient Boosting Machine." (Feb. 1999a). Caiming can take an entire lecture (1.5 hours) for this.

- [PC] Robert E. Schapire. The convergence rate of AdaBoost [open problem]. In The 23rd Conference on Learning Theory, 2010. [ pdf ]
- [PC] Thomas G. Dietterich and Ghulum Bakiri. Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research, 2:263-286, January 1995. (They showed how to convert binary classifiers into multiclass classifiers using error-correcting codes!) Two people can present this.
- [PC] Robert E. Schapire. Using output codes to boost multiclass learning problems. In Machine Learning: Proceedings of the Fourteenth International Conference, 1997. [ Postscript ]

8. Oct 18
- [PC - Praneeta] Empirical margin loss bound. Prove Theorem 1, page 129, of this paper.
- [PC - Yongding] Massart's Lemma and its corollary + Rademacher complexity of H is equal to the Rademacher complexity of co(H). Please prove three things:
  - Massart's Lemma & its corollary (pages 15-17 in Lecture 3 of Mehryar Mohri's class)
  - Rademacher complexity of the convex hull (page 23, Lecture 6 of Mehryar Mohri's class)
Both presentations are on Tuesday, Oct 19.


9. Oct 25


10. Nov 01
Support Vector Machines, the linearly separable case
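For orientation before the readings, the linearly separable (hard-margin) case is the standard primal problem

```latex
\min_{w,\,b}\;\; \frac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i\,(w \cdot x_i + b) \;\ge\; 1, \qquad i = 1,\dots,m,
```

whose solution maximizes the margin 2/||w|| between the two classes.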

- Chris Burges' SVM tutorial. [ pdf ]
- Excerpt from Vapnik's The Nature of Statistical Learning Theory.

11. Nov 08
SVM: the kernel trick
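A tiny sketch of the kernel trick (a hypothetical toy example, not from the readings): the degree-2 polynomial kernel K(x, z) = (x·z + 1)² equals the ordinary inner product of an explicit quadratic feature map φ, so any algorithm written purely in terms of inner products works in the expanded space without ever constructing it.

```python
import math

def dot(u, v):
    """Ordinary inner product."""
    return sum(a * b for a, b in zip(u, v))

def poly_kernel(x, z):
    """Degree-2 polynomial kernel: evaluated in the original space."""
    return (dot(x, z) + 1.0) ** 2

def phi(x):
    """Explicit feature map for that kernel on R^2:
    (x1^2, x2^2, sqrt(2) x1 x2, sqrt(2) x1, sqrt(2) x2, 1)."""
    x1, x2 = x
    s = math.sqrt(2.0)
    return [x1 * x1, x2 * x2, s * x1 * x2, s * x1, s * x2, 1.0]
```

Expanding (x·z + 1)² term by term shows it equals φ(x)·φ(z); the kernel evaluation costs O(dim) while the explicit map already needs 6 coordinates in R².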

- O. Bousquet, S. Boucheron, and G. Lugosi, Introduction to Statistical Learning Theory. [ pdf ]
12. Nov 15
Online learning. The mistake-bound model. Learning from expert advice. WMA & RWMA.
[PC] Perceptron algorithm & its analysis
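The Perceptron itself fits in a few lines; this is a minimal sketch with hypothetical toy data. On each mistake the algorithm adds y·x to the weights, and the classical analysis bounds the total number of mistakes by (R/γ)², where R bounds the example norms and γ is the margin of a perfect separator.

```python
def perceptron(samples, epochs=10):
    """samples: list of (x, y) pairs, x a tuple of floats, y in {-1, +1}."""
    dim = len(samples[0][0])
    w = [0.0] * dim
    b = 0.0
    mistakes = 0
    for _ in range(epochs):
        clean_pass = True
        for x, y in samples:
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]   # mistake-driven update
                b += y
                mistakes += 1
                clean_pass = False
        if clean_pass:      # a full pass with no mistakes: converged
            break
    return w, b, mistakes
```

On linearly separable data the mistake bound guarantees termination regardless of the order in which examples arrive.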
- Avrim Blum. "On-line algorithms in machine learning." In Dagstuhl Workshop on On-line Algorithms, June 1996. [ ps ]
- Shai Shalev-Shwartz and Yoram Singer, Tutorial on Theory and Applications of Online Learning, ICML 2008.
- A Mind Reader Game (you should definitely try to play this game!)
- Cesa-Bianchi, N., Freund, Y., Haussler, D., Helmbold, D. P., Schapire, R. E., and Warmuth, M. K. 1997. How to use expert advice. J. ACM 44, 3 (May 1997), 427-485.
- Frans M. J. Willems, Yuri M. Shtarkov, Tjalling J. Tjalkens: The context-tree weighting method: basic properties. IEEE Transactions on Information Theory 41(3): 653-664 (1995). [1996 Paper Award of the IEEE Information Theory Society]
- Littlestone, N. 1988. Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm. Mach. Learn. 2, 4 (Apr. 1988), 285-318. [The Winnow paper, pdf ]

13. Nov 22
Winnow
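A minimal Winnow sketch for learning a monotone disjunction over {0,1}^n (the toy data below is hypothetical; Littlestone's paper above is the source of the algorithm). Unlike the Perceptron's additive updates, Winnow updates multiplicatively, which gives mistake bounds logarithmic in n when the target depends on few variables.

```python
def winnow(samples, n, passes=10):
    """samples: list of (x, y), x in {0,1}^n, y in {0,1}."""
    theta = n / 2.0                  # standard threshold choice
    w = [1.0] * n
    for _ in range(passes):
        for x, y in samples:
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0
            if pred == 1 and y == 0:     # false positive: halve the active weights
                w = [wi / 2.0 if xi else wi for wi, xi in zip(w, x)]
            elif pred == 0 and y == 1:   # false negative: double the active weights
                w = [wi * 2.0 if xi else wi for wi, xi in zip(w, x)]
    return w
```

Only the coordinates with x_i = 1 are promoted or demoted, so irrelevant attributes never gain weight from examples that don't touch them.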

Wed, Nov 24 - Fri, Nov 26: Fall Recess
- Blum, A. and Y. Mansour (2007). Learning, Regret Minimization, and Equilibria. In Algorithmic Game Theory (eds. N. Nisan, T. Roughgarden, E. Tardos, and V. Vazirani), Cambridge University Press. [ pdf ]
- Yoav Freund and Robert E. Schapire, Adaptive Game Playing Using Multiplicative Weights, Games and Economic Behavior, 29:79-103, 1999. [ ps ]

14. Nov 29
Linear regression
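A small sketch contrasting the additive gradient-descent (GD) update with the multiplicative exponentiated-gradient (EG) update for on-line linear regression with squared loss, in the spirit of the Kivinen-Warmuth paper below (the learning rate and data here are hypothetical toy choices).

```python
import math

def gd_update(w, x, y, eta=0.1):
    """GD: w_i <- w_i - eta * (w.x - y) * x_i  (an additive step)."""
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [wi - eta * err * xi for wi, xi in zip(w, x)]

def eg_update(w, x, y, eta=0.1):
    """EG: w_i <- w_i * exp(-eta * (w.x - y) * x_i), then renormalize
    so the weights remain a probability vector."""
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    w = [wi * math.exp(-eta * err * xi) for wi, xi in zip(w, x)]
    z = sum(w)
    return [wi / z for wi in w]
```

Both rules drive the prediction toward the target, but GD takes steps in weight space while EG takes steps in the log of the weights; the paper's regret bounds show EG wins when the target weight vector is sparse and the inputs are dense.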
- Jyrki Kivinen and Manfred K. Warmuth. Exponentiated Gradient versus Gradient Descent for Linear Predictors. Information and Computation, 132(1):1-63, January 1997. [ pdf ]

15. Dec 06
Maximum entropy, maximum likelihood

Fri, Dec 10 is the last day of classes.