CSE 705: Deep Learning (Spring 2015)

TTh, 9:30-10:50pm,
338A Davis (map)

Overview materials on deep learning

Basic Optimization, Variations of Gradient Descent:

Basic complexity results:

  • [Sijia Liu] Cybenko, G. (1989) "Approximation by superpositions of a sigmoidal function", Mathematics of Control, Signals, and Systems, 2 (4), 303-314. [ pdf ]
  • Kurt Hornik, Maxwell B. Stinchcombe, Halbert White: Multilayer feedforward networks are universal approximators. Neural Networks 2(5): 359-366 (1989) [ pdf ]
  • [Someone please present this] Andrew R. Barron, Universal approximation bounds for superpositions of a sigmoidal function. IEEE Transactions on Information Theory 39(3): 930-945 (1993) [ pdf ]
  • Kurt Hornik (1991) "Approximation Capabilities of Multilayer Feedforward Networks", Neural Networks, 4(2), 251–257
  • [Someone please present this] Hava T. Siegelmann, Eduardo D. Sontag: On the Computational Power of Neural Nets. J. Comput. Syst. Sci. 50(1): 132-150 (1995) [ pdf ]
  • [Qi] Olivier Delalleau and Yoshua Bengio, Shallow vs. Deep Sum-Product Networks, NIPS 2011. [ pdf ]

Basic shallow architectures:

  • [Xiaowei] D. H. Ackley, G. E. Hinton, T. J. Sejnowski, "A learning algorithm for Boltzmann machines", Cognitive Science, 9, 147-169, 1985. [ pdf ]
  • Tutorial on RBM.

Basic deep architectures:

  • [Xiaowei] Ruslan Salakhutdinov and Geoffrey E. Hinton. "Deep Boltzmann machines." Proceedings of the International Conference on Artificial Intelligence and Statistics. Vol. 5. No. 2. Cambridge, MA: MIT Press, 2009. [ pdf ]
  • [Zhen Xu] Geoffrey E. Hinton, Simon Osindero, and Yee-Whye Teh. 2006. "A fast learning algorithm for deep belief nets." Neural Comput. 18, 7 (July 2006), 1527-1554. [ pdf ]

Beyond basic architectures:

  • [Qi] Sanjeev Arora, Aditya Bhaskara, Rong Ge, and Tengyu Ma. Provable Bounds for Learning Some Deep Representations. ICML 2014. [ pdf ]
  • [Laknath] James Martens, Ilya Sutskever: Training Deep and Recurrent Networks with Hessian-Free Optimization. Neural Networks: Tricks of the Trade (2nd ed.) 2012: 479-535. Also ICML 2012. [ pdf ]
  • [Ying] Dumitru Erhan, Yoshua Bengio, Aaron Courville, Pierre-Antoine Manzagol, Pascal Vincent, and Samy Bengio. Why Does Unsupervised Pre-training Help Deep Learning? JMLR 2010. [ pdf ]
  • [Duc Luong] Ian J. Goodfellow, Quoc V. Le, Andrew M. Saxe, Honglak Lee, and Andrew Y. Ng. Measuring invariances in deep networks. NIPS 2009. [ pdf ]

Auto-encoders:

  • [Rohit] Guillaume Alain and Yoshua Bengio, "What Regularized Auto-Encoders Learn from the Data Generating Distribution". [ pdf ]

Other lists of papers: