CSE712 Seminar: Human Decision Making (At Chess) and Machine Learning

Spring 2018

Dr. Kenneth W. Regan

Topics

Artificial intelligence, machine learning, predictive-analytic modeling, chess, decision making, bounded rational choice, general issues in scientific modeling and statistical fitting, cheating detection.

Piazza Page

Offerings in the past two years emphasized issues with fitting highly nonlinear models, reducing noise, framing hypotheses for investigation, and using the Bootstrap technique for significance testing and for verifying theoretical error bars. During last year's offering, a further major issue with the intended extension of the chess model emerged, which led to the article https://rjlipton.wordpress.com/2017/05/23/stopped-watches-and-data-analytics/ in full technical g(l)ory. I quantified this issue further over the summer, but it is still not resolved. What emerges is the question of when maximum likelihood estimation (MLE) is appropriate versus other fitting methods. Hence this term's offering will lead off with MLE, drawing on how it is covered in courses such as CSE574, but in new contexts. As always, the seminar can branch from there according to the wishes and interests of students for presentations and more. This can extend to deep learning and the recent exploits of AlphaGo Zero.
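For concreteness, here is a minimal sketch of MLE combined with a bootstrap error bar. The exponential model, sample size, and parameter values are illustrative stand-ins chosen for brevity; this is not the chess model itself, only the fitting-and-verification pattern the seminar will examine.

    # Minimal sketch: MLE by minimizing a negative log-likelihood,
    # then a bootstrap to estimate the standard error of the fit.
    # The exponential toy model here is illustrative, not the chess model.
    import numpy as np
    from scipy.optimize import minimize_scalar

    rng = np.random.default_rng(0)
    data = rng.exponential(scale=2.0, size=500)  # toy sample; true scale is 2.0

    def neg_log_likelihood(scale, x):
        # Exponential model: -log f(x) = log(scale) + x/scale, summed over x
        return len(x) * np.log(scale) + x.sum() / scale

    def mle(x):
        # One-parameter fit: minimize the negative log-likelihood over scale
        res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 100.0),
                              args=(x,), method="bounded")
        return res.x

    theta_hat = mle(data)

    # Bootstrap: refit on resampled data to estimate an error bar for theta_hat
    boot = [mle(rng.choice(data, size=len(data), replace=True))
            for _ in range(200)]
    print(f"MLE scale = {theta_hat:.3f} +/- {np.std(boot):.3f}")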

This year I am emphasizing the correspondence to the theory of personnel evaluation exams, which has various strands called Item Response Theory (IRT), Rasch Modeling, Psychometrics, and Classical Test Theory. The simplest models are called dichotomous because they model exams whose questions are scored all-or-nothing: only one answer earns non-zero credit. That holds for the vast majority of exams employed, but I believe that substantial information and depth are lost. So-called polytomous models allow for partial credit. Here is a nice and brief page explaining the difference, including a diagram of the expectation curves for various qualities of response. I am especially interested in the latter because:
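To make the dichotomous/polytomous distinction concrete, here is a small sketch using the standard Rasch formula for an all-or-nothing item and the standard Partial Credit Model for a partial-credit item. The ability and threshold values are invented for illustration and do not come from any particular exam or dataset.

    # Dichotomous Rasch item vs. polytomous partial-credit item.
    # Formulas are the standard IRT ones; parameter values are invented.
    import numpy as np

    def rasch_prob(theta, b):
        # Dichotomous Rasch: P(correct) = 1 / (1 + exp(-(theta - b)))
        return 1.0 / (1.0 + np.exp(-(theta - b)))

    def partial_credit_expected(theta, thresholds):
        # Partial Credit Model: P(score k) is proportional to
        # exp(sum_{j<=k} (theta - b_j)), with the empty sum for k = 0.
        cum = np.concatenate(([0.0], np.cumsum(theta - np.asarray(thresholds))))
        probs = np.exp(cum - cum.max())       # stabilize before normalizing
        probs /= probs.sum()
        return probs @ np.arange(len(probs))  # expected partial-credit score

    theta = 1.0                           # examinee ability
    print(rasch_prob(theta, b=0.5))       # all-or-nothing chance of credit
    print(partial_credit_expected(theta, thresholds=[-0.5, 0.5, 1.5]))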

The biggest selling point is that issues of scoring standards (which are often "re-based"), question design, possible biases, and what kind of aptitude is being tested for are completely resolved in the chess world. The chess Elo rating is a robust and universally accepted measure of chess aptitude. (Systems that try to improve on Elo are possible project topics, but their technical differences do not detract from this basic point.) The "questions" arise naturally and are 50% in the player's control---only the opponent has the other 50%. There is copious data from real competitions by players of all aptitudes, and all of it is completely public---no IRB waivers or privacy protocols needed. (Except, that is, for my cheating tests...) The question is how this robustness might be translated back into the world of testing.
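As a worked example of the robustness being traded on here, this is the standard Elo expectation formula, which maps a rating gap to a predicted score (win = 1, draw = 0.5); the ratings plugged in below are arbitrary.

    # Standard Elo expectation formula; the sample ratings are arbitrary.
    def elo_expected_score(r_player: float, r_opponent: float) -> float:
        """Expected score of the player against the opponent under Elo."""
        return 1.0 / (1.0 + 10.0 ** ((r_opponent - r_player) / 400.0))

    print(elo_expected_score(2000, 1800))  # ~0.76 for a 200-point edge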

I have an ulterior motive right now. My chess model was designed from "first principles" in a way contrary to the new "Data First" ethos. Those principles worked well for my original two aptitude parameters of "Sensitivity" (s) and "Consistency" (c), but attempts to add new parameters reflecting depth of thinking (d) or headstrong play (h) have run into problems of nonlinear dynamics. An even worse issue emerged in this seminar last spring and was expounded at length in my article "Stopped Watches and Data Analytics" on the Gödel's Lost Letter weblog, linked above. So I am currently trying to "loosen up" the model---and one way to do this is to make it more like IRT. I will describe all of this during the first month in an initial lecture-style format.

As with last year, the requirements will be (1) participation in discussions and small experiments, in class and/or on Piazza, (2) learning and applying computational statistical and charting tools, and (3) presenting a "mini-project" or paper (likely teamed). For (2) we have in the past drawn from some "Warmup Ideas", to which more may be added. No background in chess is assumed (enough will be covered early on), and there are no other prerequisites.

General Description

I have designed a predictive-analytic model that projects the probabilities that humans will choose various decision options, given hindsight values of their worth. Plugging in the values given to chess moves by strong computer programs makes the model work for chess, but that is the only chess-specific content. What else besides chess can be done with a model whose general task is "Converting Utilities Into Probabilities"? Why has it been effective enough to be deemed useful in court testimony in chess-cheating cases and to be covered by the New York Times, NPR, and the Wall Street Journal? Chess may be complex, but on a mass scale human players still follow simple mathematical laws---perhaps we will discover some more.
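To fix ideas about the general task, here is a deliberately generic "utilities into probabilities" sketch: a softmax (Gibbs) rule with a sensitivity parameter s. This is a stand-in for the task, not my actual model, whose functional form and parameters differ; the move values below are made up.

    # Generic softmax mapping from move values to choice probabilities.
    # A stand-in for "Converting Utilities Into Probabilities", NOT the
    # seminar's actual chess model; values and s are illustrative.
    import numpy as np

    def move_probabilities(values_in_pawns, s=0.1):
        # values_in_pawns: engine evaluations of each move (higher = better)
        z = np.asarray(values_in_pawns, dtype=float) / s
        z -= z.max()              # subtract the max for numerical stability
        p = np.exp(z)
        return p / p.sum()

    # Three candidate moves valued at +0.3, +0.1, and -0.4 pawns:
    print(move_probabilities([0.3, 0.1, -0.4], s=0.15))

Note how shrinking s sharpens the distribution toward the best move, loosely playing the role a skill parameter does: stronger players concentrate more probability on better moves.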

The seminar aims to relate this research to other machine-learning applications that have been researched in the Department, and to explore issues in their common methodology. This includes comparing the many different statistical fitting methods (Bayesian, maximum-likelihood, simple frequentist, and more) that can be used and judged within this same model. I will cover the basics of chess and chess programs in the initial series of lectures.

Here are a two-page description and a longer overview of the research, the latter with some mathematical details. My homepage links to my public anti-cheating site, papers, talks, the New York Times article, and other pages; students in the seminar will be given access to my private sites where testing is done. The last section of the overview includes some possible seminar topics and projects within this research, but students will be equally welcome to give presentations relating it to machine-learning topics they have had in other courses.

Students are expected to participate in discussions and give at least two hours of presentations. Grading is S/U, 1--3 credits.