These are not in any particular order. Most can be done entirely without
resort to my own Perl scripts; the few that would need or conveniently use
them are marked by a mention of AIFdata.pm or Position.pm, the
latter being more involved.

Graph the frequency of playing capture moves against the rating of the players,
from 1500 to 2750.

Combine that with the frequency with which the engine recommends a
capture move.

Compare the frequency with which players play moves with K, Q, R, B, N, or a pawn
(pawn moves do not use a letter P), versus how often moves with each piece are
recommended by the computer engine.

Do the last item for various rating levels and see if there is any
consistent change.
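As a minimal sketch of the piece-letter tally, here is a classifier for moves in standard algebraic notation; the move list is a made-up example, and real moves would be read from the game records:

```python
from collections import Counter

def piece_letter(san):
    """Classify a SAN move by the piece moved; pawn moves start with a file letter."""
    if san.startswith("O"):              # castling ("O-O", "O-O-O") is a king move
        return "K"
    return san[0] if san[0] in "KQRBN" else "P"

# Hypothetical move list for illustration only.
moves = ["e4", "Nf6", "Bc4", "O-O", "Qxd5", "Rad1", "exd5"]
freq = Counter(piece_letter(m) for m in moves)
```

The same counter applied to the engine's recommended moves gives the comparison series.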

Compare the frequency of playing moves that are forward, sideways (for
Rook or Queen), or retreating, for each rating level and versus
how often the engine recommends such moves. Hypothesis: good retreating moves
are "harder to see", especially for lower-rated players. (Determining
whether a move is forward or backward may require using
the Position.pm module to get "Long Algebraic Notation" for the moves,
but this can easily grow into a main project, and even a paper when combined with
the methods in the Biswas-Regan 2013 paper I gave out.)
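Once the moves are in long algebraic form, the direction test is a one-rank comparison. A sketch, assuming simple LAN moves like "e2e4" or "Ng8f6" (no promotion suffixes); the function name is my own:

```python
def move_direction(lan, white_to_move):
    """Classify a long-algebraic move by rank travel: forward, retreat, or sideways."""
    core = lan.replace("x", "").rstrip("+#")   # drop capture and check marks
    frm, to = core[-4:-2], core[-2:]           # origin and destination squares
    delta = int(to[1]) - int(frm[1])
    if not white_to_move:
        delta = -delta                         # Black advances toward rank 1
    if delta > 0:
        return "forward"
    if delta < 0:
        return "retreat"
    return "sideways"
```

Tallying these per rating level, for played moves and for engine-recommended moves, gives the comparison in the item above.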

Gather a histogram of how many positions in the files have X number of legal moves.

Graph the frequency of having fewer than 10 legal moves against rating; this is
the one that was illustrated at the beginning of the seminar.
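Given per-position legal-move counts (however obtained, e.g. from the analysis files or computed with a chess library), both the histogram and the fewer-than-10 frequency are a few lines. The counts below are invented for illustration:

```python
from collections import Counter

# Hypothetical per-position legal-move counts.
legal_move_counts = [20, 31, 4, 38, 9, 27, 7, 31]

histogram = Counter(legal_move_counts)                      # count -> #positions
frac_under_10 = (sum(1 for n in legal_move_counts if n < 10)
                 / len(legal_move_counts))
```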

Graph the frequency of playing moves that give check versus rating; this is maybe
subsumed by the previous one.
Hypothesis: lower-rated players give check more often. (Determining a check
could use the Position.pm module, but the game notation in the [GameID] block
has + signs on checking moves that can be counted instead, and the "fewer than 10
legal moves" count may have much the same effect.)
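Counting the + signs textually is a one-liner; the sample movetext below is invented (note that "#" marks checkmate, which is also a check):

```python
import re

def count_checks(movetext):
    """Count checking moves in SAN movetext: '+' marks check, '#' marks mate."""
    return len(re.findall(r"[+#]", movetext))

# Hypothetical game score for illustration.
game = "1. e4 e5 2. Qh5 Nc6 3. Bc4 g6 4. Qf3 Nd4 5. Qxf7#"
```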

Gather a histogram of how often the second-listed move is X worse than the
first-listed one, where X falls into intervals 0-9, 10-19, 20-29, and so on
(or in pawn units, 0.00-0.09 pawns, then 0.10-0.19, 0.20-0.29, and so on) (uses
AIFdata.pm).
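The interval bucketing reduces to integer division by 10 on the centipawn gap. A sketch with made-up (first, second) evaluation pairs standing in for the AIFdata.pm output:

```python
from collections import Counter

def gap_bucket(delta_cp):
    """Bucket a centipawn gap into intervals 0-9, 10-19, 20-29, ..."""
    lo = (delta_cp // 10) * 10
    return (lo, lo + 9)

# Hypothetical (first-listed, second-listed) evaluations in centipawns.
pairs = [(35, 30), (120, 95), (-10, -60), (0, 0)]
gaps = Counter(gap_bucket(first - second) for first, second in pairs)
```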

Gather a histogram of MM% at each numbered turn going 9, 10, 11, ... up to 60, say. Or maybe better, group the tallies into blocks of 4, that is turns 9-12, 13-16, 17-20, ..., 57-60.
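The blocks-of-4 grouping can be sketched like this; the (turn, matched-the-engine?) records are invented placeholders for what the AIF data would supply:

```python
from collections import defaultdict

def turn_block(turn):
    """Map a turn number to its block of four: 9-12, 13-16, ..., 57-60."""
    lo = 9 + 4 * ((turn - 9) // 4)
    return (lo, lo + 3)

# Hypothetical (turn, matched) records for illustration.
records = [(9, True), (10, False), (13, True), (14, True), (57, False)]

hits = defaultdict(lambda: [0, 0])          # block -> [matches, total]
for turn, matched in records:
    blk = turn_block(turn)
    hits[blk][0] += matched
    hits[blk][1] += 1
mm_pct = {blk: 100.0 * m / t for blk, (m, t) in hits.items()}
```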

Gather a histogram of how often there is a change in best move at depth d,
for d = 1 to 20 (uses AIFdata.pm). (Can also be done without
AIFdata.pm by looking at the "ChangePVs" section of each
move record in the AIF file.)

Check whether the second digits of the 3-digit evaluation numbers obey Benford's Law (for which see my blog article
https://rjlipton.wordpress.com/2012/07/29/benfords-law-and-baseball/).
Since Benford's Law is a major statistical fraud-catching device, this could really grow into a presentation.
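For reference, Benford's second-digit probabilities sum the first-digit law over all possible leading digits. A sketch, with an invented list of 3-digit evaluation magnitudes standing in for the real engine output:

```python
import math
from collections import Counter

def benford_second_digit(d):
    """Benford probability that the second significant digit is d (0-9):
    sum over first digits d1 = 1..9 of log10(1 + 1/(10*d1 + d))."""
    return sum(math.log10(1 + 1 / (10 * d1 + d)) for d1 in range(1, 10))

# Hypothetical 3-digit evaluation magnitudes (each has at least two digits,
# so taking the character at index 1 gives the second digit).
evals = [152, 238, 147, 305, 199, 114]
observed = Counter(str(abs(e))[1] for e in evals)
```

The observed counts can then be compared with the Benford expectations by a chi-squared test.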

Does "Zipf's Law" hold in any form? It would say that the Nth-best move is played about 1/(2N) of the time.
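Comparing observed rank frequencies against the 1/(2N) prediction is straightforward; the ranks below are invented placeholders for the engine-rank of each played move:

```python
from collections import Counter

# Hypothetical engine-ranks of the played moves (1 = engine's top choice).
played_ranks = [1, 1, 2, 1, 3, 1, 2, 4, 1, 2]

n = len(played_ranks)
observed = {r: c / n for r, c in Counter(played_ranks).items()}
predicted = {r: 1 / (2 * r) for r in observed}      # the 1/(2N) form of Zipf's law
```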

Graph the frequency of moving the same piece twice, versus the
number of times the engine recommends doing so, and versus rating.

Graph the number of times a player matches the engine 10 times in a row,
versus rating.
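Counting 10-in-a-row streaks is a run-length scan over the per-move match flags; a sketch with an invented match sequence:

```python
def runs_at_least(matches, k):
    """Count maximal runs of consecutive engine matches (True) of length >= k."""
    count = run = 0
    for m in matches + [False]:        # False sentinel flushes a trailing run
        if m:
            run += 1
        else:
            if run >= k:
                count += 1
            run = 0
    return count

# Hypothetical match flags: 9 matches, a miss, then 10 matches in a row.
flags = [True] * 9 + [False] + [True] * 10
streaks = runs_at_least(flags, 10)
```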

Same with moves that do not drop in value after the move is played. Note that this can
be done with the Eval and NextEval lines,
not needing AIFdata.pm to read entries from the matrix.
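A sketch of the no-drop test from the Eval and NextEval lines, assuming both values are centipawns from the mover's point of view (the pairs are invented):

```python
def holds_value(eval_cp, next_eval_cp, tolerance=0):
    """True if the played move does not drop value: NextEval is no worse
    than Eval (both in centipawns, from the mover's point of view)."""
    return next_eval_cp >= eval_cp - tolerance

# Hypothetical (Eval, NextEval) pairs read from the move records.
pairs = [(35, 35), (120, 40), (-20, -15)]
frac_no_drop = sum(holds_value(a, b) for a, b in pairs) / len(pairs)
```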

Do lower-rated players play with more disconnected pawns? This needs some way
to convert a FEN code into an 8x8 grid; the Position.pm module has such a
method, where you could just grab the code textually and maybe convert it to Python.
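A Python conversion of the FEN-to-grid step is short, since the board field just needs its digits expanded into empty squares. Below it is a pawn counter that reads "disconnected" as "no friendly pawn on an adjacent file" (the classical isolated pawn; other readings are possible):

```python
def fen_to_grid(fen):
    """Expand the board field of a FEN string into an 8x8 grid of characters,
    with '.' for empty squares (rank 8 first, as in FEN)."""
    grid = []
    for row in fen.split()[0].split("/"):
        cells = []
        for ch in row:
            cells.extend("." * int(ch) if ch.isdigit() else ch)
        grid.append(cells)
    return grid

def disconnected_pawns(grid, pawn):
    """Count pawns ('P' or 'p') with no same-color pawn on an adjacent file."""
    files = {c for rank in grid for c, ch in enumerate(rank) if ch == pawn}
    return sum(1 for rank in grid for c, ch in enumerate(rank)
               if ch == pawn and not ({c - 1, c + 1} & files))

start = fen_to_grid("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1")
```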

Do lower-rated players have games with higher or lower
NodeCount-for-depth? As I derived on the board at the start (on Fri. 3/4),
to get a common measure based on depth 20,
take the NodeCount N at the final depth D and compute
C = N^(20/D).
This becomes a measure of the complexity of the position, at least
for the computer to analyze.
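The normalization C = N^(20/D) as a function (the example numbers are arbitrary):

```python
def depth20_complexity(node_count, final_depth):
    """Normalize a node count N reached at final depth D to a common
    depth-20 scale: C = N**(20/D)."""
    return node_count ** (20 / final_depth)

# E.g. 1000 nodes at depth 10 scale to 1000**2 = 1,000,000 at depth 20.
c = depth20_complexity(1000, 10)
```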

Graph the frequency of consecutive blunders against rating, say using
a drop in value of played move of 150 (that is, 1.50 pawns) or more as the
threshold of "blunder". Per Mike Wehar's query,
cases of 3 or more consecutive blunders may be
hints of a mistake in the recorded gamescore.
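Detecting consecutive blunders is another run-length scan, this time over the per-move value drops; the drops below are invented:

```python
def blunder_runs(drops_cp, threshold=150, min_run=3):
    """Return the lengths of maximal runs of >= min_run consecutive moves
    whose value drop (centipawns) meets the blunder threshold."""
    runs, run = [], 0
    for d in drops_cp + [0]:           # sentinel flushes a trailing run
        if d >= threshold:
            run += 1
        else:
            if run >= min_run:
                runs.append(run)
            run = 0
    return runs

# Hypothetical per-move drops: three straight blunders, then normal play.
suspicious = blunder_runs([200, 180, 160, 10, 150])
```

Any game where this list is nonempty is a candidate for a gamescore error.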

For something completely different, hunt for and find games in which the average
drop in value of a played move is high, say over 30 centipawns. Well, the
MRAIF.pl file already tabulates this for each game in the resulting
.sc3 or .r3 (or etc.) report files, but you can write a much
shorter script to do this just via the Eval and NextEval entries.

Others...suggestions welcome.
Addendum: I was just sent this link by someone in our department:
http://blog.ebemunk.com/a-visual-look-at-2-million-chess-games/
It might be interesting to see if any of these stats vary with rating. The material
here depends only on the moves of the game, not on the engine's analysis of how
good and bad certain moves are. And ah! it links to
http://chess-db.com/public/research/game_statistics.html
which overlaps some of the ideas I thought up above, and hints that we may get some
interesting positive results from them after all.