Succeeding in CSE 486/586

Distributed Systems is a course that brings together several disciplines, and as such it presents a steep learning curve for many students. The prerequisite chain for undergraduate students is short, requiring only CSE 220 (Systems Programming) and CSE 250 (Data Structures), and may be somewhat misleading. There are no prerequisites for graduate students, which is even more misleading! The most important unstated and un-codifiable prerequisite for this course is a certain maturity as a programmer or Computer Scientist/Engineer. I highly recommend that students taking Distributed Systems have already completed one or two other 3xx/4xx/5xx level courses that involve at least one significant programming project.

Background Material

Students will require both conceptual knowledge and programming experience to succeed in Distributed Systems. This brief summary of that background is not intended to be either prescriptive or complete, but rather to give students who wish to prepare ahead of time for the course some digestible material with which to begin.

Conceptual Material

The two areas of knowledge of most immediate value to students of distributed systems are concurrent programming and network communication. I therefore highly recommend that students have some experience with either operating systems or networking before taking this course. It is not necessary to complete a course in OS or networking before taking Distributed Systems, but it is valuable to build some experience in the area.

Students who have not had a formal operating systems course may wish to review Operating Systems: Three Easy Pieces (OSTEP) by Remzi and Andrea Arpaci-Dusseau at the University of Wisconsin-Madison. It is a freely available, routinely updated, high-quality textbook on operating systems principles and design. Students of distributed systems are likely to find Part I Chapter 7 (CPU Scheduling) and Part II (Concurrency) very valuable, although note that CSE 486/586 uses neither the C programming language nor the POSIX APIs.

Students who have not had a formal networking course will find Internetworking with TCP/IP Volume 1: Principles, Protocols, and Architecture very valuable. It is not freely available, but any recent edition will have the important information that you need, and it is an excellent text that is accessible to a broad audience. I believe that Chapters 1–4 and 11, covering the overall architecture of the Internet and the workings of TCP, are most valuable to a student of distributed systems. Those chapter numbers are for the Sixth Edition; check the corresponding material in your edition (in the Fifth Edition, for example, it would be Chapters 1–3, 10, and 12).

The Computer Science Book by Tom Johnson has chapters on networking, operating systems, concurrent programming, and distributed systems, and is freely available online (or can be purchased in ePub, PDF, or Kindle formats). It has apropos and understandable explanations of many of the concepts that would be helpful for a student of distributed systems. I specifically recommend the above-named chapters.

In addition to the specific topics just discussed, students will need a thorough understanding of discrete math at an undergraduate level, a grounding in analysis of algorithms (such as big-O notation), and familiarity with basic data structures (e.g., trees, lists, and hash tables) and their performance characteristics. Algorithm analysis and design are not primary learning objectives in this course, but some analysis and understanding of computational complexity will be necessary to understand the correctness and performance of various protocols and algorithms in the course material.

Programming Experience

Many students, and in particular many transfer and graduate students, taking CSE 486/586 lack the necessary programming background to succeed in the course. This is a rectifiable lack, but it cannot be rectified on the fly during the course without significant difficulty! I recommend that students who are concerned about their programming readiness for this course spend some time learning the Go programming language beforehand, and write a network application in Go to become familiar with the underlying programming concerns. They can then concentrate on the distributed systems aspects of the projects rather than on learning to program.

The Go language has an excellent, well-maintained tutorial in A Tour of Go. All students coming into CSE 486/586 should complete A Tour of Go. It is not difficult and does not take long, and will bring most students with a background in some programming language up to speed with how Go does things. Pay special attention to the Concurrency module, particularly channels and the select statement!
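As a taste of what the concurrency material in A Tour of Go covers, here is a minimal sketch that combines goroutines, channels, and select. The function names (worker, sumSquares), the worker count, and the timeout are hypothetical choices made for illustration; they do not come from the Tour or from any course project.

```go
package main

import (
	"fmt"
	"time"
)

// worker squares each job it receives and sends the result on results.
// It exits when the jobs channel is closed and drained.
func worker(jobs <-chan int, results chan<- int) {
	for n := range jobs {
		results <- n * n
	}
}

// sumSquares fans the inputs out to a number of worker goroutines and sums
// the squares they send back. It gives up if the whole computation takes
// longer than timeout.
func sumSquares(inputs []int, workers int, timeout time.Duration) (int, error) {
	// Both channels are buffered to len(inputs), so no goroutine blocks
	// forever even if we return early on a timeout.
	jobs := make(chan int, len(inputs))
	results := make(chan int, len(inputs))

	for i := 0; i < workers; i++ {
		go worker(jobs, results)
	}
	for _, n := range inputs {
		jobs <- n
	}
	close(jobs) // tells the workers that no more jobs are coming

	deadline := time.After(timeout)
	sum := 0
	for range inputs {
		// select waits on whichever channel is ready first.
		select {
		case r := <-results:
			sum += r
		case <-deadline:
			return 0, fmt.Errorf("timed out after %v", timeout)
		}
	}
	return sum, nil
}

func main() {
	sum, err := sumSquares([]int{1, 2, 3, 4}, 2, time.Second)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("sum of squares:", sum) // prints "sum of squares: 30"
}
```

The results may arrive in any order, which is harmless here because addition is commutative. Noticing when ordering does matter is exactly the kind of reasoning the Concurrency module is meant to build.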

After completing A Tour of Go, I recommend that students write a Go network service of some kind (from scratch, not by importing an existing implementation!) to learn how Go handles sockets and network connections. Students who have taken CSE 312 or CSE 489/589 at UB may find it valuable to reimplement one of the projects from one of those courses for practice. For students with no such experience, I recommend implementing a simple HTTP server (you need handle only GET for static resources; whether you hard-code those resources or serve files from the disk is up to you) capable of serving multiple simultaneous clients. Testing that you can effectively serve simultaneous clients probably entails serving very large resources so that transfers may observably occur at the same time!
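For a sense of scale, the following is a rough sketch of one possible starting shape for such a server, built directly on the net package rather than net/http. It is an illustration under stated assumptions, not a reference solution: the resource table, the function names (respond, build, handle, serve, fetch), and the choice to have main act as its own client on an OS-assigned port are all hypothetical. It handles only GET, serves hard-coded resources, reads a single request per connection, and omits most error handling that a real server would need.

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"net"
	"strings"
)

// resources is a hard-coded table of static resources (hypothetical content).
var resources = map[string]string{
	"/":      "hello from a hand-rolled server\n",
	"/about": "CSE 486/586 practice server\n",
}

// build assembles a complete HTTP/1.1 response with a plain-text body.
func build(status, body string) string {
	return fmt.Sprintf("HTTP/1.1 %s\r\nContent-Type: text/plain\r\n"+
		"Content-Length: %d\r\nConnection: close\r\n\r\n%s",
		status, len(body), body)
}

// respond maps one request line, such as "GET / HTTP/1.1", to a response.
func respond(requestLine string) string {
	fields := strings.Fields(requestLine)
	if len(fields) != 3 {
		return build("400 Bad Request", "bad request\n")
	}
	if fields[0] != "GET" {
		return build("405 Method Not Allowed", "only GET is supported\n")
	}
	body, ok := resources[fields[1]]
	if !ok {
		return build("404 Not Found", "not found\n")
	}
	return build("200 OK", body)
}

// handle serves exactly one request on conn and then closes it.
func handle(conn net.Conn) {
	defer conn.Close()
	reader := bufio.NewReader(conn)
	requestLine, err := reader.ReadString('\n')
	if err != nil {
		return
	}
	// Read and discard the headers, which end at the first blank line.
	for {
		line, err := reader.ReadString('\n')
		if err != nil || strings.TrimSpace(line) == "" {
			break
		}
	}
	fmt.Fprint(conn, respond(requestLine))
}

// serve accepts connections until the listener is closed, giving each
// client its own goroutine so that slow clients do not block the others.
func serve(ln net.Listener) {
	for {
		conn, err := ln.Accept()
		if err != nil {
			return // the listener was closed
		}
		go handle(conn)
	}
}

// fetch is a tiny client used only to exercise the server: it sends one
// GET and returns the status line of the response.
func fetch(addr, path string) (string, error) {
	conn, err := net.Dial("tcp", addr)
	if err != nil {
		return "", err
	}
	defer conn.Close()
	fmt.Fprintf(conn, "GET %s HTTP/1.1\r\nHost: %s\r\n\r\n", path, addr)
	status, err := bufio.NewReader(conn).ReadString('\n')
	return strings.TrimSpace(status), err
}

func main() {
	ln, err := net.Listen("tcp", "127.0.0.1:0") // port 0: the OS picks a free port
	if err != nil {
		log.Fatal(err)
	}
	defer ln.Close()
	go serve(ln)

	for _, path := range []string{"/", "/missing"} {
		status, err := fetch(ln.Addr().String(), path)
		if err != nil {
			log.Fatal(err)
		}
		fmt.Println(path, "->", status)
	}
}
```

To turn the sketch into a long-running server that a browser or curl can reach, listen on a fixed port and call serve(ln) directly from main instead of starting it in a goroutine. The interesting design decision is the go handle(conn) line: one goroutine per connection is what lets multiple clients be served at the same time.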

Projects

The projects in this course are large and may be of a nature unfamiliar to many students, in that they allow for a large degree of implementation freedom. I strongly recommend that students start their projects early and allow for plenty of time for the project requirements and implementation to percolate in their brains. A two-week project should require about twenty hours of work, but doing those same twenty hours over two weeks or over three days will result in very different outcomes. In the former case, new insights and strategies will occur to you over time, while in the latter you won’t have ten minutes to step back and reconsider the situation.

Read the handouts very carefully and ensure that you understand both the actual requirements for the project and what is nominally being graded. Lots of implementation effort on something that is not graded, or is worth very few points, is not a great use of time! Likewise, it would be regrettable to lose points for work that you actually accomplished because you did not spend a few minutes pinning down a very specific required format.