Reinforcement learning lab

Use the /mwc-module skill.

Create a lab at /Users/chrisp/Repos/MWC/making-with-code/site/content/courses/dp/labs/reinforcement_learning, with starter code in a repo at ~/Repos/MWC/modules/reinforcement_learning.

Learning objectives

The learning objectives for this lab are:

Ensure that the learning objectives are included in the lab's metadata using the provided mechanism. Also add a teaching note at the beginning of the lab explaining how the lab meets the learning objectives. This lab is designed to complement direct instruction, not substitute for it, so while definitions are provided for all key terms and concepts, we expect that students will also receive reinforcement of these concepts before, during, and/or after the lab. Integration of the lab into the course is up to the teacher.

Some of the objectives are core to the lab experience; for others, the lab provides a rich authentic opportunity to explore the concepts. In the latter case, add teaching notes throughout the lab explaining how and where they come up. Remember, our goal is to keep the text of the lab (aside from teaching notes) streamlined, focused on what students will do--students will not read huge amounts of text.

Lab structure

Training forager

In the repo, use retro-games to implement a game called "babysnake": a small grid in which an agent collects a food item that respawns after collection.

First, have students play the game and write down the reasoning they are using.

Introduce Q-learning with the babysnake game; students will manually calculate some Q updates, and be guided to implement Q-learning in Python, with some starter code.
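
As a reference point for the manual calculations and the starter code, one tabular Q-learning update could be sketched as follows. This is a minimal sketch, not the lab's actual starter code: the function name q_update, the dictionary-based Q-table keyed by (state, action), the action names, and the alpha and gamma defaults are all assumptions made for illustration.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value.

    q is a mapping from (state, action) pairs to estimated values.
    alpha is the learning rate and gamma is the discount factor
    (both defaults are illustrative, not tuned for babysnake).
    """
    # Value of the best action available from the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Target: immediate reward plus discounted future value.
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha of the way toward the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical example: the agent at (0, 0) moves right and collects food.
actions = ["up", "down", "left", "right"]
q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
first = q_update(q, (0, 0), "right", 1.0, (0, 1), actions)
second = q_update(q, (0, 0), "right", 1.0, (0, 1), actions)
```

With every value starting at 0, the first update yields 0.1 and repeating the same transition yields 0.19, which is the kind of arithmetic students could be asked to work by hand before implementing it.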

This section ends with a checkpoint in which students have to train babysnake to perform well.

Also write solutions for the student exercises. I will run the solutions to ensure that they produce a well-behaved babysnake, and I'll remove them from the repo before publication.

Training snake

This section introduces training on a more complex game, snake. Students will not train snake on their own. Instead, we will provide artifacts from our attempts to train snake, drawn from this Claude Code session's conversation history, and students will then answer conceptual questions: interpreting the evidence we encountered and reasoning about how epsilon, learning rate, loss, etc. would behave in other situations. Organize this section into a list of subsections; start each subsection with our hypothesis for what might work, explain what we did, and then show evidence of how it went.
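
To anchor the conceptual questions about epsilon, it may help to show students a small example of epsilon-greedy action selection with a decaying schedule. The following is a minimal sketch under stated assumptions: the function names, the linear decay, and the start, end, and decay_episodes values are hypothetical and do not describe the schedule actually used in the snake runs.

```python
import random


def epsilon_at(episode, start=1.0, end=0.05, decay_episodes=1000):
    """Linearly decay epsilon from start to end over decay_episodes, then hold.

    All three parameters are illustrative assumptions.
    """
    frac = min(episode / decay_episodes, 1.0)
    return start + frac * (end - start)


def choose_action(q_values, epsilon, rng=random):
    """Pick an action index: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        # Explore: choose uniformly at random among all actions.
        return rng.randrange(len(q_values))
    # Exploit: choose the action with the highest estimated value.
    return max(range(len(q_values)), key=lambda i: q_values[i])


# Early in training the agent mostly explores; later it mostly exploits.
early = epsilon_at(0)
late = epsilon_at(5000)
greedy_choice = choose_action([0.1, 0.9, 0.3], epsilon=0.0)
```

A high epsilon early on lets the agent discover rewards it would otherwise never see, while a low epsilon later lets it act on what it has learned; students can reason about what happens if the decay is too fast or too slow.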

First, copy runs/snake-ego into the student repo, saving just a few interesting checkpoints (e.g. around episode 1800 when the initial reward spiked, midway through increasing performance, and the final policy). Add a checkpoint asking students to describe the policy's behavior in each.

I saved full data from one previous run (in ~/Desktop/snaketrainer); use this as a mid-point case study. For others, draw evidence from earlier in this conversation and summarize. Walk students through interpreting the evidence, introducing concepts and terms as needed. Present this without referring to changes/refactors we made to the framework--present the progress as if the retro-gamer framework and its contract with retro-games had been stable in their final form the whole time. The point is not system design, but RL concepts. It's fine to compress multiple iterations into a single synthesized iteration to make the story cleaner.

This list of subsections should definitely include:

Note: For this section (Training snake), use runs/snake-ego in its current state--it is still training. We will return to the lesson and update both the lesson and the repo once training has completed.

End this section with a checkpoint asking students to complete a list of conceptual questions, written out in the checkpoint, and in snake_training.md in the repo. Ensure that the conceptual questions here and in the next section are aligned with the lab's learning objectives.

Training x

In this section, students will train their own game. Create a small, easily trainable game in the repo, as well as training_log.md, where students should document their efforts in the same manner as with the previous section on training snake. The game to be trained should be simple enough that students will have success training an intelligent agent, but sufficiently complex that different training regimes will produce agents with different levels of success. A class might want to have a competition to train the most successful agent.

End this section with a checkpoint asking students to complete the training log, analyze their own success with training, analyze the behavior of their final policy, and answer a few conceptual questions.

Extension: Train an agent for your own game

Invite students to train an agent for their own games.

Process notes

End the lab with a teaching note suggesting discussion prompts for connecting RL in this lab to real-life situations, both in CS and more broadly (e.g. how does human behavior reflect RL?), and reflecting on the lab in its present form: what is currently strong, and suggestions for improvement.