Shuyuan
This is a curriculum unit designed to be integrated into the Shuyuan MWC curriculum. The unit's main objective is to give students a conceptual understanding of how modern LLMs work, grounded in familiarity with their architecture and core algorithms. This is broadly the approach taken by Stanford's CS336: Language Modeling from Scratch and Andrej Karpathy's online course Neural Networks: Zero to Hero, although these labs are designed for introductory high school CS students.
The unit is organized around building a series of models, each aiming to reproduce more of the generative AI that students already use. Each model extends the conceptual framework needed to understand Transformer-based LLMs, and students work hands-on with its components wherever possible. Throughout, the goal is conceptual understanding with minimal mathematical prerequisites.
The sequence of labs is:
- TinyLM, which builds a simple generative model based on word (or token) sequences.
- Matrices, which introduces matrices and translates TinyLM into a matrix structure shared with more complex machine learning models.
- Embeddings, which introduces learned representations of words, allowing the model to generalize beyond contexts it has seen before and to support larger context windows without the model size exploding.
- Local models, which introduces the hardware and software requirements for hosting local models.
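As a rough illustration of the first two labs, here is a minimal Python sketch. It assumes TinyLM is a word-level bigram model, where each word predicts only the next one. The corpus, the function names (`next_word_distribution`, `generate`, `table_entries`) and the 50,000-word vocabulary size are hypothetical and are not taken from the labs. The sketch counts word pairs and normalizes the counts into a matrix, so that predicting the next word becomes a vector-matrix product. It then computes how quickly a count table grows as the context gets longer, which is the kind of size explosion the Embeddings lab description refers to.

```python
import numpy as np

# Hypothetical toy corpus (illustration only, not the lab's data).
# Assumption: TinyLM is treated here as a word-level bigram model.
corpus = "the cat sat on the mat the cat ate".split()

# Give each distinct word a row/column index.
vocab = sorted(set(corpus))
index = {word: i for i, word in enumerate(vocab)}
V = len(vocab)

# Count table: counts[i, j] = number of times word j directly follows word i.
counts = np.zeros((V, V))
for prev, nxt in zip(corpus, corpus[1:]):
    counts[index[prev], index[nxt]] += 1

# Normalize each row into a next-word probability distribution.
# Words never followed by anything (here "ate") keep an all-zero row.
row_sums = counts.sum(axis=1, keepdims=True)
probs = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)


def next_word_distribution(word):
    """Prediction as a matrix operation: one-hot row vector times probs."""
    one_hot = np.zeros(V)
    one_hot[index[word]] = 1.0
    return one_hot @ probs


def generate(start, max_words=8, seed=0):
    """Sample words one at a time until max_words or a dead end."""
    rng = np.random.default_rng(seed)
    words = [start]
    for _ in range(max_words - 1):
        dist = next_word_distribution(words[-1])
        if dist.sum() == 0:  # no observed successor, so stop
            break
        words.append(vocab[rng.choice(V, p=dist)])
    return words


def table_entries(vocab_size, context_len):
    """Entries in a count table conditioned on context_len previous words."""
    return vocab_size ** context_len * vocab_size


print(next_word_distribution("the"))
print(" ".join(generate("the")))
# Hypothetical vocabulary of 50,000 words, with 1 and then 3 words of context.
print(table_entries(50_000, 1), table_entries(50_000, 3))
```

In this toy corpus "the" is followed by "cat" twice and "mat" once, so its row of the matrix assigns them probabilities 2/3 and 1/3. The last line shows why a pure counting model cannot simply be given a longer context: with a 50,000-word vocabulary, going from one to three words of context multiplies the table size by 2.5 billion.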
As with all MWC units, this unit concludes with a project,