Stanford Introduces Rigorous CS336 Course on Building Language Models from Scratch
Stanford University has launched the official website for CS336, an implementation-heavy course requiring students to construct language models from the ground up, with compute resources sponsored by Modal.
Stanford University has published the official website for CS336: Language Modeling from Scratch, a five-unit course scheduled for the Spring 2026 quarter. The curriculum is designed to give students a comprehensive understanding of language models by guiding them through the entire development process: data collection and cleaning, transformer model construction, training, and evaluation prior to deployment.
The course draws inspiration from operating systems classes that require students to build an entire operating system from the ground up. As artificial intelligence, machine learning, and natural language processing continue to expand, Stanford positions deep technical knowledge of language models as essential for scientists and engineers. The course aims to address this need by having students develop their own models rather than relying on ready-made model implementations.
CS336 is notably implementation-heavy, with assignments designed to be significantly more code-intensive than those in typical artificial intelligence classes. Students are expected to write at least an order of magnitude more code than in other courses, with minimal scaffolding provided by the staff. Proficiency in Python and software engineering is therefore essential, as the majority of class assignments are written in Python.
The curriculum demands strong familiarity with PyTorch and basic systems concepts, such as the memory hierarchy. Students must also be comfortable with matrix and vector notation, probability basics including Gaussian distributions, and fundamental principles of machine learning and deep learning. A significant portion of the coursework involves optimising neural language models to run efficiently on graphics processing units across multiple machines.
To manage costs and improve workflow, the course recommends that students debug implementation correctness on a CPU before using GPUs for training runs or benchmarking. Compute resources for the class are supported by a sponsorship from Modal, although students following along independently can use cloud providers to run the assignments themselves.
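As an illustration of what a cheap CPU-side correctness check might look like, the sketch below compares a numerically stable softmax against a naive reference on small inputs. It is a hypothetical example, not taken from the course materials: the function names, the choice of softmax, and the tolerance are all assumptions made for illustration, and it uses only the Python standard library rather than PyTorch.

```python
import math


def naive_softmax(xs):
    # Reference implementation: the textbook definition.
    # math.exp raises OverflowError for large inputs, so this is
    # only suitable as a reference on small values.
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def stable_softmax(xs):
    # Subtracting the maximum before exponentiating leaves the result
    # mathematically unchanged but keeps every exponent <= 0.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matches_reference(xs, tol=1e-12):
    # Hypothetical helper: True when both implementations agree
    # element-wise within tol (an assumed tolerance).
    ref = naive_softmax(xs)
    out = stable_softmax(xs)
    return all(abs(a - b) <= tol for a, b in zip(ref, out))
```

A check such as `matches_reference([0.5, -1.0, 2.0])` runs in microseconds on any laptop, which is the point of the recommendation: logic errors can be caught before any paid GPU time is spent on training runs or benchmarks.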


