Deep Networks

Readings:
Schuster and Paliwal (1997)
Hochreiter and Schmidhuber (1997)
Radford et al. (2019)

Tutorial video: Let’s build GPT: from scratch, in code, spelled out

Recurrent networks

LSTM networks

Transformers