Road to GPT — Ainiketan

1

02

The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain
Frank Rosenblatt 1958 Beginner 50 minutes

Turing asked if machines could think. Rosenblatt built one that could learn. The Perceptron is the grandfather of every neural network alive today: one of the first machines that adjusted itself based on experience rather than following rules someone wrote by hand.

2

03

Learning Representations by Back-propagating Errors
David Rumelhart, Geoffrey Hinton, Ronald Williams 1986 Intermediate 55 minutes

The Perceptron could learn, but only simple patterns. Multi-layer networks could represent complex patterns, but there was no widely known, practical way to train them. This paper changed that with a single elegant algorithm, backpropagation, which is still the beating heart of nearly every neural network trained today.

3

04

Long Short-Term Memory
Sepp Hochreiter, Jürgen Schmidhuber 1997 Intermediate

4

05

Efficient Estimation of Word Representations in Vector Space (Word2Vec)
Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean 2013 Intermediate

5

06

Sequence to Sequence Learning with Neural Networks
Ilya Sutskever, Oriol Vinyals, Quoc V. Le 2014 Intermediate

6

07

Neural Machine Translation by Jointly Learning to Align and Translate
Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio 2014 Intermediate

7

08

Attention Is All You Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Łukasz Kaiser, Illia Polosukhin 2017 Intermediate

8

10

Improving Language Understanding by Generative Pre-Training
Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever 2018 Intermediate

9

11

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova 2018 Intermediate

10

12

Language Models are Few-Shot Learners
Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei 2020 Intermediate

11

13

Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, Dario Amodei 2020 Intermediate

12

15

Training Language Models to Follow Instructions with Human Feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan Lowe 2022 Intermediate


Want to go deeper? Browse all 24 papers or explore the math behind them.
