  1.
    02
    The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain
    Frank Rosenblatt 1958 Beginner 50 minutes

    Turing asked if machines could think. Rosenblatt built one that could learn. The Perceptron is the grandfather of every neural network in use today: one of the first machines to adjust itself based on experience rather than follow rules someone wrote by hand.

  2.
    03
    Learning Representations by Back-propagating Errors
    David Rumelhart, Geoffrey Hinton, Ronald Williams 1986 Intermediate 55 minutes

    The Perceptron could learn, but only simple, linearly separable patterns. Multi-layer networks could represent complex patterns, but there was no widely adopted way to train them. This paper showed how, with a single elegant algorithm that is still at the heart of nearly every neural network trained today.

  3.
    04
    Long Short-Term Memory
    Sepp Hochreiter, Jürgen Schmidhuber 1997 Intermediate
  4.
    05
    Efficient Estimation of Word Representations in Vector Space (Word2Vec)
    Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean 2013 Intermediate
  5.
    06
    Sequence to Sequence Learning with Neural Networks
    Ilya Sutskever, Oriol Vinyals, Quoc V. Le 2014 Intermediate
  6.
    07
    Neural Machine Translation by Jointly Learning to Align and Translate
    Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio 2014 Intermediate
  7.
    08
    Attention Is All You Need
    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Łukasz Kaiser, Illia Polosukhin 2017 Intermediate
  8.
    10
    Improving Language Understanding by Generative Pre-Training
    Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever 2018 Intermediate
  9.
    11
    BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
    Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova 2018 Intermediate
  10.
    12
    Language Models are Few-Shot Learners
    Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei 2020 Intermediate
  11.
    13
    Scaling Laws for Neural Language Models
    Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, Dario Amodei 2020 Intermediate
  12.
    15
    Training Language Models to Follow Instructions with Human Feedback
    Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan Lowe 2022 Intermediate

Want to go deeper? Browse all 24 papers or explore the math behind them.