Attention for Neural Networks

NOTE: This StatQuest was supported by these awesome people who support StatQuest at the Double BAM level: S. Ágoston, M. Steenbergen, P. Keener, A. Rabatin, Alex, S. Kundapurkar, JWC, S. Jeffcoat, S. Handschuh, J. Le, D. Greene, D. Schioberg, Magpie, Z. Rosenberg, J. N., H-M Chang, M. Ayoubieh, Losings, F. Pedemonte, S. Song US, A. Tolkachev, L. Cisterna, J. Alexander

3 thoughts on "Attention for Neural Networks"

  1. Hello sir,
    I just wanted to propose, if possible, that you make detailed code videos with your outstanding explanations, to go hand in hand with the videos you make.

  2. Attention Is All You Need! Great work, Joshua Starmer. Can you kindly make videos on how attention is the core of GPT and BERT models? In particular, how ChatGPT-4 currently uses decoder-only transformers to predict the next word. That would be great, I feel.
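Since the comment above touches on how decoder-only models like GPT use attention to predict the next word, here is a minimal sketch of scaled dot-product attention with a causal mask. All array names, sizes, and values are illustrative assumptions, not taken from the video or this post:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, causal=False):
    """Compute softmax(Q K^T / sqrt(d_k)) V for a single attention head."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    if causal:
        # Mask out future positions so each token can only attend
        # to itself and earlier tokens (the decoder-only setup).
        mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    # Numerically stable softmax over the keys
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Toy example: 3 tokens with 4-dimensional embeddings (made-up values)
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out, w = scaled_dot_product_attention(Q, K, V, causal=True)
```

With the causal mask on, each row of `w` sums to 1 and its entries above the diagonal are exactly 0, which is what lets a decoder-only transformer generate text one token at a time.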
