NOTE: This StatQuest was supported by these awesome people who support StatQuest at the Double BAM level: S. V. Dhulipala, Z. Rosenberg, T. Nguyen, J. Smith, G. Heller-Wagner, J. N., S. Shah, H. M. Chang, S. Özdemir, J. Horn, S. Cahyawijaya, N. Fleming, R., A. Eng, F. Prado, J. Malone-Lee

Hi Josh, thank you for making these awesome videos. They really helped me understand the foundations of Data Science, especially Deep Learning and Neural Networks. I have a quick question about the chain rule used at 10:45 in the Neural Networks Part 2: Backpropagation Main Ideas video: why does the chain rule for (d SSR / d b3) consist of two parts, (d SSR / d Predicted) and (d Predicted / d b3), which we then multiply together?
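For reference, here is a minimal sketch of the two-part derivative being asked about. It assumes SSR is the sum of squared residuals over n data points, that Predicted_i = blue_i + orange_i + b3 for each point, and that blue and orange do not depend on b3. Because b3 only reaches SSR through Predicted, the chain rule splits the derivative into an outer part (d SSR / d Predicted) and an inner part (d Predicted / d b3), and the inner part equals 1:

```latex
\mathrm{SSR} = \sum_{i=1}^{n} \left(\mathrm{Observed}_i - \mathrm{Predicted}_i\right)^2,
\qquad
\mathrm{Predicted}_i = \mathrm{blue}_i + \mathrm{orange}_i + b_3

\frac{d\,\mathrm{SSR}}{d\,b_3}
  = \sum_{i=1}^{n} \frac{d\,\mathrm{SSR}}{d\,\mathrm{Predicted}_i} \cdot \frac{d\,\mathrm{Predicted}_i}{d\,b_3}
  = \sum_{i=1}^{n} -2\left(\mathrm{Observed}_i - \mathrm{Predicted}_i\right) \cdot 1
```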

I got confused because I first watched your Gradient Descent Step-by-Step video, and when we used the chain rule in that video, there was only one part: d SSR / d intercept, which came out in terms of (observed minus predicted).
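As a sketch, assuming the straight-line model with the intercept written as b_0 and the slope as b_1, the intercept derivative has the same chain-rule structure. The inner derivative is a constant (-1 for the residual, or equivalently 1 for Predicted), so it is easy to absorb into the final expression, and the result can look like a single step:

```latex
\mathrm{SSR} = \sum_{i=1}^{n} \left(\mathrm{Observed}_i - (b_0 + b_1 x_i)\right)^2

\frac{d\,\mathrm{SSR}}{d\,b_0}
  = \sum_{i=1}^{n} 2\left(\mathrm{Observed}_i - (b_0 + b_1 x_i)\right) \cdot (-1)
  = \sum_{i=1}^{n} -2\left(\mathrm{Observed}_i - \mathrm{Predicted}_i\right)
```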

The only difference I see is that in this video, Predicted = blue + orange + b3, while in the other video, Predicted = y = b0 + b1x.
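One way to check that both cases follow the same rule is a small numerical comparison. This is a minimal sketch, not code from the videos: the data values, the fixed slope of 0.64, and the helper names (`predicted_nn`, `predicted_line`, `chain_rule_grad`, `central_difference`) are all hypothetical. It computes d SSR / d b3 and d SSR / d b0 as (d SSR / d Predicted) times (d Predicted / d parameter), where the second factor is 1 in both models, and compares each result with a finite-difference estimate.

```python
# Hypothetical example values (made up for illustration, not taken from the videos).
observed = [0.0, 1.0, 0.0]
blue = [0.0, 0.5, 0.2]      # hypothetical output of the "blue" curve for each input
orange = [0.1, 0.3, 0.0]    # hypothetical output of the "orange" curve for each input

x = [0.5, 2.3, 2.9]         # hypothetical inputs for the straight-line model
observed_line = [1.4, 1.9, 3.2]


def ssr(observed_vals, predicted_vals):
    """Sum of squared residuals."""
    return sum((o - p) ** 2 for o, p in zip(observed_vals, predicted_vals))


def predicted_nn(b3):
    """Predicted = blue + orange + b3 (blue and orange do not depend on b3)."""
    return [blue_i + orange_i + b3 for blue_i, orange_i in zip(blue, orange)]


def predicted_line(b0, b1=0.64):
    """Predicted = b0 + b1 * x, with a hypothetical fixed slope b1."""
    return [b0 + b1 * xi for xi in x]


def chain_rule_grad(observed_vals, predicted_vals, dpred_dparam=1.0):
    """d SSR / d param = sum_i (d SSR / d Predicted_i) * (d Predicted_i / d param)."""
    return sum(-2.0 * (o - p) * dpred_dparam
               for o, p in zip(observed_vals, predicted_vals))


def central_difference(f, value, h=1e-6):
    """Numerical derivative, used only to check the chain-rule result."""
    return (f(value + h) - f(value - h)) / (2.0 * h)


# Neural network case: d Predicted / d b3 = 1.
b3 = 0.0
grad_b3 = chain_rule_grad(observed, predicted_nn(b3))
check_b3 = central_difference(lambda b: ssr(observed, predicted_nn(b)), b3)

# Straight-line case: d Predicted / d b0 = 1 as well.
b0 = 0.0
grad_b0 = chain_rule_grad(observed_line, predicted_line(b0))
check_b0 = central_difference(lambda b: ssr(observed_line, predicted_line(b)), b0)

print(f"d SSR / d b3: chain rule = {grad_b3:.6f}, numerical = {check_b3:.6f}")
print(f"d SSR / d b0: chain rule = {grad_b0:.6f}, numerical = {check_b0:.6f}")
```

The same helper serves both models because, in each, the parameter is a plain additive term in Predicted. Only the outer factor, -2(Observed - Predicted), changes with the data.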

To get a better idea of how The Chain Rule is being applied here, check out the StatQuest on… The Chain Rule: https://youtu.be/wl1myxrtQHQ

Thank you!