Entropy (for data science) Clearly Explained

August 25, 2021

NOTE: This StatQuest was supported by these awesome people who support StatQuest at the Double BAM level: Z. Rosenberg, S. Shah, J. N., J. Horn, J. Wong, I. Galic, H-S. Ming, D. Greene, D. Schioberg, C. Walker, G. Singh, L. Cisterna, J. Alexander, J. Varghese, K. Manickam, N. Fleming, F. Prado, J. Malone-Lee

7 thoughts on “Entropy (for data science) Clearly Explained”

Animesh

August 25, 2021 at 2:24 pm

Please send me the note as a PDF.

Animesh

August 25, 2021 at 2:25 pm

Please send me the note on entropy as a PDF.

Ferdinand

August 30, 2021 at 10:36 am

You are one of the greatest geniuses in teaching I have ever come across.

Youness El Hamzaoui

October 23, 2021 at 5:36 pm

Thank you very much Josh!

Ross

March 12, 2022 at 12:33 am

Hi Josh,
I am dissatisfied with your explanation for surprise being log(1/p) on the basis that it is 0 when p is 1 and infinity when p is 0. There are a zillion functions of p with those limits. Why is log(1/p) preferred over all of the others? Is it just a convention chosen to match the physics definition and because of the convenient mathematical properties of the log? Or is there something more to it?

- starmer
  
  March 12, 2022 at 2:19 pm
  
  If you want a more mathematically grounded explanation, I would highly recommend the original manuscript by Shannon: https://people.math.harvard.edu/~ctm/home/text/others/shannon/entropy/entropy.pdf
  
- Anish Chelliah CR
  
  July 19, 2022 at 12:26 pm
  
  I believe it might have something to do with simple calculations to compute gradients when softmax is paired with the cross-entropy loss.
  
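For readers following the question above about why log(1/p) is used rather than some other function with the same limits: one property the log has that most candidates lack is additivity, meaning the surprise of two independent events occurring together equals the sum of their individual surprises. The snippet below is a minimal sketch of that property, assuming base-2 logs (bits). The function names `surprise` and `entropy` are illustrative choices for this example and do not come from the video or from Shannon's paper.

```python
import math


def surprise(p: float) -> float:
    """Surprise of an event with probability p, in bits: log2(1/p)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return math.log2(1.0 / p)


def entropy(probs) -> float:
    """Expected surprise of a discrete distribution, in bits."""
    # Outcomes with p == 0 contribute nothing to the expectation.
    return sum(p * surprise(p) for p in probs if p > 0)


# Two independent events (probabilities chosen only for illustration).
p, q = 0.5, 0.25

# Probabilities multiply for independent events, and surprises add.
joint = surprise(p * q)
summed = surprise(p) + surprise(q)
print(joint, summed)

# A certain event carries no surprise.
print(surprise(1.0))

# A fair coin has an expected surprise of one bit.
print(entropy([0.5, 0.5]))
```

A function such as 1/p - 1 has the same values at p = 1 and as p approaches 0, but it does not turn products of probabilities into sums, which is one reason it is not a drop-in replacement.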