StatQuest Errata

Although I do everything I can to catch errors before I publish a video, sometimes they slip through. Here’s a list of typos that I know about:

AdaBoost, Clearly Explained:

  • Error at 10:18: Amount of Say for Chest Pain = (1/2)*log((1-(3/8))/(3/8)) = (1/2)*log((5/8)/(3/8)) = (1/2)*log(5/3) ≈ 0.26 (0.255 to three decimal places), not 0.42.
  • Error at 10:18: The math has 7/8 in the numerator, but it should have 5/8.
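
The corrected Amount of Say can be checked with a short sketch. It assumes that "log" means the natural logarithm, since that is what reproduces the erroneous 0.42 from 7/8; the function name amount_of_say is made up for illustration.

```python
import math

def amount_of_say(total_error):
    """Amount of Say = (1/2) * ln((1 - total_error) / total_error)."""
    return 0.5 * math.log((1 - total_error) / total_error)

# Total error of 3/8 for the Chest Pain stump, as in the correction above.
corrected = amount_of_say(3 / 8)               # (1/2) * ln((5/8) / (3/8)), about 0.255

# The erroneous version put 7/8 in the numerator instead of 5/8.
erroneous = 0.5 * math.log((7 / 8) / (3 / 8))  # about 0.42

print(f"corrected: {corrected:.3f}, erroneous: {erroneous:.3f}")
```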

Gradient Boost Part 1: Regression Main Ideas:

  • Error at 10:53: I have (76 – 71.2 + (0.1 x 4.8)), but I should have (76 – (71.2 + 0.1 x 4.8)).
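
The misplaced parenthesis changes the result, which a few lines of arithmetic make visible. This is only a sketch: the variable names are assumptions about what each number represents (76 as the observed value, 71.2 as the initial prediction, 0.1 as the learning rate, 4.8 as the tree's output).

```python
observed = 76.0             # assumed: the observed value for this sample
initial_prediction = 71.2   # assumed: the initial prediction
learning_rate = 0.1         # assumed: the learning rate
tree_output = 4.8           # assumed: the output of the first tree

# Corrected: subtract the whole updated prediction from the observed value.
new_prediction = initial_prediction + learning_rate * tree_output
corrected_residual = observed - new_prediction                              # 76 - (71.2 + 0.1 x 4.8)

# As shown in the video: the scaled tree output ends up added, not subtracted.
erroneous_residual = observed - initial_prediction + learning_rate * tree_output

print(f"corrected: {corrected_residual:.2f}, erroneous: {erroneous_residual:.2f}")
```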

Gradient Boost, Part 2: Regression Details:

  • At around 24:15, the header for the residual column should be r_{i,2} and the residuals need to be updated.
  • At around 16:18, the leaf in the script is R_{1,2}, but it should be R_{2,1}.

Gradient Boost Part 3 – Classification:

  • Error at 11:53: The residual for this sample and the next should be (1 – 0.9) = 0.1, not (0 – 0.9) = 0.1. At least the result is correct, even if the calculation itself is not.

Gradient Boost Part 4 – Classification Details:

  • Error at 7:01: I have log(p) – log(1-p) = log(p)/log(1-p), which is incorrect. It should be log(p) – log(1-p) = log(p/(1-p))
  • Error at 19:10. The “loss functions” in the derivatives are missing the “L” in front of them.
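
The logarithm identity in the 7:01 correction can be confirmed numerically. The sketch below uses an arbitrary probability of 0.7, chosen only for illustration, and the natural logarithm.

```python
import math

p = 0.7  # hypothetical probability, any value strictly between 0 and 1 works

difference_of_logs = math.log(p) - math.log(1 - p)
log_of_ratio = math.log(p / (1 - p))         # the correct right-hand side
ratio_of_logs = math.log(p) / math.log(1 - p)  # the incorrect right-hand side

print(math.isclose(difference_of_logs, log_of_ratio))   # the identity holds
print(math.isclose(difference_of_logs, ratio_of_logs))  # the erroneous form does not
```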

Gradient Descent, Step-by-Step:

  • Error at 18:53 and beyond – the two parts of the derivative, with respect to the slope, that I did not explicitly derive have an extra “square” at the end of them. You can also see these at 19:11.
  • At around 14:11, I need to make it clear that it is the absolute value of the step size that needs to be small. If the slope were -0.009 instead of positive, this point would be more obvious.
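
Why the absolute value matters for the stopping rule can be sketched in a few lines. The learning rate of 0.1, the threshold of 0.001, and the function names are assumptions made for this example, not values taken from the video.

```python
def should_stop(slope, learning_rate=0.1, min_step_size=0.001):
    """Stop when the step size is close to zero in either direction."""
    step_size = slope * learning_rate
    return abs(step_size) < min_step_size

def naive_should_stop(slope, learning_rate=0.1, min_step_size=0.001):
    """Same check without the absolute value (incorrect for negative slopes)."""
    step_size = slope * learning_rate
    return step_size < min_step_size

small_negative = should_stop(-0.009)     # True: a tiny step, just in the negative direction
large_negative = should_stop(-5.0)       # False: a large step, so keep going
naive_large = naive_should_stop(-5.0)    # True: the naive check stops far too early
```

Without abs(), any negative slope passes the check, no matter how steep it is.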

Logistic Regression Details Pt1: Coefficients:

  • At 15:21, the left-hand side of the equation should be “log(odds Obesity)” instead of “size”.

Maximum Likelihood For the Normal Distribution, step-by-step!

  • Error at 2:39: I said likelihood=0.03 for mu=30, but mu=28 is in the equation.

PCA Step-By-Step:

  • At 1:47, points 5 and 6 are not in the right location.

Random Forests Part 1:

  • The same feature (or variable) can be selected multiple times in a tree. Every time we select a subset of features to choose from, we choose from the full list of features, even if we have already used some of those features. Thus, a single feature can appear multiple times in a tree.
  • At 9:28 I say “square” when I meant to say “square root”.
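
The first point above, that each candidate subset is drawn from the full feature list, can be illustrated with a minimal sketch. The feature names, the subset size of 2, and the function candidate_features are all hypothetical; this is not how any particular library implements it.

```python
import random

def candidate_features(all_features, n_candidates, rng):
    """Pick the candidate features for one node.

    Always samples from the full list, regardless of which
    features were used earlier in the same tree.
    """
    return rng.sample(all_features, n_candidates)

all_features = ["feature_a", "feature_b", "feature_c", "feature_d"]  # hypothetical names
rng = random.Random(0)

# Candidate subsets for three successive nodes in the same tree.
node_candidates = [candidate_features(all_features, 2, rng) for _ in range(3)]

# How many of the three nodes offered each feature as a candidate.
counts = {f: sum(f in subset for subset in node_candidates) for f in all_features}
print(node_candidates, counts)
```

With three draws of two features from a list of four, at least one feature is necessarily offered at more than one node.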

StatQuest: Decision Trees:

  • At 12:43, the Gini Impurity for Chest Pain is 0.29, but it should be 0.19. This, then, could possibly change whether or not to continue to split the leaf on the right side of the “Blocked” node.
  • At 14:41, the Gini Impurity for the right-hand side should be (4/(1+4))*0.375, but it is (4/(1+4))*0.336.
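
The weighted term in the 14:41 correction can be reproduced with a small sketch. The class counts of 3 and 1 are an assumption: they are one split of four samples that yields an impurity of 0.375, and the actual counts are not stated here. The function name gini_impurity is also made up for illustration.

```python
def gini_impurity(class_counts):
    """Gini impurity of a leaf = 1 - sum of squared class proportions."""
    total = sum(class_counts)
    return 1 - sum((count / total) ** 2 for count in class_counts)

# Assumed counts for the right-hand leaf: 4 samples split 3 to 1.
right_leaf = gini_impurity([3, 1])   # 1 - (0.75**2 + 0.25**2) = 0.375

# Weight the leaf by its share of the samples: 4 out of (1 + 4).
weighted_right = (4 / (1 + 4)) * right_leaf

print(right_leaf, weighted_right)
```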

StatQuest: DESeq2, part 1, Library Normalization:

  • At 9:28, I have log(reads for gene X) – log(average for gene X), but it should be: log(reads for gene X) – average(log values for gene X). The average of the log values is the log of the geometric mean, so we are, in effect, subtracting the log of the geometric mean from each log-transformed gene measurement.
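
The difference between the two expressions can be seen with a minimal sketch. The read counts are made-up numbers for a single gene across three samples, and natural logs are assumed.

```python
import math

reads = [10.0, 100.0, 1000.0]  # hypothetical read counts for gene X in three samples

logs = [math.log(r) for r in reads]
average_of_logs = sum(logs) / len(logs)          # corrected: average(log values)
log_of_average = math.log(sum(reads) / len(reads))  # erroneous: log(average)

# Corrected ratios: log(reads) minus the average of the log values.
corrected_ratios = [value - average_of_logs for value in logs]

# The average of the logs is the log of the geometric mean (100 for these counts).
geometric_mean = math.exp(average_of_logs)

print(corrected_ratios, geometric_mean)
```

Here the geometric mean is 100 while the arithmetic mean is 370, so the two versions of the formula give noticeably different results.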

Statistics Fundamentals: Population Parameters:

  • Error at 2:10: I should say between 10 and 30 and not 20 and 30…

StatQuest: MDS and PCoA:

  • Error at 4:14: The difference for gene 3 should be (2.2 – 1)². Instead, the difference for gene 2 was repeated.

StatQuest: RPKM, FPKM and TPM:

  • Error at 0:24: It says “RPKM vs FPKM vs FPM” and not “RPKM vs FPKM vs TPM”.

StatQuest: The Standard Error:

  • Error at 10:22. The bar that represents the mean is in the wrong location. It should be at -0.3.

StatQuest: Quantile-Quantile Plots (QQ plots), Clearly Explained:

  • Error at 5:30. I should say that Quartiles divide the data into 4 parts.

Stochastic Gradient Descent, Clearly Explained:

  • Error at 9:10: The derivatives on the left side of the screen contain the original slope and intercept (slope = 1, intercept = 0) instead of the “latest” values. I would just eyeball the graph on the right to figure out what the “latest” values are.