StatQuest Errata

Although I do everything I can to catch errors before I publish a video, sometimes they slip through. Here’s a list of typos that I know about:

AdaBoost, Clearly Explained:

  • Error at 10:18: Amount of Say for Chest Pain = (1/2)*log((1-(3/8))/(3/8)) = (1/2)*log((5/8)/(3/8)) = (1/2)*log(5/3) = 0.25, not 0.42.
  • Error at 10:18: The math has 7/8 in the numerator, but it should have 5/8.
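
As a quick check of the arithmetic, here is a minimal Python sketch. It assumes "log" means the natural log, and the function name amount_of_say is just an illustrative choice. It also shows that the incorrect 7/8 numerator reproduces the 0.42 value.

```python
import math

def amount_of_say(total_error: float) -> float:
    """Amount of Say = (1/2) * ln((1 - total_error) / total_error)."""
    return 0.5 * math.log((1 - total_error) / total_error)

# Corrected calculation: total error of 3/8 gives (5/8) / (3/8) = 5/3.
chest_pain_say = amount_of_say(3 / 8)

# The typo: 7/8 in the numerator instead of 5/8.
typo_say = 0.5 * math.log((7 / 8) / (3 / 8))

print(f"corrected: {chest_pain_say:.4f}")  # about 0.2554
print(f"with typo: {typo_say:.4f}")        # about 0.4236
```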

Gradient Boost Part 1: Regression Main Ideas:

  • Error at 10:53. I have (76 – 71.2 + (0.1 x 4.8)), but I should have (76 – (71.2 + 0.1 x 4.8)).
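
The two expressions give different numbers, which a short Python sketch makes concrete. The variable names are my reading of the terms (76 as the observed value, 71.2 as the previous prediction, 0.1 as the learning rate, and 4.8 as the tree's output) and are not taken from the errata itself.

```python
observed = 76.0            # assumed: the observed value
previous_prediction = 71.2 # assumed: the prediction before adding the tree
learning_rate = 0.1        # assumed: the scaling applied to the tree
tree_output = 4.8          # assumed: the value from the tree's leaf

# Corrected form: first build the new prediction, then subtract it.
new_prediction = previous_prediction + learning_rate * tree_output
correct_residual = observed - new_prediction

# Form shown in the video: the scaled tree output is added, not subtracted.
video_residual = observed - previous_prediction + (learning_rate * tree_output)

print(f"corrected: {correct_residual:.2f}")  # 4.32
print(f"in video:  {video_residual:.2f}")    # 5.28
```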

Gradient Boost, Part 2: Regression Details

  • At around 24:15, the header for the residual column should be ri,2 and the residuals need to be updated.
  • At around 16:18, the leaf in the script is R1,2, but it should be R2,1

Gradient Boost Part 3 – Classification:

  • Error at 11:53, the residual for this sample and the next should be (1 – 0.9) = 0.1, not (0 – 0.9) = 0.1. At least the result is correct, even if the calculation itself is not.

Gradient Boost Part 4 – Classification Details:

  • Error at 7:01: I have log(p) – log(1-p) = log(p)/log(1-p), which is incorrect. It should be log(p) – log(1-p) = log(p/(1-p))
  • Error at 19:10. The “loss functions” in the derivatives are missing the “L” in front of them.
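
The log identity from 7:01 is easy to check numerically. This is a small sketch with an arbitrary probability, p = 0.7, chosen only for illustration.

```python
import math

p = 0.7  # arbitrary probability, used only for this check

difference_of_logs = math.log(p) - math.log(1 - p)
log_of_ratio = math.log(p / (1 - p))        # the correct identity
ratio_of_logs = math.log(p) / math.log(1 - p)  # the incorrect form

print(f"log(p) - log(1-p) = {difference_of_logs:.4f}")
print(f"log(p/(1-p))      = {log_of_ratio:.4f}")
print(f"log(p)/log(1-p)   = {ratio_of_logs:.4f}")
```

The first two values agree (the log of the odds), while the third does not.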

Gradient Descent, Step-by-Step:

  • Error at 18:53 and beyond – the two parts of the derivative, with respect to the slope, that I did not explicitly derive have an extra “square” at the end of them. You can also see these at 19:11.
  • At around 14:11, I need to make it clear that it’s the absolute value of the step size that needs to be small. If you make the slope -0.009, instead of positive, this point becomes more obvious.
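
To see why the absolute value matters, here is a minimal sketch of a stopping check. The learning rate (0.1), the minimum step size (0.001), and the steep slope (-5.7) are assumptions for illustration, and the function names are hypothetical.

```python
def step_size(slope: float, learning_rate: float) -> float:
    """Step size = slope of the loss function * learning rate."""
    return slope * learning_rate

def is_small_enough(step: float, min_step_size: float = 0.001) -> bool:
    """Stop only when the magnitude of the step is small."""
    return abs(step) < min_step_size

tiny_negative = step_size(-0.009, 0.1)  # close to zero, so we should stop
large_negative = step_size(-5.7, 0.1)   # far from zero, so we should keep going

print(is_small_enough(tiny_negative))   # True
print(is_small_enough(large_negative))  # False

# Without abs(), any negative step looks "small", which is wrong:
print(large_negative < 0.001)           # True, even though the step is large
```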

Logistic Regression Details Pt1: Coefficients

  • At 15:21, the left hand side of the equation should be “log(odds Obesity)” instead of “size”.

Machine Learning Fundamentals: Cross Validation

  • At 4:16 there is a small typo. KNN should have 10 correct and 14 incorrect.

Maximum Likelihood For the Normal Distribution, step-by-step!

  • Error at 2:39: I said likelihood=0.03 for mu=30, but mu=28 is in the equation.

PCA Step-By-Step:

  • At 1:47, points 5 and 6 are not in the right location.

Random Forests Part 1:

  • The same feature (or variable) can be selected multiple times in a tree. Every time we select a subset of features to choose from, we choose from the full list of features, even if we have already used some of those features. Thus, a single feature can appear multiple times in a tree.
  • At 9:28 I say “square” when I meant to say “square root”.
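
Both points can be sketched in a few lines of Python. The feature names and the helper candidate_features are hypothetical, and using the square root of the number of features as the subset size is an assumption about what the “square root” refers to.

```python
import random

def candidate_features(all_features, n_candidates, rng):
    """Pick the candidates for ONE split, always from the full feature list."""
    return rng.sample(all_features, n_candidates)

# Hypothetical feature names, for illustration only.
features = ["Chest Pain", "Good Circulation", "Blocked Arteries", "Weight"]

# Assumed subset size: the square root of the number of features.
n_candidates = max(1, round(len(features) ** 0.5))

rng = random.Random(0)
splits = [candidate_features(features, n_candidates, rng) for _ in range(3)]

for depth, candidates in enumerate(splits):
    print(f"split {depth}: choose the best of {candidates}")
```

Because nothing is removed from the full list between splits, a feature used at one node can show up again as a candidate further down the same tree.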

Random Forests Part 2:

  • At 10:22 I overlooked one step. In this case, you plug in the most common value/median value from only the observations in the training dataset that have the same category as the new copy that you created. For example, we created two new copies of the observation: one with heart disease and one without heart disease. Now, for the new copy with heart disease, we plug in the most common value from the observations in the training dataset that have heart disease. For the new copy without heart disease, we plug in the most common value from the observations in the training dataset that do not have heart disease. We can then use the iterative method to refine the guess if we want, or we can just run those two copies down the tree and use the classification from the copy that got the most correct votes.
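
Here is a rough sketch of that initial guess for a categorical feature. The toy training data, the column names, and the helper most_common_by_class are all made up for illustration. It covers only the first fill-in step, not the iterative refinement or the voting.

```python
from collections import Counter

def most_common_by_class(rows, feature, label):
    """Most common non-missing value of `feature` within each class of `label`."""
    counts = {}
    for row in rows:
        if row[feature] is not None:
            counts.setdefault(row[label], Counter())[row[feature]] += 1
    return {cls: c.most_common(1)[0][0] for cls, c in counts.items()}

# Toy training data (hypothetical values).
training = [
    {"blocked": "yes", "heart_disease": True},
    {"blocked": "yes", "heart_disease": True},
    {"blocked": "no", "heart_disease": True},
    {"blocked": "no", "heart_disease": False},
    {"blocked": "no", "heart_disease": False},
    {"blocked": "yes", "heart_disease": False},
]

new_sample = {"blocked": None}  # the value we need to guess

initial_guess = most_common_by_class(training, "blocked", "heart_disease")

# One copy per class, each filled in from training rows of that same class.
copies = [
    dict(new_sample, heart_disease=cls, blocked=initial_guess[cls])
    for cls in (True, False)
]
print(copies)
```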

Regularization Part 1: Ridge Regression

  • At 13:39 there is a typo. I meant to put “Negative Log-Likelihood” instead of “Likelihood”.

StatQuest: Decision Trees:

  • At 12:43, the Gini Impurity for Chest Pain is 0.29, but it should be 0.19. This, then, could possibly change whether or not to continue to split the leaf on the right side of the “Blocked” node.
  • At 14:41, the Gini Impurity for the right-hand side should be (4/(1+4))*0.375, but it is (4/(1+4))*0.336.
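
For reference, the 0.375 value is what a 4-sample leaf with a 3-to-1 class split gives. The sketch below assumes that split (the errata does not state the counts) and uses a generic Gini function.

```python
def gini(counts):
    """Gini Impurity = 1 - sum of squared class proportions."""
    total = sum(counts)
    return 1 - sum((count / total) ** 2 for count in counts)

# Assumed: the 4-sample leaf holds 3 samples of one class and 1 of the other.
leaf_impurity = gini([3, 1])

# Weight the leaf by its share of the 1 + 4 samples at the node.
weighted_term = (4 / (1 + 4)) * leaf_impurity

print(f"leaf impurity: {leaf_impurity:.3f}")  # 0.375
print(f"weighted term: {weighted_term:.3f}")  # 0.300
```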

StatQuest: DESeq2, part 1, Library Normalization:

  • At 9:28, I have log(reads for gene X) – log(average for gene X), but it should be: log(reads for gene X) – average(log values for gene X). Because the average of the log values is the log of the geometric mean, we are subtracting the log of the geometric mean from each log-transformed gene measurement.
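
The difference between the two expressions shows up clearly with a few numbers. The read counts below are hypothetical and chosen so the geometric mean is easy to see.

```python
import math

reads = [10.0, 100.0, 1000.0]  # hypothetical read counts for gene X

logs = [math.log(r) for r in reads]

average_of_logs = sum(logs) / len(logs)            # what should be subtracted
log_of_average = math.log(sum(reads) / len(reads)) # what the video shows

# exp(average of logs) is the geometric mean of the raw values.
geometric_mean = math.exp(average_of_logs)

print(f"average(log values) = {average_of_logs:.3f}")
print(f"log(average)        = {log_of_average:.3f}")
print(f"geometric mean      = {geometric_mean:.1f}")

# Corrected normalization step for each sample.
log_ratios = [log_value - average_of_logs for log_value in logs]
```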

Statistics Fundamentals: Population Parameters:

  • Error at 2:10: I should say between 10 and 30 and not 20 and 30…

StatQuest: K-means clustering:

  • At 7:25 I made a mistake when I plotted the point (7,-8). It should be in the lower right-hand quadrant.

StatQuest: Linear Models Pt. 1 – Linear Regression

  • At 25:39 there is a typo. I should have (Pfit – Pmean) instead of the other way around.
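
The order matters because it flips the sign. The sketch below assumes the term is the denominator of the numerator in the usual F statistic, F = ((SS(mean) – SS(fit)) / (Pfit – Pmean)) / (SS(fit) / (n – Pfit)), which the errata does not spell out. All the numbers are made up.

```python
def f_statistic(ss_mean, ss_fit, p_fit, p_mean, n):
    """F = ((SS(mean) - SS(fit)) / (p_fit - p_mean)) / (SS(fit) / (n - p_fit))."""
    explained = (ss_mean - ss_fit) / (p_fit - p_mean)
    unexplained = ss_fit / (n - p_fit)
    return explained / unexplained

# Hypothetical values: a line (2 parameters) versus the mean (1 parameter).
f_value = f_statistic(ss_mean=100.0, ss_fit=40.0, p_fit=2, p_mean=1, n=12)
print(f_value)  # 15.0

# With the terms the other way around (p_mean - p_fit), F would be negative.
swapped = ((100.0 - 40.0) / (1 - 2)) / (40.0 / (12 - 2))
print(swapped)  # -15.0
```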

StatQuest: MDS and PCoA:

  • Error at 4:14. The difference for gene 3 should be (2.2 – 1)². Instead, the difference for gene 2 was repeated.

StatQuest: Random Forests in R

  • Error at 13:26. I meant to call “as.dist()” instead of “dist()”.

StatQuest: RPKM, FPKM and TPM:

  • Error at 0:24: It says “RPKM vs FPKM vs FPM” and not “RPKM vs FPKM vs TPM”.

StatQuest: The Standard Error:

  • Error at 10:22. The bar that represents the mean is in the wrong location. It should be at -0.3.

StatQuest: t-SNE, Clearly Explained

  • At 6:17 I should have said that the blue points have twice the density of the purple points.

StatQuest: Quantile-Quantile Plots (QQ plots), Clearly Explained:

  • Error at 4:35. The uniform distribution should be split into 15 quantiles, not 16.
  • Error at 5:30. I should say that Quartiles divide the data into 4 parts.
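
As a small illustration of the quartile point, the sketch below uses Python's statistics.quantiles on made-up data (the integers 1 through 12) and shows that three cut points divide the data into 4 parts.

```python
import statistics
from bisect import bisect_right
from collections import Counter

data = list(range(1, 13))  # hypothetical data: the integers 1 through 12

# n=4 asks for quartiles, which returns 3 cut points.
cut_points = statistics.quantiles(data, n=4)

# Assign each value to the part it falls in (0, 1, 2, or 3).
part_of = [bisect_right(cut_points, value) for value in data]
part_sizes = [Counter(part_of)[part] for part in range(4)]

print(f"{len(cut_points)} cut points: {cut_points}")
print(f"{len(part_sizes)} parts with sizes {part_sizes}")
```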

Stochastic Gradient Descent, Clearly Explained:

  • Error at 9:10: The derivatives on the left side of the screen contain the original slope and intercept (slope = 1, intercept = 0) instead of the “latest” values. I would just eyeball the graph on the right to figure out what the “latest” values are.