This page contains links to playlists and individual videos on Statistics, Statistical Tests, Machine Learning, The StatQuest Musical Dictionary, Webinars, Live Streams, and The AI Buzz, organized, roughly, by category. Generally speaking, the videos are organized from basic concepts to complicated concepts, so, in theory, you should be able to start at the top and work your way down and everything will make sense.
Playlists:
- Statistics Fundamentals – These videos give you a general overview of statistics and also serve as a reference for statistical concepts. Topics include:
- Histograms
- What is a statistical distribution?
- And many more!!!
- Linear Regression and Linear Models – These videos teach the basics of one of statistics’ most powerful tools. Linear Regression and Linear Models allow us to use continuous values, like weight or height, and categorical values, like favorite color or favorite movie, to predict a continuous value, like age.
- Logistic Regression – These videos pick up where Linear Regression and Linear Models leave off. Now, instead of predicting something continuous, like age, we can predict something discrete, like whether or not someone will enjoy the 1990 theatrical bust Troll 2.
- Machine Learning – Linear Models and Logistic Regression are just the tip of the machine learning iceberg. There’s tons more to learn, and this playlist will help you through it all, one step at a time.
- Neural Networks, Deep Learning, and AI – Everything you need to know, from the basics, to image classification with Convolutional Neural Networks, to the state-of-the-art transformers used for large language models like ChatGPT, presented one step at a time so that it is easy to understand.
- High Throughput Sequence Analysis – If you do high-throughput sequence analysis, this playlist is for you!
- Statistics in R – If you want to do any of this stuff in R, this playlist is for you, and you only. No one else is allowed to watch it.
- #66DaysOfData – If you want to participate in Ken Jee’s #66DaysOfData and are having trouble thinking of new stuff to learn, here’s a playlist that covers everything from the basics to the fancy stuff.
- The StatQuest Musical Dictionary – Short songs to help you remember all of the data science terminology.
- Human Stories in AI with StatQuest and Lightning AI – These videos are about the career journeys of passionate AI experts. From their humble beginnings to the challenges they conquered, we’ll be inspired by the real-world experiences of professionals thriving in the ever-evolving AI landscape.
- The AI Buzz – A conversation between me and Luca Antiga (and guests) about AI and how it works and how it is changing everything.
- Histograms, Clearly Explained
- How to tell a story with US Census Data
- The Main Ideas behind Probability Distributions
- The Normal Distribution
- The Exponential Distribution Sing-a-long!!!
- Population and Estimated Parameters, Clearly Explained
- Estimating the Mean, Variance and Standard Deviation
- What is a (mathematical) model?
- Hypothesis Testing and the Null Hypothesis
- Alternative Hypothesis: Main Ideas
- p-values: What they are and how to interpret them
- How to Calculate p-values
- p-hacking: What it is and how to avoid it
- False Discovery Rate (FDR), Clearly Explained
- Statistical Power, Clearly Explained
- Power Analysis, Clearly Explained
- Covariance, Clearly Explained
- Pearson’s Correlation, Clearly Explained
- What does it mean to “sample from a distribution”?
- Three more lessons from my Pop!!!
- Conditional Probability, Clearly Explained
- Bayes’ Theorem, Clearly Explained
- Expected Values Part 1, Main Ideas!!! (Expected Values for Discrete Variables)
- Expected Values Part 2, Continuous Variables
- The Binomial Distribution and Test
- The Central Limit Theorem (or “How I Learned to Stop Worrying and Love the t-test”)
- The Difference between Technical and Biological Replicates
- The sample size and the effective sample size
- Standard Deviation vs Standard Error
- The Standard Error
- Bootstrapping Part 1: Main Ideas
- Bootstrapping Part 2: Calculating p-values
- Bar Charts Are Better Than Pie Charts
- Boxplots, Clearly Explained
- Logs (logarithms), clearly explained
- How to make your own StatQuest!!!
- Confidence Intervals
- R-squared explained
- Linear Models Part 0: Fitting a line to data, aka Least Squares, aka Linear Regression
- Fitting a curve to data, aka Lowess, aka Loess
- Linear Models Part 1: Linear Regression
- Linear Models: Linear Regression in R
- Linear Models Part 1.5: Multiple Regression
- Linear Models: Multiple Regression in R
- Linear Models Part 2: t-tests and ANOVA
- Linear Models Part 3: Design Matrices
- Linear Models: Design Matrix Examples in R
- Quantiles and Percentiles
- Quantile-Quantile Plots (QQ Plots)
- Quantile Normalization
- Probability vs Likelihood
- Maximum Likelihood
- Maximum Likelihood: A worked out example for the exponential distribution
- Maximum Likelihood: A worked out example for the binomial distribution
- Maximum Likelihood: A worked out example for the normal distribution
- The Ukulele, Clearly Explained
- Odds and Log(Odds)
- Odds Ratios and Log(Odds Ratios)
- Enrichment Analysis using Fisher’s Exact Test and the Hypergeometric Distribution
- Which t-test to use
- p-values: What they are and how to interpret them
- How to Calculate p-values
- Thresholds for Significance
- FDR and the Benjamini-Hochberg Method clearly explained
- p-hacking and power calculations
Machine Learning and dealing with large datasets that have lots and lots of measurements per sample:
(NOTE: All of the linear model and curve fitting stuff in the “Basics” section is also considered Machine Learning, so make sure you check out those videos).
- A Gentle Introduction to Machine Learning
- Machine Learning Fundamentals: Cross Validation
- Machine Learning Fundamentals: The Confusion Matrix
- Machine Learning Fundamentals: Sensitivity and Specificity
- The Sensitivity, Specificity, Precision, Recall Sing-a-Long!!!
- Machine Learning Fundamentals: Bias and Variance
- ROC and AUC
- ROC and AUC in R
- Entropy, Clearly Explained!!!
- Mutual Information, Clearly Explained!!!
- Regularization Part 1: L2, Ridge Regression
- Regularization Part 2: L1, Lasso Regression
- Regularization Part 2.5: Ridge vs Lasso Visualized (or why Lasso can set parameters to 0 and Ridge can’t)
- Regularization Part 3: Elastic-Net Regression
- Regularization Part 4: Ridge, Lasso and Elastic-Net Regression in R
- Linear Discriminant Analysis (LDA) clearly explained
- Principal Component Analysis (PCA) Step-by-Step
- Principal Component Analysis (PCA) explained in less than 5 minutes
- PCA – Practical Tips
- DEPRECATED: Principal Component Analysis (PCA) clearly explained (more details)
- PCA in R
- PCA in Python
- BAM!!! Clearly Explained
- Multi-Dimensional Scaling (MDS) and Principal Coordinate Analysis (PCoA) clearly explained
- MDS and PCoA in R
- t-SNE, clearly explained
- UMAP Dimension Reduction: Part 1 – Main Ideas
- UMAP Dimension Reduction: Part 2 – Mathematical Details
- Heatmaps – considerations for drawing and interpreting them
- Hierarchical Clustering
- K-Means Clustering
- Clustering with DBSCAN
- K-Nearest Neighbors
- Naive Bayes
- Study Guide
- NOTE: This topic is covered in The StatQuest Illustrated Guide to Machine Learning
- Gaussian Naive Bayes
- Study Guide
- NOTE: This topic is covered in The StatQuest Illustrated Guide to Machine Learning
- The Chain Rule
- Gradient Descent
- Study Guide
- NOTE: This topic is covered in The StatQuest Illustrated Guide to Machine Learning
- Stochastic Gradient Descent
- One-Hot, Label, Target and K-Fold Target Encoding, Clearly Explained!!!
- CART – Classification and Regression Trees are explained in the following three videos:
- Decision and Classification Trees, Clearly Explained!!!
- Study Guide
- NOTE: This topic is covered in The StatQuest Illustrated Guide to Machine Learning
- Decision Trees Part 2: Feature Selection and Missing Data
- Regression Trees
- How to Prune Trees (Cost Complexity Pruning)
- Classification Trees in Python, from Start-to-Finish
- Decision and Classification Trees, Clearly Explained!!!
- Random Forests Part 1: Building, using and evaluating
- Random Forests Part 2: Missing data and clustering
- Random Forests in R
- AdaBoost
- Three (3) things to do when starting out in Data Science
- Gradient Boost Part 1: Regression Main Ideas
- Gradient Boost Part 2: Regression Details
- Gradient Boost Part 3: Classification Main Ideas
- Gradient Boost Part 4: Classification Details
- Troll 2, Clearly Explained!!!
- XGBoost Part 1: Regression
- XGBoost Part 2: Classification
- XGBoost Part 3: Mathematical Details
- XGBoost Part 4: Crazy Cool Optimizations
- XGBoost in Python, from Start-to-Finish
- Cosine Similarity
- CatBoost Part 1: Ordered Target Encoding
- CatBoost Part 2: Building and Using Trees
- Support Vector Machines (SVM)
- Logistic Regression
- Logistic Regression, Details Part 1: Coefficients
- Logistic Regression, Details Part 2: Maximum Likelihood
- Logistic Regression, Details Part 3: R-squared and its p-value
- Saturated Models and Deviance Statistics
- Deviance Residuals
- Logistic Regression in R
- Neural Networks Part 0: Neural Networks are not Scary!!!
- Neural Networks Part 1: The Essential Main Ideas
- Neural Networks Part 2: Backpropagation Main Ideas
- Neural Networks Part 3: ReLU in Action!!!
- Neural Networks Part 4: Multiple Inputs and Outputs
- Neural Networks Part 5: ArgMax and SoftMax
- Neural Networks Part 6: Cross Entropy
- Neural Networks Part 7: Cross Entropy Derivatives and Backpropagation
- Neural Networks Part 8: Image Classification with Convolutional Neural Networks (CNNs)
- Neural Networks Part 9: Recurrent Neural Networks (RNNs)
- Neural Networks Part 10: Long Short-Term Memory (LSTM)
- Neural Networks Part 11: Word Embedding and Word2Vec
- Neural Networks Part 12: Encoder-Decoder (seq2seq) models
- Neural Networks Part 13: Attention
- Neural Networks Part 14: Transformers
- Neural Networks Part 15: Decoder-Only Transformers
- Tensors for Neural Networks, Clearly Explained!!!
- Essential Matrix Algebra for Neural Networks, Clearly Explained!!!
- The Matrix Math Behind Transformer Neural Networks, One Step at a Time!!!
- Coding Neural Networks
- The StatQuest Introduction to PyTorch
- Introduction to Coding Neural Networks with PyTorch + Lightning
- Coding a Neural Network with Multiple Inputs and Outputs
- Convolutional Neural Networks – Jupyter Notebook
- Long Short-Term Memory (LSTM) with PyTorch + Lightning
- Word Embedding in PyTorch + Lightning
- Coding a ChatGPT Like Transformer From Scratch in PyTorch + Lightning
- Silly Songs, Clearly Explained!!!
- How my pop influenced StatQuest
The StatQuest Musical Dictionary:
- Normalizing Your Data
- Standardizing Your Data
- Type 1 Errors (False Positive)
- Type 2 Errors (False Negative)
- Sensitivity, Specificity, Precision and Recall
- PCA Eigenvalues
- PCA Eigenvectors
- Matrix Notation: An m-by-n matrix
- Matrix Notation: Multiplying matrices
- Likelihood vs Probability
- p-values
- Logistic vs Logit Functions
- Logit (Log of the odds)
- Classification Trees in Python, from Start-to-Finish
- Support Vector Machines in Python, from Start-to-Finish
- XGBoost in Python, from Start-to-Finish
Human Stories in AI:
- Rick Marks
- Achal Dixit
- Khushi Jain
- Fabio Urbina
- Brian Risk
- Simon Stochholm
- Tommy Tang
- Xavier Moyá
- Amy Finnegan
- Episode #1: ChatGPT, Transformers and Attention.
- Episode #2: Big data, Reinforcement Learning and Aligning Models
- Episode #3: Constitutional AI, Emergent Abilities and Foundation Models
- Episode #4: ChatGPT + Bing and How to start an AI company in 3 easy steps.
- Episode #5: A new wave of AI-based products and the resurgence of personal applications
- 2020-01-06
- 2020-01-20
- 0:00 Introduction
- 1:04 Comment #1 – What is your favorite machine learning algorithm?
- 4:40 Comment #2 – What is data leakage in machine learning?
- 8:39 Comment #3 – Where do you learn these nitty gritty details?
- 13:37 Live Question #1 – R-squared and Adjusted R-squared
- 17:23 Live Question #2 – How are the videos arranged on https://statquest.org/video-index/ (simple to complex)
- 18:26 Live Question #3 – Is it important to learn all of the formulas and equations even though we have advanced software that does the work?
- 2020-02-03
- 0:00 Silly Song and Introduction
- 0:18 A big huge announcement
- 3:14 Question #1 – Do we use statistical models to predict or explain stuff?
- 8:31 Question #2 – Can you show the effects of regularization?
- 9:42 My cat, Poe
- 15:04 Question #3 – How do I choose the best machine learning algorithm for my data?
- 21:17 Live Questions
- 2020-02-17
- 2020-03-02
- 2020-03-16 – Naive Bayes
- 2020-04-06 – Gaussian Naive Bayes
- 2020-04-20 – Expected Values (NOTE: There is now a full StatQuest video on Expected Values that revises and updates this material).
- 2020-05-04 – Conditional Probability
- 2020-05-18 – Bayes’ Theorem
- 2020-06-01 – Hypothesis Testing
- 2020-06-15 – Bootstrapping Main Ideas
- 2022-10-04 – One-Hot Encoding, Label Encoding and Target Encoding
- 2022-10-18 – Target Encoding without Leakage and CatBoost, Part 1
- 2022-11-01 – Cosine Similarity for Natural Language Processing (NLP) and CatBoost, Part 2
- 2024-05-10 – Luis Serrano and Josh Starmer, Episode #1
High-throughput Sequencing Analysis: