Here it is, folks! By popular demand, a StatQuest on linear discriminant analysis (LDA)!
Also, because you asked for it, here’s some sample R code that shows you how to get LDA working in R.
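Roughly, a minimal sketch of that idea looks like this, assuming the built-in iris dataset and ggplot2 for the graph (the object names, like data.lda.values, are just one way to set it up, not the exact code from the download):

library(MASS)     ## for the lda() function
library(ggplot2)  ## for drawing the graph

data(iris)  ## example data: 4 measurements per flower, 3 species

data.lda <- lda(formula = Species ~ ., data = iris)  ## fit the LDA
data.lda.values <- predict(data.lda)  ## project each sample onto the discriminants

plot.data <- data.frame(X = data.lda.values$x[, 1],  ## LD1
                        Y = data.lda.values$x[, 2],  ## LD2
                        Species = iris$Species)

ggplot(data = plot.data, aes(x = X, y = Y, colour = Species)) +
  geom_point() +
  xlab("LD1") + ylab("LD2") +
  theme_bw()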
If all went well, you should get a graph that looks like this:
Hello friend,
Thank you for your helpful video. Could you please provide me with the code to run this analysis in R?
Best regards
Amber
Hi Joshua! Thanks a lot for your very helpful video!!
I have a little question for you…
I was thinking about how to find the variables that contribute most to the formation of the new axes. How do you actually do that in practice?
How would you correlate LD1 (coefficients of linear discriminants) with the variables?
Thanks in advance,
best
Madeleine
Madeleine,
I use R, so here’s how to do it in R. First do the LDA…
library(MASS) ## Load the “MASS” package (which contains the lda() function)
data(iris) ## load an example dataset
head(iris, 3) ## look at the first 3 rows…
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
lda.results <- lda(formula = Species ~ ., data = iris) # now do LDA
lda.results$scaling # now look at the coefficients of the linear discriminants
LD1 LD2
Sepal.Length 0.8293776 0.02410215
Sepal.Width 1.5344731 2.16452123
Petal.Length -2.2012117 -0.93192121
Petal.Width -2.8104603 2.83918785
… roughly speaking, the absolute value of the "scaling" values will tell you which variables were the most important for each linear discriminant.
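For example, a quick sketch of that: sort the absolute values of the LD1 column to rank the variables by how strongly they load on the first discriminant.

sort(abs(lda.results$scaling[, "LD1"]), decreasing = TRUE) ## rank variables for LD1, largest first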
If you want to know how much variation each linear discriminant accounts for…
lda.results$svd^2/sum(lda.results$svd^2) ## proportion of the between-group variation captured by each LD
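As a side note (just a sketch), those same proportions show up as the "Proportion of trace" line when you print lda.results, and you can barplot them to see how lopsided the separation is:

prop.ld <- lda.results$svd^2 / sum(lda.results$svd^2)
barplot(prop.ld, names.arg = colnames(lda.results$scaling),
        xlab = "Linear discriminant", ylab = "Proportion of separation")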
Hi Josh,
thanks for your answer!
So the higher the absolute value, the more that variable contributes to separating the groups/categories?
Do you think one could filter variables by keeping the ones with the highest "scaling" values and then recompute the LDA? …or can LDA not really be used for variable selection?
sorry, I’m wandering a bit off topic.
thanks in advance
best,
Madeleine
Yes, I think you can use LDA iteratively to filter out variables that are not helpful. Just like when you do a regression with a ton of variables and then leave some out to check whether the fit (the sums of squares) is still pretty much as good.
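A rough sketch of that idea (not from the video, just one way you might try it): drop the variable with the smallest absolute LD1 coefficient and re-fit the LDA to see if it still classifies the samples about as well.

library(MASS)
data(iris)

full.lda <- lda(Species ~ ., data = iris)

## variable with the smallest absolute LD1 coefficient (least helpful for LD1)
weakest <- names(which.min(abs(full.lda$scaling[, "LD1"])))

## re-fit the LDA without that variable
reduced.lda <- lda(as.formula(paste("Species ~ . -", weakest)), data = iris)

## compare how well each model classifies the training samples
mean(predict(full.lda)$class == iris$Species)
mean(predict(reduced.lda)$class == iris$Species)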
Great!
Thank you very much 🙂
M
Hello Josh, I am trying to run the code on my own data, but when I build the data frame for the graph and set Y = data.lda.values$x[,2], I get an error:
subscript out of bounds
So data.lda.values$x only contains 1 linear discriminant. Why do I only get 1 LD?
To get 2 or more axes in your final LDA graph, you need 3 or more categories separating your samples (and at least 2 variables): LDA can only make one fewer axis than you have categories, and never more axes than you have variables, so with just 2 categories you will always get a single LD.
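A quick way to see this (just a sketch with the iris data): drop one of the three species so only 2 categories remain, and lda() produces a single discriminant no matter how many variables you give it.

library(MASS)
data(iris)

## keep only 2 of the 3 species
two.groups <- droplevels(subset(iris, Species != "virginica"))

two.lda <- lda(Species ~ ., data = two.groups)
ncol(predict(two.lda)$x)  ## 1 -- only LD1 exists, so $x[,2] is "out of bounds"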