StatQuest: Logistic Regression in R

July 23, 2018

Here’s a link to the source code on the StatQuest GitHub.

25 thoughts on “StatQuest: Logistic Regression in R”

Statistics Student

July 24, 2018 at 9:58 am

line 142 try:
logistic <- glm(hd ~ sex, data=data, family="binomial")
summary(logistic)

Reply
- Josh
  
  July 24, 2018 at 12:50 pm
  
  Thanks for catching that. There was something odd with including the first line of the output from the summary() command. Once I deleted that line (which was commented out), the original code came back. Strange! But I’m very grateful you spotted the error before it became a problem for other people. Thanks!
  
  Reply
Ham

September 10, 2018 at 10:09 am

Hi Josh, just letting you know there is a problem with this video. Many thanks for your great website.

Cheers!

Reply
- Josh
  
  September 10, 2018 at 11:21 am
  
  What problem are you having? I just played it back and it seemed fine.
  
  Reply
Ham

September 10, 2018 at 12:26 pm

It says that the video is unavailable!

Reply
- Josh
  
  September 10, 2018 at 2:52 pm
  
  Try reloading the page and see what happens. It should work – there might be something funky going on with your cache. If it doesn’t work, try this link: https://youtu.be/C4N3_XJJ-jU
  
  Reply
ambrish dhaka

September 18, 2018 at 1:49 pm

I am not able to get the data, so it is impossible to begin learning.

Reply
- Josh
  
  September 18, 2018 at 2:04 pm
  
  It appears that the website that hosts the data is down right now. Try again tomorrow. I’m sure they will get the site running again soon.
  
  Reply
Brian

December 12, 2018 at 11:18 pm

Hi Josh,
Excellent videos. Your explanation has helped me grasp how to perform logistic regression in R. I was wondering whether you could demonstrate how to put the data in a bar graph with 95% confidence intervals, as is done in academic papers.

Reply
- Josh
  
  December 13, 2018 at 1:28 am
  
  That’s a great idea! I’ll put it on the to-do list.
  
  Reply
Stanley Chen

March 11, 2019 at 10:40 pm

ggplot is not able to plot the output.
Error in FUN(X[[i]], …) : object ‘hd’ not found

Reply
- Josh
  
  March 15, 2019 at 8:50 pm
  
  I can’t recreate that error. Are you sure you created “predicted.data” correctly?
  predicted.data <- data.frame(
  probability.of.hd=logistic$fitted.values,
  hd=data$hd)
  
  Reply
  - Stanley Chen
    
    April 21, 2019 at 4:20 am
    
    Josh, when I run the following script, there is an error:
    
    > predicted.data <- data.frame(
    + probability.of.hd=logistic$fitted.values,
    + hd=data$hd)
    Error in data.frame(probability.of.hd = logistic$fitted.values, hd = data$hd) :
    arguments imply differing number of rows: 297, 303
    
    How can we resolve it?
    Thanks!
    
    Reply
Parupalli Srinivasa Dinesh

March 27, 2019 at 4:47 am

Awesome. The video and R script have really helped me a lot.

Reply
Josh

April 21, 2019 at 11:47 am

I can’t recreate that error. Can you try running the code from the start and see if it happens again? Your error suggests that you might have skipped the line where we removed samples with “NA” in them:
data <- data[!(is.na(data$ca) | is.na(data$thal)),]
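A minimal sketch of why the row counts can differ, using a made-up toy data frame rather than the heart disease data (the names toy, fit, toy.clean and predicted.toy are hypothetical): with R's default na.action setting, glm() silently drops rows that have a missing predictor, so fitted.values can be shorter than the original column.

```r
# Toy data (hypothetical): 8 rows, 2 of them with a missing predictor value
toy <- data.frame(
  hd = c(0, 1, 0, 1, 1, 0, 1, 0),
  ca = c(0, 2, NA, 1, 3, 0, NA, 2)
)

# With the default option na.action = na.omit, glm() drops the 2 NA rows
fit <- glm(hd ~ ca, data = toy, family = "binomial")

n.fitted <- length(fit$fitted.values)  # rows the model actually used
n.rows <- nrow(toy)                    # rows in the unfiltered data

# Filtering first keeps the data and the fitted values the same length,
# so data.frame() no longer complains about differing numbers of rows
toy.clean <- toy[!is.na(toy$ca), ]
fit.clean <- glm(hd ~ ca, data = toy.clean, family = "binomial")
predicted.toy <- data.frame(
  probability.of.hd = fit.clean$fitted.values,
  hd = toy.clean$hd
)
```

Here n.fitted and n.rows differ in the same way as 297 and 303 in the error message, which is why the filtering line has to run before the model is fit.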

Reply
- Stanley Chen
  
  April 22, 2019 at 5:28 am
  
  Thanks for your prompt reply, Josh! I’ll let you know when I try it tonight.
  
  Reply
  - Stanley Chen
    
    April 23, 2019 at 1:14 am
    
    Hi Josh,
    
    The ggplot works well!
    
    A) Do I have to remove every NA in the data set, or only the NAs in some of the variables?
    
    B) Do we have to do a randomization check before we remove the NAs? We have more than 10,000 samples with NA.
    
    C) For the formula:
    predicted.data <- data.frame(
    probability.of.hd=logistic$fitted.values,
    hd=data$hd)
    
    If I were to apply it to my own model, are the following correct?
    1) logistic: my model name
    2) data: mydata (my data set)
    3) hd: my response Y
    4) fitted.values: keep it as fitted.values
    
    D) Lastly, I’m trying to make a confusion matrix of the variables, e.g. sex and hd. Besides using xtabs, how do we check the log odds ratio and p-value?
    
    Please share the R script for finding the better variables.
    
    Thanks for your kind assistance Josh! 👍🏼😊
    
    Reply
Stanley Chen

April 24, 2019 at 11:04 pm

Josh,

I would appreciate it if you could help with my queries. Thanks!

Reply
- Josh
  
  April 25, 2019 at 1:57 am
  
  I’d love to help, but I’m too busy with my work and my next StatQuest video. I hope you understand.
  
  Reply
Ginevra

June 4, 2019 at 12:40 am

Hi Josh! Thank you for the amazing video, it helped clarify quite a few things!
I have a question: the ggplot of the model I made doesn’t have an S shape but rather it looks like a half arch, starting from the bottom left corner and ending in the top left. Any suggestions on why that happened? In general the model has quite a low R^2 (0.2357887) but also a very small p-val (7.886781e-08).

Thanks and cheers,
Ginevra

Reply
- Josh
  
  June 4, 2019 at 1:31 am
  
  Weird! I have no idea what might be causing that.
  
  Reply
Annie

July 3, 2019 at 3:50 pm

Hi Josh, this video is exactly what I need right now! Unfortunately, when I try to access the data it says page not found. Is there another way to access the data so I can follow along?
Thanks!

Reply
- Josh
  
  July 3, 2019 at 4:52 pm
  
  Unfortunately, WordPress, which hosts StatQuest, will not allow me to upload text files or CSV files, so I can’t put the data up here. However, if you try again, I bet you can get the data now. If not, try again in an hour.
  
  Reply