On line 142, try:
logistic <- glm(hd ~ sex, data=data, family="binomial")
summary(logistic)
Thanks for catching that. There was something odd with including the first line of the output from the summary() command. Once I deleted that line (which was commented out), the original code came back. Strange! But I’m very grateful you spotted the error before it became a problem for other people. Thanks!
Hi Josh, just letting you know there is a problem with this video. Many thanks for your great website.
Cheers!
What problem are you having? I just played it back and it seemed fine.
It says that the video is unavailable!
Try reloading the page and see what happens. It should work – there might be something funky going on with your cache. If it doesn’t work, try this link: https://youtu.be/C4N3_XJJ-jU
I am not able to get the data. So to begin learning is impossible.
It appears that the website that hosts the data is down right now. Try again tomorrow. I’m sure they will get the site running again soon.
Hi Josh,
Excellent videos. Your explanation has helped me grasp how to perform logistic regression in R. I was wondering whether you could demonstrate how to put the data in a bar graph with 95% confidence intervals, like is done in academic papers.
That’s a great idea! I’ll put it on the to-do list.
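Until such a demonstration exists, here is a minimal sketch of one common way to get bars with 95% confidence intervals from a logistic regression. It is not taken from the video: the toy data, the 0.25 and 0.55 probabilities, and the names new.data, pred and ci are all assumptions made for illustration. The interval is computed on the log-odds scale and then converted to probabilities, which keeps both ends between 0 and 1.

```r
# Hypothetical toy data standing in for the heart disease data set
set.seed(1)
n <- 200
sex <- factor(sample(c("F", "M"), n, replace = TRUE))
hd <- rbinom(n, 1, ifelse(sex == "M", 0.55, 0.25))  # assumed probabilities
data <- data.frame(sex = sex, hd = hd)

logistic <- glm(hd ~ sex, data = data, family = "binomial")

# Predict on the log-odds (link) scale, with standard errors, for each sex
new.data <- data.frame(sex = factor(c("F", "M"), levels = levels(data$sex)))
pred <- predict(logistic, newdata = new.data, type = "link", se.fit = TRUE)

# Wald 95% interval on the log-odds scale, converted to probabilities
z <- qnorm(0.975)
ci <- data.frame(
  sex = new.data$sex,
  probability = plogis(pred$fit),
  lower = plogis(pred$fit - z * pred$se.fit),
  upper = plogis(pred$fit + z * pred$se.fit)
)

# Bars for the estimated probabilities, with error bars for the intervals
bar.centers <- barplot(ci$probability, names.arg = as.character(ci$sex),
                       ylim = c(0, 1),
                       ylab = "Estimated probability of heart disease")
arrows(bar.centers, ci$lower, bar.centers, ci$upper,
       angle = 90, code = 3, length = 0.1)
```

The same ci data frame could be handed to ggplot2 instead of base graphics if you prefer to stay with the plotting style used in the video.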
ggplot is not able to plot the output.
Error in FUN(X[[i]], …) : object ‘hd’ not found
I can’t recreate that error. Are you sure you created “predicted.data” correctly?
predicted.data <- data.frame(
probability.of.hd=logistic$fitted.values,
hd=data$hd)
Josh, when I run the following script, there is an error:
> predicted.data <- data.frame(
+ probability.of.hd=logistic$fitted.values,
+ hd=data$hd)
Error in data.frame(probability.of.hd = logistic$fitted.values, hd = data$hd) :
arguments imply differing number of rows: 297, 303
How can we resolve it?
Thanks!
Awesome. The video and R script have really helped me a lot.
I can’t recreate that error. Can you try running the code from the start and see if it happens again? Your error suggests that you might have skipped the line where we removed samples with “NA” in them:
data <- data[!(is.na(data$ca) | is.na(data$thal)),]
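A 297-versus-303 mismatch like the one reported above is what you would expect when glm() drops rows that contain NA (its default behaviour) while data still holds every row. The sketch below reproduces that situation with made-up toy data, not the heart disease data set. The column values and the six missing entries are assumptions chosen only to mirror the numbers in the error message. It also shows one way to keep the rows lined up by taking hd from the model frame instead of from data.

```r
# Hypothetical toy data: 303 rows, 6 of which have a missing value in ca
set.seed(42)
n <- 303
data <- data.frame(
  hd  = factor(sample(c("Healthy", "Unhealthy"), n, replace = TRUE)),
  sex = factor(sample(c("F", "M"), n, replace = TRUE)),
  ca  = sample(0:3, n, replace = TRUE)
)
data$ca[1:6] <- NA

# By default glm() omits rows with NA in any variable used in the formula
logistic <- glm(hd ~ sex + ca, data = data, family = "binomial")

n.fitted <- length(logistic$fitted.values)  # rows the model actually used
n.rows <- nrow(data)                        # rows still in the data frame

# model.frame() returns only the rows that went into the fit, so the
# two columns below are guaranteed to have the same length
predicted.data <- data.frame(
  probability.of.hd = logistic$fitted.values,
  hd = model.frame(logistic)$hd
)
```

Removing the incomplete rows before fitting, as in the line above, has the same effect and keeps data and the model in step for the rest of the script.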
Thanks for your prompt reply, Josh! I’ll let you know when I try it tonight.
Hi Josh,
The ggplot works well!
A) May I know if I have to remove all NAs in the data set, or only those in some of the variables?
B) Do we have to do a randomization check before we remove the NAs? We have more than 10,000 samples with NA.
C) For the formula:
predicted.data <- data.frame(
probability.of.hd=logistic$fitted.values,
hd=data$hd)
Can I check with you whether the following are correct if I apply it to my model?
1) logistic: my model name
2) data: mydata (my data set)
3) hd: my response Y
4) fitted.values: keep it as fitted.values
D) Lastly, I’m trying to do a confusion matrix of the variables, e.g. sex and hd. Besides using xtabs, how do we check the log odds ratio and p-value?
Please share with us the R script to find the better variables.
Thanks for your kind assistance Josh! 👍🏼😊
Josh,
Appreciate if you can enlighten me on my queries. Thanks!
I’d love to help, but I’m too busy with my work and my next StatQuest video. I hope you understand.
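For readers with the same question about the log odds ratio and p-value for sex and hd, here is a minimal sketch under stated assumptions. The toy data and the names counts, coefs, log.odds.ratio and p.value are invented for illustration and are not from the video's script. It reads the log odds ratio and its Wald p-value from the coefficient table of a one-predictor logistic regression, and checks the estimate against the counts from xtabs().

```r
# Hypothetical toy data standing in for the heart disease data set
set.seed(7)
n <- 300
sex <- factor(sample(c("F", "M"), n, replace = TRUE))
sick <- rbinom(n, 1, ifelse(sex == "M", 0.55, 0.25))  # assumed probabilities
hd <- factor(ifelse(sick == 1, "Unhealthy", "Healthy"))
data <- data.frame(sex, hd)

# 2 x 2 table of counts, and the log odds ratio worked out by hand
counts <- xtabs(~ hd + sex, data = data)
odds.m <- counts["Unhealthy", "M"] / counts["Healthy", "M"]
odds.f <- counts["Unhealthy", "F"] / counts["Healthy", "F"]
log.odds.ratio.by.hand <- log(odds.m / odds.f)

# The same quantity from logistic regression, plus its Wald p-value.
# With a factor response, glm() treats the first level ("Healthy")
# as failure and the other level as success.
logistic <- glm(hd ~ sex, data = data, family = "binomial")
coefs <- coef(summary(logistic))
log.odds.ratio <- coefs["sexM", "Estimate"]
p.value <- coefs["sexM", "Pr(>|z|)"]
```

With a single two-level predictor, the sexM coefficient matches the hand-computed log odds ratio up to numerical tolerance, which is a quick way to confirm you are reading the table correctly.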
Hi Josh! Thank you for the amazing video, it helped clarify quite a few things!
I have a question: the ggplot of the model I made doesn’t have an S shape but rather it looks like a half arch, starting from the bottom left corner and ending in the top left. Any suggestions on why that happened? In general the model has quite a low R^2 (0.2357887) but also a very small p-val (7.886781e-08).
Thanks and cheers,
Ginevra
Weird! I have no idea what might be causing that.
Hi Josh, This video is exactly what I need right now! Unfortunately when I try and access the data it says page not found. Is there another way to access the data so I can follow along?
Thanks!
Unfortunately WordPress, which hosts StatQuest, will not allow me to upload text files or CSV files, so I can’t put the data up here. However, if you try again, I bet you can get the data now. If not, try again in an hour.