An epic journey through statistics and machine learning
StatQuest: What is a Statistical Model?
5 thoughts on “StatQuest: What is a Statistical Model?”
The term “model selection” is interesting and related a bit to the p-hacking mentioned in another video. If the F-test rejects the null hypothesis, the model that “not all treatments are the same” is selected. Subsequent inference is then conditional on that model being the true model (recall we may make a mistake alpha*100% of the time by pure chance here). So when you construct your multiple comparisons or confidence intervals, you can’t just rely on the result “the treatments are different” as an a priori given, can you?
The standard approach is to do the ANOVA up front to reject the hypothesis that there are no differences between the groups. Then you do pairwise tests between the groups. To reduce the number of false positives in your data, you apply some p-value adjustment method to the p-values that came from the pairwise tests (you don’t have to go all the way back to the original ANOVA and adjust that p-value).
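For concreteness, here is a minimal sketch of that two-stage workflow, assuming Python with numpy, scipy and statsmodels; the simulated group data and the choice of Holm adjustment are illustrative assumptions, not a prescription:

# Sketch of the standard workflow: overall ANOVA first, then pairwise
# t-tests whose p-values are adjusted for multiple comparisons.
import numpy as np
from itertools import combinations
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
groups = {name: rng.normal(loc=mu, scale=1.0, size=20)
          for name, mu in [("A", 0.0), ("B", 0.5), ("C", 1.0)]}

# Stage 1: overall F-test of "all group means are equal".
f_stat, p_anova = stats.f_oneway(*groups.values())
print(f"ANOVA: F = {f_stat:.2f}, p = {p_anova:.4f}")

if p_anova < 0.05:
    # Stage 2: pairwise t-tests, then adjust only those p-values.
    pairs = list(combinations(groups, 2))
    raw_p = [stats.ttest_ind(groups[a], groups[b]).pvalue for a, b in pairs]
    reject, adj_p, _, _ = multipletests(raw_p, alpha=0.05, method="holm")
    for (a, b), p_raw, p_adj, rej in zip(pairs, raw_p, adj_p, reject):
        print(f"{a} vs {b}: raw p = {p_raw:.4f}, adjusted p = {p_adj:.4f}, reject = {rej}")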
Yes. But that just deals with the multiple comparisons problem. There is also a problem in the time dimension, if you will, caused by the preliminary F-test p-hacking the subsequent inference model. (I am thinking about this with respect to your video on p-hacking the t-tests and applying that lesson to the ANOVA two-stage procedure in the same way.) For example, if the F-test didn’t reject the null, you would conclude that all treatments were the same and you could construct a common confidence interval for the treatment means, which is a different procedure from the one you’d follow if you rejected the null, in which case there would be several different CIs. That seems to me to be consistent logic. So the different CIs have been p-hacked unless you take into account that they are conditional on the rejection of the null. I haven’t been able to find much about this in textbooks, although there are some papers on it.
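One way to see the concern is a rough simulation (a sketch assuming Python with numpy and scipy): generate data where all group means really are equal, keep only the datasets where the preliminary F-test rejects, and look at how often the follow-up pairwise tests then declare a “significant” difference:

# All group means are truly equal, so every "significant" pairwise
# difference is a false positive. Conditional on the F-test rejecting,
# such false positives are far more common than the nominal 5%.
import numpy as np
from itertools import combinations
from scipy import stats

rng = np.random.default_rng(1)
n_sims, n_groups, n_per_group = 5000, 4, 10

rejected_f = 0
false_positive_given_f = 0
for _ in range(n_sims):
    data = [rng.normal(0.0, 1.0, n_per_group) for _ in range(n_groups)]
    _, p_anova = stats.f_oneway(*data)
    if p_anova < 0.05:
        rejected_f += 1
        pair_ps = [stats.ttest_ind(data[i], data[j]).pvalue
                   for i, j in combinations(range(n_groups), 2)]
        if min(pair_ps) < 0.05:
            false_positive_given_f += 1

print(f"F-test rejected in {rejected_f / n_sims:.3f} of simulations (nominal 0.05)")
print("P(some 'significant' pairwise difference | F-test rejected) ="
      f" {false_positive_given_f / max(rejected_f, 1):.3f}")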
It’s true, the traditional method leaves one additional p-value that hasn’t been compensated for. You could test whether this makes a difference by adding it to the pairwise p-values that you adjust later. I suspect you can do this because, regardless of the test, p-values are always normally distributed under the null hypothesis. However, I’d be willing to bet that adding one additional p-value to the adjustment step would only affect borderline p-values, which should be treated with suspicion from the get-go…
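A quick, hypothetical way to try that suggestion, assuming Python with statsmodels; the p-values below are made-up placeholders, and the ANOVA p-value is simply folded into the same Holm adjustment as the pairwise ones:

# Include the overall ANOVA p-value in the adjustment set and see
# whether any conclusion changes compared with adjusting only the
# pairwise p-values.
from statsmodels.stats.multitest import multipletests

p_anova = 0.012                      # illustrative overall F-test p-value
pairwise_p = [0.004, 0.048, 0.210]   # illustrative pairwise p-values

reject, adj_p, _, _ = multipletests([p_anova] + pairwise_p, alpha=0.05, method="holm")
print("Adjusted ANOVA p:   ", round(adj_p[0], 4))
print("Adjusted pairwise p:", [round(p, 4) for p in adj_p[1:]])
print("Rejected:", list(reject))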
I am probably getting a bit technical, so please excuse me. But thanks for engaging and thanks for your great videos – they are really helping me! I will have to do better at explaining myself! P.S. The p-values are uniformly distributed under the null (but I’m sure you know that).
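The P.S. is easy to check with a small simulation, assuming Python with numpy and scipy: under a true null, t-test p-values should spread roughly evenly across [0, 1]:

# Simulate many t-tests where the null is true and check that the
# p-values are (approximately) uniform, not normal.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
p_values = [stats.ttest_ind(rng.normal(0, 1, 30), rng.normal(0, 1, 30)).pvalue
            for _ in range(10000)]

# If the p-values are uniform, each decile should hold roughly 10% of them.
counts, _ = np.histogram(p_values, bins=10, range=(0.0, 1.0))
print("Proportion per decile:", np.round(counts / len(p_values), 3))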