# StatQuest: Linear Regression (aka GLMs, part 1)

Linear regression is the first in a bunch of videos I’m going to do about General Linear Models.

I also made a companion StatQuest that shows how to do linear regression in R:

Here’s the code from the video if you want to try it out yourself:

    ## Here's the data from the example:
    mouse.data <- data.frame(
      weight=c(0.9, 1.8, 2.4, 3.5, 3.9, 4.4, 5.1, 5.6, 6.3),
      size=c(1.4, 2.6, 1.0, 3.7, 5.5, 3.2, 3.0, 4.9, 6.3))

    mouse.data # print the data to the screen in a nice format

    ## plot a x/y scatter plot with the data
    plot(mouse.data$weight, mouse.data$size)

    ## create a "linear model" - that is, do the regression
    mouse.regression <- lm(size ~ weight, data=mouse.data)

    ## generate a summary of the regression
    summary(mouse.regression)

    ## add the regression line to our x/y scatter plot
    abline(mouse.regression, col="blue")
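If you want the individual numbers out of the regression rather than the whole printed summary, something like the sketch below should work. It rebuilds the same data and model so it runs on its own. The variable name `regression.summary` and the new mouse weight of 4.0 are made up for illustration; they are not from the video.

```r
## rebuild the example data and model so this block runs on its own
mouse.data <- data.frame(
  weight=c(0.9, 1.8, 2.4, 3.5, 3.9, 4.4, 5.1, 5.6, 6.3),
  size=c(1.4, 2.6, 1.0, 3.7, 5.5, 3.2, 3.0, 4.9, 6.3))
mouse.regression <- lm(size ~ weight, data=mouse.data)

## pull out the intercept and the slope of the fitted line
coef(mouse.regression)

## pull out R-squared and the p-value for the weight coefficient
## (regression.summary is a hypothetical name, not from the video)
regression.summary <- summary(mouse.regression)
regression.summary$r.squared
regression.summary$coefficients["weight", "Pr(>|t|)"]

## predict size for a new mouse (the weight of 4.0 is an assumed value)
predict(mouse.regression, newdata=data.frame(weight=4.0))
```

`coef()` and `predict()` work directly on the object returned by `lm()`, while R-squared and the p-values live in the object returned by `summary()`.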

# StatQuest: K-means clustering

    ## demo of k-means clustering...

    ## Step 1: make up some data
    x <- rbind(
      matrix(rnorm(100, mean=0, sd = 0.3), ncol = 2), # cluster 1
      matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2), # cluster 2
      matrix(c(rnorm(50, mean = 1, sd = 0.3), # cluster 3
        rnorm(50, mean = 0, sd = 0.3)), ncol = 2))
    colnames(x) <- c("x", "y")

    ## Step 2: show the data without clustering
    plot(x)

    ## Step 3: show the data with the known clusters (this is just so we
    ## can see how well k-means clustering recreates the original clusters we
    ## created in step 1)
    colors <- as.factor(c(
      rep("c1", times=50),
      rep("c2", times=50),
      rep("c3", times=50)))
    plot(x, col=colors)

    ## Step 4: cluster the data
    ## NOTE: nstart=25, so kmeans() will cluster using 25 different starting points
    ## and return the best clustering.
    (cl <- kmeans(x, centers=3, nstart=25))

    ## Step 5: plot the data, coloring the points with the clusters
    plot(x, col = cl$cluster)
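The plots let you eyeball how well k-means recovered the made-up clusters. If you would rather see counts, a cross-tabulation like the sketch below is one way to do it. It regenerates the data so it runs on its own. The `set.seed(42)` call and the `known` vector are my additions for illustration, not part of the original demo.

```r
## assumption: fix the random seed so the made-up data is reproducible
set.seed(42)

## same made-up data as in the demo
x <- rbind(
  matrix(rnorm(100, mean=0, sd = 0.3), ncol = 2), # cluster 1
  matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2), # cluster 2
  matrix(c(rnorm(50, mean = 1, sd = 0.3), # cluster 3
    rnorm(50, mean = 0, sd = 0.3)), ncol = 2))
colnames(x) <- c("x", "y")

## the known cluster for each row (hypothetical helper vector)
known <- rep(c("c1", "c2", "c3"), each=50)

## cluster the data, just like in the demo
cl <- kmeans(x, centers=3, nstart=25)

## cross-tabulate the known clusters against the k-means clusters
table(known, cl$cluster)

## the center of each cluster and the total within-cluster sum of squares
cl$centers
cl$tot.withinss
```

The cluster numbers that `kmeans()` assigns are arbitrary, so "c1" will not necessarily line up with cluster 1. What matters is whether each row of the table has most of its points in a single column.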