An epic journey through statistics and machine learning
Regularization Part 2: Lasso Regression
2 thoughts on “Regularization Part 2: Lasso Regression”
Hi Josh,
Thank you so much for the videos! I’ve been working on a data analysis course and ridge regression came up and your videos are a godsend.
Could you clarify why increasing lambda decreases the slope? My understanding is that in order to decrease the amount of penalty, we can only decrease the slope of the regression line.
However, I’m also confused by how the multivariate regressions shrink. And why can ridge regression’s slope never equal zero, while lasso regression’s slope can?
Thank you!
If your parameter is 2 and lambda = 1, then the lasso penalty = 2. If lambda = 2, then the lasso penalty = 4, and if lambda = 3, then the lasso penalty = 6. So the larger lambda gets, the larger the penalty gets. To compensate, we can decrease the parameter value. This may increase the sum of the squared residuals, but perhaps not by as much as it decreases the lasso penalty.
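To make that trade-off concrete, here is a minimal sketch in Python with made-up numbers (they are just for illustration, not from the video): for a single slope with no intercept, it scans candidate slopes and keeps the one that minimizes the sum of squared residuals plus lambda times the absolute value of the slope. As lambda grows, the winning slope shrinks.

```python
# Minimal sketch with hypothetical data: as lambda grows, the slope that
# minimizes SSR + lambda * |slope| gets smaller.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])   # made-up predictor
y = np.array([1.2, 1.9, 3.2, 3.8])   # made-up response

slopes = np.linspace(0, 2, 2001)     # candidate slope values
for lam in [0, 1, 2, 3]:
    # sum of squared residuals for each candidate slope (no intercept, for simplicity)
    ssr = ((y[None, :] - slopes[:, None] * x[None, :]) ** 2).sum(axis=1)
    loss = ssr + lam * np.abs(slopes)   # lasso objective: SSR + lambda * |slope|
    best = slopes[np.argmin(loss)]
    print(f"lambda = {lam}: best slope = {best:.2f}")
```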
An Introduction to Statistical Learning (free download, just google it) has a discussion of why lasso can shrink parameters to 0 and ridge cannot. Intuitively, once the parameter values get small (below 1 in absolute value), the ridge penalty, because it squares the parameters, becomes much smaller, so there is less and less to gain by shrinking them further. Thus, they never go all the way to zero. In contrast, the lasso penalty stays proportional to the parameter’s absolute value, so the incentive to shrink remains and the parameter can be pushed all the way to zero.
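As a small illustration of that difference (synthetic data, with scikit-learn’s Lasso and Ridge and their alpha parameter playing the role of lambda), fitting both models to data where one feature is pure noise shows lasso setting that coefficient to exactly 0, while ridge only shrinks it toward 0:

```python
# Sketch: lasso can zero out a useless coefficient, ridge only shrinks it.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
n = 100
x_useful = rng.normal(size=n)
x_useless = rng.normal(size=n)                       # unrelated to y
y = 3.0 * x_useful + rng.normal(scale=0.5, size=n)   # y depends only on x_useful
X = np.column_stack([x_useful, x_useless])

for alpha in [0.1, 1.0]:
    lasso = Lasso(alpha=alpha).fit(X, y)
    ridge = Ridge(alpha=alpha).fit(X, y)
    print(f"alpha={alpha}: lasso coefs = {lasso.coef_.round(3)}, "
          f"ridge coefs = {ridge.coef_.round(3)}")
```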