# Regularization Part 1: Ridge Regression

## 6 thoughts on “Regularization Part 1: Ridge Regression”

1. Klaus

Hello Josh,

Thank you for your great videos!
One question: In your example, the data outliers (red dots) are arranged so that the slope of the red line is higher than the slope of the green data regression line (higher values of y for high values of x and smaller values of y for small values of x).
What if the slope of the red line is smaller than the slope of the green line (smaller values of y for high values of x and higher values of y for small values of x)? How does Ridge Regression work in this scenario?

• Josh

If ridge regression cannot improve predictions by shrinking the parameters, then it will do nothing at all.

2. Matthew Samelson

I understand cross validation, but I don’t understand how you would use cross validation to find the best lambda. Do you simply plug in different values for lambda, holding slope^2 constant, and take the lambda value that returns the smallest SSE from the cross validation? Thanks in advance.

• starmer

For CV, you try various values for lambda, like 0, 0.1, 1 and 10. Then, for each candidate value for lambda, you find the slope that minimizes SSR + Ridge Regression Penalty for the training data. Then you see how well that slope predicts the values in the testing data. Finally, you pick the value for lambda that performs best with the testing data.
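The procedure above can be sketched in a few lines of Python. This is only an illustration under some assumptions: there is a single predictor, the intercept is not penalized, the fit for each lambda uses the closed-form solution for that one-predictor case rather than an iterative search, and the function names (`fit_ridge_line`, `cv_error`, `pick_lambda`) and the 5-fold split are made up for this example.

```python
import numpy as np

def fit_ridge_line(x, y, lam):
    """Return (intercept, slope) minimizing SSR + lam * slope**2.

    Assumes one predictor and an unpenalized intercept.
    """
    x_mean, y_mean = x.mean(), y.mean()
    xc = x - x_mean
    # Setting the derivative of SSR + lam * slope**2 to zero gives this slope.
    slope = np.sum(xc * (y - y_mean)) / (np.sum(xc ** 2) + lam)
    intercept = y_mean - slope * x_mean
    return intercept, slope

def cv_error(x, y, lam, n_folds=5, seed=0):
    """Average squared prediction error on the held-out folds for one lambda."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), n_folds)
    errors = []
    for test_idx in folds:
        train_mask = np.ones(len(x), dtype=bool)
        train_mask[test_idx] = False
        # Fit on the training folds only.
        intercept, slope = fit_ridge_line(x[train_mask], y[train_mask], lam)
        # Score on the held-out fold, without the penalty term.
        pred = intercept + slope * x[test_idx]
        errors.append(np.mean((y[test_idx] - pred) ** 2))
    return float(np.mean(errors))

def pick_lambda(x, y, candidates=(0, 0.1, 1, 10)):
    """Return the candidate lambda with the lowest cross-validated error."""
    scores = {lam: cv_error(x, y, lam) for lam in candidates}
    best = min(scores, key=scores.get)
    return best, scores
```

Note that the penalty is only used when fitting on the training data. The testing data is scored with plain squared error, so candidate lambdas are compared on how well they predict, not on how small the penalty is.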

3. Alwin

I have a question about the video Part 1 Ridge Regression. The formula of the ridge regression line is Size = 0.9 + 0.8 * Weight. How did you determine this formula? Is the penalty 0.74 for the ridge regression line?

• starmer

The optimal parameters for ridge regression are determined using an iterative procedure like Gradient Descent. For details on Gradient Descent, see: https://youtu.be/sDv4f4s2SB8
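As a rough sketch of what such an iterative procedure might look like for a single predictor, the following Python applies plain gradient descent to SSR + lambda * slope^2. The function name, the starting values, the learning rate and the number of steps are all assumptions chosen for illustration, and the learning rate in particular would need tuning for real data. It does not use the data from the video, so it will not reproduce the 0.9 and 0.8 from the question.

```python
import numpy as np

def ridge_gradient_descent(x, y, lam, learning_rate=0.01, n_steps=10000):
    """Minimize SSR + lam * slope**2 for a line by gradient descent.

    The intercept is not penalized. learning_rate and n_steps are
    illustrative defaults, not recommended values.
    """
    intercept, slope = 0.0, 0.0
    for _ in range(n_steps):
        residuals = y - (intercept + slope * x)
        # Derivative of the SSR with respect to the intercept.
        grad_intercept = -2.0 * np.sum(residuals)
        # Derivative of the SSR plus the derivative of lam * slope**2.
        grad_slope = -2.0 * np.sum(residuals * x) + 2.0 * lam * slope
        intercept -= learning_rate * grad_intercept
        slope -= learning_rate * grad_slope
    return intercept, slope
```

With lambda set to 0 the penalty term vanishes and this converges to the ordinary least squares line; larger values of lambda pull the slope toward 0.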