Clarifications for HW 1
Post date: Oct 11, 2013 8:37:53 AM
If you have trouble picturing what the first exercises are about, try coding them first for a simple example, e.g.:
let f(X) = 2*x1 + 3*x2^2, where X = [x1, x2]
then the function from exercise 1 should compute the gradient of f at a given point:
grad(f, [1, 2]) = [2, 12]
by approximating df/dx1 = (f([x1+eps, x2]) - f([x1 - eps, x2]))/(2eps) = ...
note that the formula has to be applied separately to each element of the input vector X (in other words, you have to compute the partial derivatives in a loop).
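For example, a minimal sketch in Python (the name grad and the eps value are only illustrative, not a required interface):

import numpy as np

def grad(f, x, eps=1e-5):
    # central-difference approximation of the gradient of f at x
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(len(x)):
        x_plus, x_minus = x.copy(), x.copy()
        x_plus[i] += eps
        x_minus[i] -= eps
        g[i] = (f(x_plus) - f(x_minus)) / (2 * eps)
    return g

f = lambda x: 2 * x[0] + 3 * x[1] ** 2
print(grad(f, [1, 2]))  # approximately [2. 12.]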
The function from the second exercise should check whether your gradients are good. Continuing the example, suppose you decide to minimize f and provide its gradient:
fg(X) = (f(X), gradf(X)) = (2*x1 + 3*x2^2, [2, 6*x2])
note that here the function returns a tuple: the value and gradient at a certain point.
Now the goal is to verify that the formula for the gradient results in approximately the same thing as the numerical approximation.
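A minimal sketch of such a check, assuming the convention above that fg returns a (value, gradient) pair and reusing the grad sketch from before; the tolerance is only an illustrative choice:

def check_grad(fg, x, eps=1e-5, tol=1e-6):
    # compare the analytic gradient returned by fg with the numerical approximation
    value, analytic = fg(x)
    numeric = grad(lambda z: fg(z)[0], x, eps)
    return np.max(np.abs(np.asarray(analytic) - numeric)) < tol

fg = lambda x: (2 * x[0] + 3 * x[1] ** 2, np.array([2.0, 6 * x[1]]))
print(check_grad(fg, np.array([1.0, 2.0])))  # True if the analytic formula matches the approximation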
Finally, batch gradient descent is just this algorithm:
1. set theta = theta0 //can set theta=[0,0,..,0] in this example
2. while not converged:
3.     theta = theta - alpha * grad(f, theta)
You will have to find an alpha that works (something small, like 10^-4 or 10^-7) and decide when to stop; possibilities include:
- the value doesn't change too much
- number of iterations exceeded
- the gradient is close to 0
It usually helps to plot how the value of the function to be minimized changes over the iterations.
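Putting it together, a minimal sketch of the loop, again assuming fg returns a (value, gradient) pair; alpha, the iteration cap, and the tolerance below are illustrative defaults, not values you should assume work for the homework. It records the value at every iteration so you can plot it afterwards:

def gradient_descent(fg, theta0, alpha=1e-4, max_iter=10000, tol=1e-8):
    # batch gradient descent using the gradient returned by fg
    theta = np.asarray(theta0, dtype=float)
    values = []
    for _ in range(max_iter):
        value, g = fg(theta)
        values.append(value)
        if np.linalg.norm(g) < tol:  # gradient close to 0
            break
        theta = theta - alpha * np.asarray(g)
    return theta, values

theta, values = gradient_descent(fg, [0.0, 0.0])
# plot values vs. iteration to see how the objective behaves
# (note: with this toy f the linear term in x1 has no minimum, so here the iteration cap is what stops the loop)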