Scaled Descent
CPSC 406 – Computational Optimization
Scaled descent
- conditioning
- scaled gradient direction
- Gauss Newton
Zig-zagging
Consider the quadratic function $f(x) = \tfrac12 x^T A x$ with $A$ symmetric and positive definite.
Level sets $\{\,x : f(x) = \alpha\,\}$ are ellipsoids.
Gradient descent from two starting points traces a zig-zag path toward the minimizer.
- eigenvectors of $A$ are the principal axes of the level-set ellipsoids
- eigenvalues of $A$ determine the axis lengths: the "unit ellipse" $\{x : x^T A x = 1\}$ has semi-axes of length $1/\sqrt{\lambda_i}$
Gradient descent zig-zags
Let $x_0, x_1, x_2, \ldots$ be the iterates generated by gradient descent with exact linesearch. Then successive gradients are orthogonal:
$$\nabla f(x_{k+1})^T \nabla f(x_k) = 0.$$
Proof: the exact steplength $\alpha_k$ minimizes $\phi(\alpha) = f(x_k - \alpha \nabla f(x_k))$, so
$$0 = \phi'(\alpha_k) = -\nabla f(x_k - \alpha_k \nabla f(x_k))^T \nabla f(x_k) = -\nabla f(x_{k+1})^T \nabla f(x_k).$$
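A minimal numerical sketch of this fact (the matrix and starting point are made-up, not from the slides): gradient descent with exact linesearch on $f(x) = \tfrac12 x^T A x$, printing the inner product of successive gradients, which stays at zero while the iterates zig-zag.

```python
import numpy as np

A = np.diag([1.0, 10.0])          # ill-conditioned quadratic (assumed example)
x = np.array([10.0, 1.0])         # arbitrary starting point (assumed)

for k in range(5):
    g = A @ x                     # gradient of f(x) = 1/2 x^T A x
    alpha = (g @ g) / (g @ A @ g) # exact linesearch step for a quadratic
    x_new = x - alpha * g
    g_new = A @ x_new
    print(f"k={k}  <g_k, g_k+1> = {g @ g_new: .2e}  f = {0.5 * x_new @ A @ x_new: .4f}")
    x = x_new
```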
Condition number
The condition number of an $n \times n$ positive definite matrix $A$ is
$$\kappa(A) = \frac{\lambda_{\max}(A)}{\lambda_{\min}(A)} \;\ge\; 1.$$
- $A$ is ill-conditioned if $\kappa(A) \gg 1$
- the condition number of the Hessian influences the speed of convergence of gradient descent
- $\kappa = 1$: gradient descent converges in one step
- $\kappa \gg 1$: gradient descent zig-zags
- if $f$ is twice continuously differentiable, define the condition number of $f$ at a solution $x^*$ as $\kappa(\nabla^2 f(x^*))$
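A small sketch of this definition (the matrices and the helper `cond_pd` are illustrative, not from the slides): computing $\kappa(A) = \lambda_{\max}/\lambda_{\min}$ from the eigenvalues of a positive definite matrix.

```python
import numpy as np

def cond_pd(A):
    """Condition number of a symmetric positive definite matrix (assumed helper)."""
    eigs = np.linalg.eigvalsh(A)
    return eigs.max() / eigs.min()

print(cond_pd(np.eye(2)))              # kappa = 1: gradient descent converges in one step
print(cond_pd(np.diag([1.0, 100.0])))  # kappa = 100: gradient descent zig-zags
```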
Scaled gradient method
make a linear change of variables $x = S y$, where $S$ is nonsingular, to get the rescaled problem
$$\min_y \; g(y) := f(Sy)$$
apply gradient descent to the scaled problem:
$$y_{k+1} = y_k - \alpha_k \nabla g(y_k) = y_k - \alpha_k S^T \nabla f(S y_k)$$
multiply on the left by $S$ to get the $x$-update:
$$x_{k+1} = x_k - \alpha_k S S^T \nabla f(x_k)$$
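A quick sanity-check sketch of this derivation (the quadratic, scaling matrix, and stepsize are assumptions): one gradient step on $g(y) = f(Sy)$ mapped back through $x = Sy$ matches the direct $x$-update $x_{+} = x - \alpha\, S S^T \nabla f(x)$.

```python
import numpy as np

A = np.diag([1.0, 10.0])                   # quadratic f(x) = 1/2 x^T A x (assumed)
S = np.array([[1.0, 0.3], [0.0, 0.5]])     # any nonsingular scaling (assumed)
x = np.array([2.0, -1.0]); alpha = 0.1

grad_f = lambda x: A @ x
y = np.linalg.solve(S, x)                  # y such that x = S y
y_new = y - alpha * S.T @ grad_f(S @ y)    # gradient step on g(y) = f(S y)
x_from_y = S @ y_new                       # map back to x-space
x_direct = x - alpha * S @ S.T @ grad_f(x) # direct x-update
print(np.allclose(x_from_y, x_direct))     # True
```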
Scaled descent
If $D \succ 0$, the scaled negative gradient $d = -D \nabla f(x)$ is a descent direction because
$$\nabla f(x)^T d = -\nabla f(x)^T D \nabla f(x) < 0 \quad \text{whenever } \nabla f(x) \ne 0.$$
Recall: a symmetric matrix $D$ is positive definite if and only if
- $D = S \Lambda S^T$ with $\Lambda$ diagonal and positive and $S$ nonsingular
- $D = S S^T$ with $S$ nonsingular
scaled gradient method
- for $k = 0, 1, 2, \ldots$
- choose a scaling matrix $D_k \succ 0$
- compute the step $d_k = -D_k \nabla f(x_k)$
- choose stepsize $\alpha_k$ via linesearch on $\alpha \mapsto f(x_k + \alpha d_k)$
- update $x_{k+1} = x_k + \alpha_k d_k$
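A minimal sketch of this loop, assuming a fixed scaling matrix $D \succ 0$ and a simple Armijo backtracking linesearch (the function names and test problem are illustrative, not from the slides).

```python
import numpy as np

def scaled_gradient(f, grad, D, x0, iters=50):
    """Scaled gradient method with a fixed scaling D and Armijo backtracking."""
    x = x0.copy()
    for _ in range(iters):
        g = grad(x)
        d = -D @ g                       # scaled negative gradient: a descent direction
        alpha = 1.0
        while f(x + alpha * d) > f(x) + 1e-4 * alpha * (g @ d):  # Armijo condition
            alpha *= 0.5
        x = x + alpha * d
    return x

# Assumed test problem: ill-conditioned quadratic; D = A^{-1} gives convergence in one step.
A = np.diag([1.0, 100.0])
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
print(scaled_gradient(f, grad, np.linalg.inv(A), np.array([5.0, 5.0])))
```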
Choosing the scaling matrix
Observe the relationship between optimizing $f$ and optimizing its scaled version $g(y) = f(Sy)$:
$$\nabla g(y) = S^T \nabla f(Sy), \qquad \nabla^2 g(y) = S^T \nabla^2 f(Sy)\, S$$
The condition number of $\nabla^2 g$ governs the convergence of gradient descent on the scaled problem.
- choose $S$ such that $S^T \nabla^2 f(x) S$ is well-conditioned, ie, $S^T \nabla^2 f(x) S \approx I$
Example (quadratic)
- for $f(x) = \tfrac12 x^T A x$, pick $S$ such that $S^T A S = I$, ie, any $S$ with $S S^T = A^{-1}$
- gives a perfectly conditioned scaled Hessian $\nabla^2 g = I$, ie, $\kappa(\nabla^2 g) = 1$
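A short sketch of one way to build such an $S$ (the matrix $A$ is a made-up example): from a Cholesky factorization $A = L L^T$, the choice $S = L^{-T}$ satisfies $S^T A S = I$.

```python
import numpy as np

A = np.array([[4.0, 1.0], [1.0, 3.0]])      # symmetric positive definite (assumed example)
L = np.linalg.cholesky(A)                   # A = L L^T
S = np.linalg.inv(L).T                      # S = L^{-T}, so S S^T = A^{-1}
print(np.allclose(S.T @ A @ S, np.eye(2)))  # True: scaled Hessian is the identity
```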
Level sets of scaled and unscaled problems
Close to the solution $x^*$, the level sets of
- $f$ are ellipsoids
- $g(y) = f(Sy)$ are circles for the ideal scaling $S$, because $\nabla^2 g(y^*) = S^T \nabla^2 f(x^*)\, S = I$
Question
Consider the change of variables $x = Sy$ in the quadratic function
$$f(x) = \tfrac12 x^T A x$$
to obtain the scaled function
$$g(y) = f(Sy) = \tfrac12 y^T S^T A S\, y.$$
Which choice of the nonsingular scaling matrix $S$ will transform the level sets of $g$ into circles (i.e., result in a perfectly conditioned Hessian for $g$)?
- $S = I$ (the identity matrix)
- $S = \operatorname{diag}(A)$ (the diagonal part of $A$)
Common scalings
Make the scaled Hessian $S^T \nabla^2 f(x)\, S$ as well conditioned as possible.
Gauss Newton
Nonlinear Least Squares
Nonlinear least squares
- NLLS (nonlinear least-squares) problem
$$\min_{x \in \mathbb{R}^n} \; f(x) := \tfrac12 \| r(x) \|^2, \qquad r : \mathbb{R}^n \to \mathbb{R}^m$$
- gradient of $f$ in terms of the residual vector $r(x)$ and its Jacobian $J(x) \in \mathbb{R}^{m \times n}$:
$$\nabla f(x) = J(x)^T r(x)$$
- reduces to linear least-squares when $r$ is affine
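A small sketch of the gradient formula (the residual function here is a toy example, not from the course): $\nabla f(x) = J(x)^T r(x)$ for $f(x) = \tfrac12\|r(x)\|^2$, checked against a finite-difference approximation.

```python
import numpy as np

def r(x):                                  # toy residual r: R^2 -> R^3 (assumed)
    return np.array([x[0]**2 - 1.0, x[0]*x[1], np.sin(x[1])])

def J(x):                                  # its Jacobian, rows = gradients of r_i
    return np.array([[2*x[0], 0.0],
                     [x[1],   x[0]],
                     [0.0,    np.cos(x[1])]])

f = lambda x: 0.5 * r(x) @ r(x)
x = np.array([0.7, -0.3])
g = J(x).T @ r(x)                          # NLLS gradient formula
g_fd = np.array([(f(x + 1e-6*e) - f(x - 1e-6*e)) / 2e-6 for e in np.eye(2)])
print(np.allclose(g, g_fd, atol=1e-5))     # True
```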
Example – localization problem
- estimate a position $x \in \mathbb{R}^2$ from approximate distances to known fixed beacons
data
- beacons at known locations $b_1, \ldots, b_m \in \mathbb{R}^2$
- approximate distances $d_i = \|x - b_i\| + \varepsilon_i$, where $\varepsilon_i$ is measurement error
- NLLS position estimate solves
$$\min_{x} \; \tfrac12 \sum_{i=1}^{m} \big( \|x - b_i\| - d_i \big)^2$$
- the problem is nonconvex, so we must settle for a locally optimal solution
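A sketch of this residual in code (the beacon locations and noisy distances are made-up data, not the course's): $r_i(x) = \|x - b_i\| - d_i$, with Jacobian rows $(x - b_i)^T / \|x - b_i\|$.

```python
import numpy as np

B = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0]])   # beacon locations (assumed data)
x_true = np.array([1.0, 1.0])                        # unknown position (assumed)
d = np.linalg.norm(x_true - B, axis=1) + 0.01 * np.random.randn(len(B))  # noisy distances

def residual(x):
    """r_i(x) = ||x - b_i|| - d_i for each beacon b_i."""
    return np.linalg.norm(x - B, axis=1) - d

def jacobian(x):
    """Row i is the unit vector (x - b_i)^T / ||x - b_i||."""
    diff = x - B
    return diff / np.linalg.norm(diff, axis=1, keepdims=True)
```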
Linearization of residual
- linearize the residual about the current iterate: $r(x) \approx r(x_k) + J(x_k)(x - x_k)$
- pure Gauss Newton iteration: the linearized least-squares problem is used to determine $x_{k+1}$:
$$x_{k+1} = \operatorname*{argmin}_x \; \tfrac12 \| r(x_k) + J(x_k)(x - x_k) \|^2$$
Gauss Newton as scaled descent
- expand the least-squares subproblem (set $J_k = J(x_k)$ and $r_k = r(x_k)$). If $J_k$ has full column rank,
$$x_{k+1} = x_k - (J_k^T J_k)^{-1} J_k^T r_k = x_k - (J_k^T J_k)^{-1} \nabla f(x_k)$$
- interpret as scaled gradient descent with scaling matrix $D_k = (J_k^T J_k)^{-1} \succ 0$
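A tiny sketch of this equivalence (the residual and Jacobian values below are arbitrary assumptions): the pure Gauss Newton step computed as the linearized least-squares solution agrees with the scaled gradient form $-(J_k^T J_k)^{-1} J_k^T r_k$.

```python
import numpy as np

r_k = np.array([0.5, -0.2, 0.1])                        # residual at x_k (assumed)
J_k = np.array([[1.0, 0.0], [0.5, 1.0], [0.0, 2.0]])    # Jacobian at x_k (assumed, full rank)

step_ls = np.linalg.lstsq(J_k, -r_k, rcond=None)[0]     # argmin_d ||J_k d + r_k||
step_scaled = -np.linalg.solve(J_k.T @ J_k, J_k.T @ r_k)  # -(J^T J)^{-1} grad f
print(np.allclose(step_ls, step_scaled))                # True
```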
Gauss Newton for NLLS
- linesearch on nonlinear objective required to ensure convergence
- given a starting point $x_0$ and stopping tolerance $\varepsilon > 0$
- for $k = 0, 1, 2, \ldots$
- compute the residual $r_k = r(x_k)$ and Jacobian $J_k = J(x_k)$
- compute the step $d_k$ by solving the linear least-squares problem
$$d_k = \operatorname*{argmin}_d \; \tfrac12 \| J_k d + r_k \|^2, \quad \text{ie,} \quad d_k = -(J_k^T J_k)^{-1} J_k^T r_k$$
- choose stepsize $\alpha_k$ via linesearch on $\alpha \mapsto f(x_k + \alpha d_k)$
- update $x_{k+1} = x_k + \alpha_k d_k$
- stop if $\|\nabla f(x_k)\| = \|J_k^T r_k\| \le \varepsilon$ or a maximum iteration count is reached
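A sketch of this loop with Armijo backtracking, applied to the beacon localization example above (beacon locations, distances, starting point, and tolerances are all assumptions).

```python
import numpy as np

B = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0]])      # beacons (assumed data)
d = np.array([1.41, 3.17, 2.22])                        # noisy distances (assumed data)

r = lambda x: np.linalg.norm(x - B, axis=1) - d
J = lambda x: (x - B) / np.linalg.norm(x - B, axis=1, keepdims=True)
f = lambda x: 0.5 * r(x) @ r(x)

x, eps = np.array([3.0, 3.0]), 1e-8                     # starting point, tolerance (assumed)
for k in range(50):
    rk, Jk = r(x), J(x)
    g = Jk.T @ rk                                       # gradient of f at x_k
    if np.linalg.norm(g) <= eps:
        break
    dk = np.linalg.lstsq(Jk, -rk, rcond=None)[0]        # Gauss Newton step
    alpha = 1.0
    while f(x + alpha * dk) > f(x) + 1e-4 * alpha * (g @ dk):  # Armijo linesearch
        alpha *= 0.5
    x = x + alpha * dk
print(x, f(x))                                          # locally optimal position estimate
```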