UBC CPSC 406 2022-T2

Nonlinear Least-Squares

The nonlinear least-squares problem generalizes the linear least-squares problem to include nonlinear residual functions. The Gauss-Newton method solves this problem as a sequence of linear least-squares subproblems.

Nonlinear residuals

The nonlinear least-squares problem seeks to minimize the sum of squares

$$ \min_{x\in\mathbb R^n}\ \tfrac12\|r(x)\|^2, \tag{1} $$

where the $m$-vector of residuals

$$ r(x)=\begin{bmatrix} r_1(x)\\ \vdots \\ r_m(x)\end{bmatrix} $$

is composed of differentiable nonlinear functions $r_i:\mathbb R^n\to\mathbb R$. The nonlinear least-squares problem reduces to linear least-squares when $r$ is affine, i.e., $r(x) = Ax - b$ for some $m$-by-$n$ matrix $A$ and $m$-vector of observations $b$.
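For instance (a hypothetical illustration, not the example developed later), fitting the model $y \approx x_1 e^{x_2 t}$ to data pairs $(t_i, y_i)$ gives the nonlinear residuals $r_i(x) = x_1 e^{x_2 t_i} - y_i$:

# Residuals for fitting y ≈ x₁·exp(x₂·t); here t and y are assumed data vectors.
r_exp(x, t, y) = [ x[1]*exp(x[2]*t[i]) - y[i] for i in eachindex(t) ]

Each $r_i$ depends nonlinearly on the parameters $x=(x_1,x_2)$, so the problem does not reduce to linear least-squares.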

Linearizing the residual

We can solve the nonlinear least-squares problem (1) by solving a sequence of linear least-squares problems, each obtained by linearizing $r$ at the current estimate of the solution $x$.

The linear approximation of $r$ at a point $x^k \in \mathbb R^n$ is defined by

$$
\begin{aligned}
r^k(d) &:= \begin{bmatrix} r_1(x^k)+\nabla r_1(x^k)^T\! d \\ \vdots \\ r_m(x^k)+\nabla r_m(x^k)^T\! d\end{bmatrix} \\
&= r(x^k) + J(x^k)d,
\end{aligned}
$$

where

$$ J(x) = \begin{bmatrix} \nabla r_1(x)^T\\ \vdots \\ \nabla r_m(x)^T\end{bmatrix} $$

is the Jacobian of $r$ evaluated at $x$.
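A quick way to check the linearization numerically is to compare the model $r(x^k)+J(x^k)d$ with $r(x^k+d)$ for a small step $d$; the two should agree up to $O(\|d\|^2)$. A minimal sketch, with a made-up two-dimensional residual (r_demo is purely illustrative) and the Jacobian formed by ForwardDiff:

using ForwardDiff, LinearAlgebra
r_demo(x) = [ x[1]^2 + x[2] - 1, sin(x[1]) - x[2] ]   # illustrative residual with m = n = 2
xk = [0.5, 0.25]
Jk = ForwardDiff.jacobian(r_demo, xk)                 # rows are ∇rᵢ(xk)'
d  = [1e-4, -1e-4]
norm(r_demo(xk) + Jk*d - r_demo(xk + d))              # ≈ 0 (second order in ‖d‖)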

Algorithm: (Gauss-Newton method)
Choose a starting $x^0$
for $k=0,1,2,\ldots$

  1. linearize $r$ at $x^k$ to define $r^k(d):=r(x^k)+J(x^k)d$

  2. solve the linear least-squares problem (see the normal equations below)

    $$ d^k = \operatorname*{arg\,min}_d\ \tfrac12\| r^k(d)\|^2 $$

  3. update the iterate: $x^{k+1} = x^k + \alpha d^k$ for some $\alpha\in(0,1]$
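Because $r^k$ is affine in $d$, the subproblem in step 2 is an ordinary linear least-squares problem. When $J(x^k)$ has full column rank, its solution $d^k$ is characterized by the normal equations

$$ J(x^k)^T J(x^k)\, d^k = -J(x^k)^T r(x^k), $$

although in practice the step is usually computed from a QR factorization of $J(x^k)$ rather than by forming $J^TJ$ explicitly.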

If the linesearch parameter $\alpha$ is held fixed at 1, then this is a "pure" Gauss-Newton iteration. We'll later discuss options for when it's advisable to use smaller steps.

Here's a basic version of the method that takes as arguments the residual and Jacobian functions r and J, and a starting vector x.

function gauss_newton(r, J, x; ε=1e-4, maxits=30)
    err = Float64[]                 # history of the optimality residual ‖J(x)'r(x)‖
    for i = 1:maxits
        rk, Jk = r(x), J(x)
        push!(err, norm(Jk'rk))     # `norm` is from LinearAlgebra (loaded below)
        err[end] ≤ ε && break       # stop when the gradient of ½‖r(x)‖² is nearly zero
        x = x - Jk\rk               # pure Gauss-Newton step: Jk\rk solves min‖Jk*d - rk‖
    end
    return x, err
end
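As a quick usage sketch (the data t, y and the functions r_fit, J_fit below are hypothetical, not part of the example that follows), the method can be applied to a small exponential-fitting problem, with the Jacobian supplied by ForwardDiff:

using LinearAlgebra, ForwardDiff
t = range(0, 3, length=20)                        # sample points
y = 2 .* exp.(-0.5 .* t)                          # noiseless data from y = 2·exp(-0.5t)
r_fit(x) = [ x[1]*exp(x[2]*t[i]) - y[i] for i in eachindex(t) ]
J_fit(x) = ForwardDiff.jacobian(r_fit, x)
x_fit, hist = gauss_newton(r_fit, J_fit, [1.5, -0.4])   # x_fit should approach [2, -0.5]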

The condition err[end] ≤ ε terminates the iterations when the gradient $J(x)^T r(x)$ of the nonlinear least-squares objective $\tfrac12\|r(x)\|^2$ is nearly zero. Equivalently, the residual vector $r(x)$ is nearly orthogonal to the range of the Jacobian $J(x)$, which is exactly analogous to the optimality condition for linear least-squares, where the optimal residual $b - Ax$ is orthogonal to the range of $A$.
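To see that $J(x)^T r(x)$ is indeed the gradient of the objective, apply the chain rule:

$$ \nabla\left(\tfrac12\|r(x)\|^2\right) = \sum_{i=1}^m r_i(x)\,\nabla r_i(x) = J(x)^T r(x). $$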

Example: Position estimation from ranges

Let $x \in \mathbb R^2$ represent the unknown position of an object. Our aim is to find the position of this object relative to a set of $m$ beacons placed in fixed known positions $b_i \in \mathbb R^2$, $i = 1,\dots,m$. The only data available are range measurements $\delta_i$ that give an estimate of the distance between each beacon $b_i$ and the object at $x$, i.e.,

$$ \delta_i := \| x-b_i\| + \nu_i, $$

where each scalar $\nu_i$ represents the error between the true (and unknown) distance $\| x-b_i\|$ and the measurement $\delta_i$.

We can obtain an estimate of the object's position $x$ by solving the nonlinear least-squares problem (1), where the $i$th residual is the difference between the reported range $\delta_i$ and the distance from $x$ to the $i$th beacon's position $b_i$:

$$ r_i(x) := \delta_i - \|x-b_i\|. $$

Here's the residual function, which takes the position x, a vector of ranges δ, and a 2-by-m matrix of beacon positions b (one beacon per column):

r(x, δ, b) = [ δ[i] - norm(x - b[:,i]) for i in 1:length(δ) ]
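For reference, each residual is differentiable wherever $x \ne b_i$, and its gradient has a simple closed form:

$$ \nabla r_i(x) = -\frac{x - b_i}{\|x - b_i\|}, $$

so row $i$ of the Jacobian $J(x)$ is the unit vector pointing from $x$ toward beacon $b_i$. In the code below we instead obtain the Jacobian by automatic differentiation with ForwardDiff.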

Use the following Julia packages for this example.

using Random
using Statistics
using LinearAlgebra
using ForwardDiff
using Plots

We simulate data by placing m beacons in random locations:

m = 13                 # number of beacons
b = 2rand(2, m) .- 1   # place beacons uniformly in the unit box
x = zeros(2)           # true position
x0 = 2rand(2) .- 1     # initial guess of the unknown position
ν = .5*rand(m)         # measurement noise
δ = [ norm(x - b[:,i]) + ν[i] for i in 1:m]   # noisy range measurements
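Because the data are random, the figures and iteration counts below vary from run to run. For reproducible results, one option is to seed the random number generator (via the Random package loaded above) before executing the simulation code, e.g.

Random.seed!(0)   # hypothetical seed value; call this before generating b, x0, and ν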

Place these items on a map (we'll wrap this in a function so that we can reuse it below):

function plotmap(b, x, x0)
    scatter(xlim=(-1,+1), ylim=(-1,+1), leg=:outertopright, frame=:box, aspect_ratio=:equal)  # empty plot that fixes the axes and legend
    scatter!(b[1,:], b[2,:], label="beacons", shape=:square, ms=7)
    scatter!(x[1,:], x[2,:], label="true position", shape=:xcross, ms=7)
    scatter!(x0[1,:], x0[2,:], label="initial guess", c="yellow", ms=7)
end
plotmap(b, x, x0)

Now compute the Jacobian of the residual with ForwardDiff, run Gauss-Newton from the initial guess x0, and plot the optimality residual $\|J(x)^T r(x)\|$ at each iteration:

J(x) = ForwardDiff.jacobian(x->r(x, δ, b), x)
xs, err = gauss_newton(x->r(x, δ, b), J, x0)
plot(err, yaxis=:log)

Plot the original map and overlay the obtained solution:

plotmap(b, x, x0)
scatter!(xs[1,:], xs[2,:], label="solution", shape=:star7, c="green", ms=7)
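As an optional sanity check, we can also measure how far the computed estimate is from the true position used to simulate the data (the two won't coincide exactly because the range measurements are noisy):

norm(xs - x)   # distance between the Gauss-Newton estimate and the true position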