Let $\mathcal{H}$ be a Hilbert space with inner product $\langle \cdot, \cdot \rangle$ and associated norm $\|\cdot\|$. We want to solve the following minimisation problem

$
\min_{\theta} f(r(\theta)), \quad f(r(\theta)) = \frac{1}{2}\|r(\theta)\|^2.
$

Here, $\theta\in\mathbb{R}^p$ is a parameter vector and $r:\mathbb{R}^p\rightarrow\mathcal{H}$ is a possibly nonlinear map. $f$ is a nonlinear map from $\mathcal{H}$ into $\mathbb{R}$.

## Functional derivative of $f$

We can compute the functional derivative of $f$ as follows. Let $v\in\mathcal{H}$. Then

$
f(r + \epsilon v) = f(r) + \epsilon \langle r, v\rangle + \frac{\epsilon^2}{2} \langle v, v\rangle.
$

It follows that the functional derivative $Df(r)[v]$ is given by

$
Df(r)[v] = \langle r, v\rangle.
$

## Necessary condition for a minimum

A necessary condition for a minimum of $f(r(\theta))$ is that all partial derivatives with respect to the individual parameters $\theta_i$ are zero. By the chain rule,

$
\partial_{\theta_i} f(r(\theta)) = \langle r(\theta), \partial_{\theta_i} r(\theta)\rangle = 0,\quad i=1,\dots, p.
$

## The Gauss-Newton algorithm

Let $\theta^{(k)}$ be our current approximation of a minimiser and let $\hat{\theta}$ be the exact minimiser. We define $\Delta := \theta^{(k)} - \hat{\theta}$. The Gauss-Newton method uses the following approximation to compute $\Delta^{(k)}\approx \Delta$ and from this the next iterate $\theta^{(k+1)} = \theta^{(k)} - \Delta^{(k)}$.
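The necessary condition above can be checked numerically in the simplest concrete setting, $\mathcal{H}=\mathbb{R}^n$ with the Euclidean inner product. The residual model below (an exponential fit $r(\theta) = \theta_0 e^{\theta_1 t} - y$) is a hypothetical example chosen for illustration, not part of the derivation; the sketch verifies that $\langle r, \partial_{\theta_i} r\rangle$ agrees with a finite-difference approximation of $\partial_{\theta_i} f$.

```python
import numpy as np

# Hypothetical example: H = R^n with the Euclidean inner product, and
# r(theta) = theta_0 * exp(theta_1 * t) - y is the residual of an exponential fit.
t = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(-1.5 * t)

def r(theta):
    return theta[0] * np.exp(theta[1] * t) - y

def f(theta):
    return 0.5 * np.dot(r(theta), r(theta))

def jac(theta):
    # Columns are the partial derivatives d r / d theta_i.
    return np.column_stack([np.exp(theta[1] * t),
                            theta[0] * t * np.exp(theta[1] * t)])

theta = np.array([1.0, -1.0])
# Gradient via the necessary condition: <r, d_theta_i r> for each i.
grad = jac(theta).T @ r(theta)

# Central finite-difference check of d f / d theta_i.
eps = 1e-6
fd = np.array([(f(theta + eps * e) - f(theta - eps * e)) / (2 * eps)
               for e in np.eye(2)])
print(np.allclose(grad, fd, atol=1e-6))  # True: both gradients agree
```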
$
\begin{align}
0 &= \langle r(\hat{\theta}), \partial_{\theta_i} r(\hat{\theta})\rangle\nonumber\\
&\approx \langle r(\theta^{(k)} - \Delta^{(k)}), \partial_{\theta_i}r(\theta^{(k)})\rangle\nonumber\\
&\approx \langle r(\theta^{(k)}) - \sum_{j=1}^p \Delta_j^{(k)}\partial_{\theta_j}r(\theta^{(k)}), \partial_{\theta_i}r(\theta^{(k)})\rangle\nonumber\\
&= \langle r(\theta^{(k)}), \partial_{\theta_i} r(\theta^{(k)})\rangle - \sum_{j=1}^p \Delta_j^{(k)} \langle \partial_{\theta_j}r(\theta^{(k)}), \partial_{\theta_i}r(\theta^{(k)})\rangle, \quad i=1,\dots, p.\nonumber
\end{align}
$

We see that there are two essential approximations. In the second line we use the current derivative $\partial_{\theta_i} r(\theta^{(k)})$ instead of the derivative at the exact minimiser $\hat{\theta}$, as we would in a standard Newton method. In the third line we then proceed as in an ordinary Newton method and replace $r(\theta^{(k)} - \Delta^{(k)})$ by its linearisation. The approximation in the second line ensures that we do not need to incorporate the second derivative of $r$. But it also means that we cannot expect quadratic convergence as in Newton's method; in general Gauss-Newton converges only linearly, although the convergence improves as the residual at the minimiser becomes small.

The equation

$
0 = \langle r(\theta^{(k)}), \partial_{\theta_i} r(\theta^{(k)})\rangle - \sum_{j=1}^p \Delta_j^{(k)} \langle \partial_{\theta_j}r(\theta^{(k)}), \partial_{\theta_i}r(\theta^{(k)})\rangle, \quad i=1,\dots, p
$

is the normal equation associated with the linear least-squares problem

$
\min_{\Delta^{(k)}}\left\|\begin{bmatrix}\partial_{\theta_1} r(\theta^{(k)}),\dots, \partial_{\theta_p}r(\theta^{(k)})\end{bmatrix}\Delta^{(k)} - r(\theta^{(k)})\right\|.
$

Here, the matrix $\begin{bmatrix}\partial_{\theta_1} r(\theta^{(k)}),\dots, \partial_{\theta_p}r(\theta^{(k)})\end{bmatrix}$ needs to be understood in the sense of a quasi-matrix in which the $j$th column is not a vector but an element of the Hilbert space $\mathcal{H}$.
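The full iteration can be sketched for the finite-dimensional case $\mathcal{H}=\mathbb{R}^n$. The exponential-fit residual below is again a hypothetical example; the key point is that each step solves the linear least-squares subproblem $\min_{\Delta^{(k)}}\|J\Delta^{(k)} - r\|$ (here via `np.linalg.lstsq`, which is numerically preferable to forming the normal equations explicitly) and then updates $\theta^{(k+1)} = \theta^{(k)} - \Delta^{(k)}$.

```python
import numpy as np

# Minimal Gauss-Newton sketch for H = R^n with a hypothetical
# exponential-fit residual r(theta) = theta_0 * exp(theta_1 * t) - y.
t = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(-1.5 * t)  # data generated with theta = (2.0, -1.5)

def r(theta):
    return theta[0] * np.exp(theta[1] * t) - y

def jac(theta):
    # Quasi-matrix analogue: column j holds the partial derivative d r / d theta_j.
    return np.column_stack([np.exp(theta[1] * t),
                            theta[0] * t * np.exp(theta[1] * t)])

theta = np.array([1.0, -1.0])
for _ in range(50):
    # Solve min || J @ delta - r || rather than forming J^T J directly.
    delta, *_ = np.linalg.lstsq(jac(theta), r(theta), rcond=None)
    theta = theta - delta
    if np.linalg.norm(delta) < 1e-12:
        break

print(theta)  # converges towards the generating parameters (2.0, -1.5)
```

Note that this is a plain full-step iteration; practical implementations add damping or a line search to guard against steps that increase the residual.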
Hence, Gauss-Newton iteratively solves linear least-squares problems in the tangent space of the manifold $M = \{r(\theta): \theta\in\mathbb{R}^p\}\subset\mathcal{H}$. The normal equation above admits a small but interesting simplification, namely

$
\langle r(\theta^{(k)}), \partial_{\theta_i} r(\theta^{(k)})\rangle = \partial_{\theta_i}f(r(\theta^{(k)})).
$

Hence, if we write the normal equation as a linear system of the form

$
G(\theta^{(k)})\Delta^{(k)} = \eta^{(k)}
$

with $[G(\theta^{(k)})]_{i, j} = \langle \partial_{\theta_i} r(\theta^{(k)}), \partial_{\theta_j} r(\theta^{(k)})\rangle$, then $\eta^{(k)} = [\partial_{\theta_1}f(r(\theta^{(k)})),\dots,\partial_{\theta_p} f(r(\theta^{(k)}))]^T$ is just the derivative of $f$ with respect to the parameter vector $\theta$.
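The normal-equation form can also be checked directly in the finite-dimensional setting. In the sketch below (same hypothetical exponential-fit residual as above, with $\mathcal{H}=\mathbb{R}^n$), $G$ is the Gram matrix of the partial derivatives and $\eta$ is exactly the gradient $J^T r$; solving $G\Delta = \eta$ reproduces the least-squares solution of $\min\|J\Delta - r\|$ when $J$ has full column rank.

```python
import numpy as np

# Normal-equation form G(theta) delta = eta for the hypothetical
# exponential-fit residual, H = R^n.
t = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(-1.5 * t)

def r(theta):
    return theta[0] * np.exp(theta[1] * t) - y

def jac(theta):
    return np.column_stack([np.exp(theta[1] * t),
                            theta[0] * t * np.exp(theta[1] * t)])

theta = np.array([1.0, -1.0])
J = jac(theta)
G = J.T @ J           # Gram matrix: G[i, j] = <d_theta_i r, d_theta_j r>
eta = J.T @ r(theta)  # eta_i = <r, d_theta_i r> = d f / d theta_i

delta_normal = np.linalg.solve(G, eta)
delta_lstsq, *_ = np.linalg.lstsq(J, r(theta), rcond=None)
print(np.allclose(delta_normal, delta_lstsq))  # True
```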