Let $\mathcal{H}$ be a real Hilbert space with inner product $\langle \cdot, \cdot \rangle$ and associated norm $\|\cdot\|$. We want to solve the following minimisation problem
$
\min_{\theta} f(r(\theta)), \quad f(r(\theta)) = \frac{1}{2}\|r(\theta)\|^2
$
Here, $\theta\in\mathbb{R}^p$ is a parameter vector and $r:\mathbb{R}^p\rightarrow\mathcal{H}$ is a possibly nonlinear map. $f$ is a nonlinear map from $\mathcal{H}$ into $\mathbb{R}$.
## Functional derivative of $f$
We can compute the functional derivative of $f$ as follows. Let $v\in\mathcal{H}$. Then
$
f(r + \epsilon v) = \frac{1}{2}\|r + \epsilon v\|^2 = f(r)+ \epsilon \langle r, v\rangle + \frac{\epsilon^2}{2} \langle v, v\rangle.
$
It follows that the functional derivative $Df(r)[v]$ of $f$ at $r$ in the direction $v$ is given by
$
Df(r)[v] = \langle r, v\rangle.
$
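Since $f$ is quadratic in $r$, the expansion $f(r + \epsilon v) = f(r) + \epsilon \langle r, v\rangle + \tfrac{\epsilon^2}{2}\langle v, v\rangle$ holds exactly, not just to first order. This is easy to verify numerically once $\mathcal{H}$ is discretised as $\mathbb{R}^n$ with the Euclidean inner product; the vectors below are arbitrary illustrative examples, not part of the text above.

```python
import numpy as np

# For f(r) = 0.5 * ||r||^2 on H = R^n with the Euclidean inner product,
# the expansion f(r + eps*v) = f(r) + eps*<r, v> + (eps^2/2)*<v, v>
# is exact because f is quadratic in r.

rng = np.random.default_rng(0)
r = rng.standard_normal(100)   # arbitrary element of the discretised H
v = rng.standard_normal(100)   # arbitrary direction
eps = 1e-3

def f(x):
    return 0.5 * np.dot(x, x)

lhs = f(r + eps * v)
rhs = f(r) + eps * np.dot(r, v) + 0.5 * eps**2 * np.dot(v, v)
```

The difference `lhs - rhs` is zero up to floating-point roundoff, confirming that the first-order term $\langle r, v\rangle$ is the functional derivative.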
## Necessary condition for a minimum
A necessary condition for a minimum of $f(r(\theta))$ is that all partial derivatives with respect to the individual parameters $\theta_i$ vanish. Hence, by the chain rule,
$
\partial_{\theta_i} f(r(\theta)) = \langle r(\theta), \partial_{\theta_i} r(\theta)\rangle = 0,\quad i=1,\dots, p.
$
## The Gauss-Newton algorithm
Let $\theta^{(k)}$ be our current approximation to a minimiser and $\hat{\theta}$ be the exact minimiser. We define $\Delta := \theta^{(k)} - \hat{\theta}$. The Gauss-Newton method uses the following approximation to compute $\Delta^{(k)}\approx \Delta$ and from this the next iterate $\theta^{(k+1)} = \theta^{(k)} - \Delta^{(k)}$.
$
\begin{align}
0 &= \langle r(\hat{\theta}), \partial_{\theta_i} r(\hat{\theta})\rangle\nonumber\\
&\approx \langle r(\theta^{(k)} - \Delta^{(k)}), \partial_{\theta_i}r(\theta^{(k)})\rangle\nonumber\\
&\approx \langle r(\theta^{(k)}) - \sum_{j=1}^p \Delta_j^{(k)}\partial_{\theta_j}r(\theta^{(k)}), \partial_{\theta_i}r(\theta^{(k)})\rangle\nonumber\\
&= \langle r(\theta^{(k)}), \partial_{\theta_i} r(\theta^{(k)})\rangle - \sum_{j=1}^p \Delta_j^{(k)} \langle \partial_{\theta_j}r(\theta^{(k)}), \partial_{\theta_i}r(\theta^{(k)})\rangle, \quad i=1,\dots, p.\nonumber
\end{align}
$
We see that there are two essential approximations. In the second line we use the current derivative $\partial_{\theta_i} r(\theta^{(k)})$ instead of the derivative at the exact minimiser $\hat{\theta}$, as we would in a standard Newton method. In the next line we then proceed as in a normal Newton method and replace $r(\theta^{(k)} - \Delta^{(k)})$ by its linearisation. The approximation in the second line ensures that we do not need to incorporate the second derivative of $r$. But it also means that we cannot expect quadratic convergence as in Newton's method.
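To make the iteration concrete, here is a minimal sketch in Python, discretising $\mathcal{H}$ as $\mathbb{R}^n$ with the Euclidean inner product. The residual map `r` and its partials `J` below (an exponential fit with noiseless data, so the exact minimiser has zero residual) are hypothetical examples chosen for illustration, not part of the text above.

```python
import numpy as np

def gauss_newton(r, J, theta0, max_iter=50, tol=1e-12):
    """Plain Gauss-Newton: at each step solve the linear system built
    from the inner products of the partials, then update
    theta <- theta - Delta, as in the derivation above."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        Jk = J(theta)                    # columns: partial_{theta_j} r(theta)
        rk = r(theta)
        G = Jk.T @ Jk                    # Gram matrix <partial_i r, partial_j r>
        eta = Jk.T @ rk                  # entries <r(theta), partial_i r(theta)>
        delta = np.linalg.solve(G, eta)
        theta = theta - delta
        if np.linalg.norm(delta) < tol:
            break
    return theta

# Hypothetical test problem: fit a * exp(b * t) to noiseless samples,
# so the exact minimiser is (a, b) = (2.0, -1.5) with zero residual.
t = np.linspace(0.0, 1.0, 50)
y = 2.0 * np.exp(-1.5 * t)

def r(theta):
    a, b = theta
    return a * np.exp(b * t) - y

def J(theta):
    a, b = theta
    e = np.exp(b * t)
    return np.column_stack([e, a * t * e])  # partials w.r.t. a and b

theta_hat = gauss_newton(r, J, theta0=[1.8, -1.2])
```

Because the data are noiseless (a zero-residual problem) and the starting point is close, the plain iteration converges rapidly; for large residuals or poor starting points one would typically add damping (Levenberg-Marquardt) or a line search.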
The equation
$
0 = \langle r(\theta^{(k)}), \partial_{\theta_i} r(\theta^{(k)})\rangle - \sum_{j=1}^p \Delta_j^{(k)} \langle \partial_{\theta_j}r(\theta^{(k)}), \partial_{\theta_i}r(\theta^{(k)})\rangle, \quad i=1,\dots, p
$
is the normal equation associated with the linear least-squares problem
$
\min_{\Delta^{(k)}}\left\|\begin{bmatrix}\partial_{\theta_1} r(\theta^{(k)}),\dots, \partial_{\theta_p}r(\theta^{(k)})\end{bmatrix}\Delta^{(k)} - r(\theta^{(k)})\right\|.
$
Here, the matrix $\begin{bmatrix}\partial_{\theta_1} r(\theta^{(k)}),\dots, \partial_{\theta_p}r(\theta^{(k)})\end{bmatrix}$ needs to be understood in the sense of a quasi-matrix in which the $j$th column is not a vector but an element of the Hilbert space $\mathcal{H}$. Hence, Gauss-Newton iteratively solves linear least-squares problems in the tangent space $T_{r(\theta^{(k)})}M$ of the manifold $M = \{r(\theta): \theta\in\mathbb{R}^p\}\subset\mathcal{H}$.
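Once $\mathcal{H}$ is discretised, the quasi-matrix becomes an ordinary tall matrix, and the least-squares problem can be solved directly (e.g. via the QR-based `np.linalg.lstsq`) rather than by forming the normal equations, which squares the condition number. A minimal sketch; the affine residual map below is a hypothetical example for which a single step is exact, so the necessary condition $\langle r, \partial_{\theta_i} r\rangle = 0$ holds after one update.

```python
import numpy as np

def gauss_newton_step(r, J, theta):
    """One Gauss-Newton step via the least-squares formulation
    min_Delta || J(theta) Delta - r(theta) ||, solved with QR-based
    lstsq instead of the normal equations."""
    delta, *_ = np.linalg.lstsq(J(theta), r(theta), rcond=None)
    return theta - delta

# Hypothetical example: r is affine in theta (columns 1 and t), so a
# single step lands exactly on the least-squares minimiser.
t = np.linspace(0.0, 1.0, 30)
y = np.cos(3.0 * t)
A = np.column_stack([np.ones_like(t), t])   # discretised quasi-matrix

def r(theta):
    return A @ theta - y

def J(theta):
    return A

theta1 = gauss_newton_step(r, J, np.zeros(2))
```

After the step, the residual is orthogonal to the columns of the quasi-matrix, i.e. the normal equations are satisfied to machine precision.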
The normal equation above has a small but interesting simplification, namely
$
\langle r(\theta^{(k)}), \partial_{\theta_i} r(\theta^{(k)})\rangle = \partial_{\theta_i}f(r(\theta^{(k)})).
$
Hence, if we write the normal equation as a linear system of the form
$
G(\theta^{(k)})\Delta^{(k)} = \eta^{(k)}
$
with $[G(\theta^{(k)})]_{i, j} = \langle \partial_{\theta_i} r(\theta^{(k)}), \partial_{\theta_j} r(\theta^{(k)})\rangle$, then $\eta^{(k)} = [\partial_{\theta_1}f(r(\theta^{(k)})),\dots,\partial_{\theta_p} f(r(\theta^{(k)}))]^T$ is just the gradient of $f(r(\theta))$ with respect to the parameter vector $\theta$, evaluated at $\theta^{(k)}$.
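This identity is easy to check numerically: with $\mathcal{H}$ discretised as $\mathbb{R}^n$, the right-hand side $\eta_i = \langle r(\theta), \partial_{\theta_i} r(\theta)\rangle$ should match a finite-difference gradient of $f(r(\theta))$. The residual map below is a hypothetical example chosen for illustration.

```python
import numpy as np

# Check that eta_i = <r(theta), partial_{theta_i} r(theta)> equals the
# parametric gradient partial_{theta_i} f(r(theta)) for f = 0.5*||r||^2.

t = np.linspace(0.0, 1.0, 40)

def r(theta):                       # hypothetical residual map R^2 -> H
    return np.sin(theta[0] * t) + theta[1] * t**2

def f(theta):
    return 0.5 * np.dot(r(theta), r(theta))

theta = np.array([0.7, -0.3])

# eta via the inner products with the analytic partials of r
Jk = np.column_stack([t * np.cos(theta[0] * t),   # partial w.r.t. theta_1
                      t**2])                      # partial w.r.t. theta_2
eta = Jk.T @ r(theta)

# gradient of f by central finite differences
h = 1e-6
grad = np.array([(f(theta + h * e) - f(theta - h * e)) / (2 * h)
                 for e in np.eye(2)])
```

The two vectors agree up to finite-difference error, so in an implementation $\eta^{(k)}$ can be obtained from any existing gradient routine for $f$.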