A multilayer perceptron (MLP) is a simple realisation of a deep neural network. It can be written as a function
$$
f_{\theta}(x) = (\sigma_0\circ C_{L}\circ\sigma\circ C_{L-1}\circ\dots\circ\sigma\circ C_0)(x)
$$
mapping $\mathbb{R}^{n_0}\rightarrow \mathbb{R}^{n_{L+1}}$. Here $\sigma$ is an activation function applied componentwise, and $\sigma_0$ is an output function that is typically chosen as the identity. The affine function $C_k$ in layer $k$ is defined by
$$
C_k:\mathbb{R}^{n_k}\rightarrow \mathbb{R}^{n_{k+1}}: x\mapsto W_kx + b_k,
$$
with $W_k\in\mathbb{R}^{n_{k+1}\times n_k}$ and $b_k\in\mathbb{R}^{n_{k+1}}$. The dimension $n_k$ of each layer is also called the width of the layer. The matrices $W_k$ are the weights, and the vectors $b_k$ are the biases. The total trainable parameter set $\theta$ is the collection of all weights and biases $(W_0, b_0), (W_1, b_1), \dots, (W_{L},b_{L})$, with a total number of parameters $P$ given by
$$
P = \sum_{k=0}^L (n_{k} +1)n_{k+1}.
$$
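The formula follows because layer $k$ has $n_{k+1}\times n_k$ weight entries plus $n_{k+1}$ biases, i.e. $(n_k+1)n_{k+1}$ parameters. A minimal sketch of the count (the example widths are arbitrary illustrative choices, not from the source):

```python
# Parameter count for an MLP with widths n = [n_0, n_1, ..., n_{L+1}].
def param_count(n):
    # Layer k contributes (n_k + 1) * n_{k+1} parameters:
    # an n_{k+1} x n_k weight matrix plus n_{k+1} biases.
    return sum((n[k] + 1) * n[k + 1] for k in range(len(n) - 1))

widths = [2, 16, 16, 1]  # n_0 = 2 inputs, two hidden layers of width 16, n_3 = 1 output
print(param_count(widths))  # (2+1)*16 + (16+1)*16 + (16+1)*1 = 337
```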
In the definition above, the number $L$ corresponds to the number of hidden layers; for $L=1$ the neural network has a single hidden layer.
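The composition above can be sketched directly as a forward pass. This is a minimal NumPy illustration, not from the source: the widths and random initialisation are arbitrary, and ReLU is one common choice for $\sigma$, with $\sigma_0$ the identity as stated above.

```python
import numpy as np

rng = np.random.default_rng(0)
widths = [3, 8, 8, 2]  # n_0 = 3 inputs, two hidden layers of width 8, n_3 = 2 outputs

# theta: one (W_k, b_k) pair per affine map C_0, ..., C_L (here L = 2);
# W_k has shape (n_{k+1}, n_k) and b_k has shape (n_{k+1},).
theta = [(rng.standard_normal((widths[k + 1], widths[k])),
          rng.standard_normal(widths[k + 1]))
         for k in range(len(widths) - 1)]

def mlp(x, theta):
    # Hidden layers: affine map C_k followed by the activation sigma (ReLU here).
    for W, b in theta[:-1]:
        x = np.maximum(W @ x + b, 0)
    # Output layer: C_L followed by sigma_0 = identity.
    W, b = theta[-1]
    return W @ x + b

y = mlp(np.ones(widths[0]), theta)
print(y.shape)  # (2,)
```

Note that the affine maps are applied right to left in the composition, so `theta[0]` acts on the input first.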