$\renewcommand{\Re}{\operatorname{Re}}$ $\renewcommand{\Im}{\operatorname{Im}}$ $\newcommand{\erf}{\operatorname{erf}}$ $\newcommand{\dag}{\dagger}$ $\newcommand{\const}{\mathrm{const}}$ $\newcommand{\arcsinh}{\operatorname{arcsinh}}$
Definition 1. A functional is a map from some space of functions (or a subset of a space of functions) $\mathsf{H}$ to $\mathbb{R}$ (or $\mathbb{C}$): \begin{equation} \Phi: \mathsf{H}\ni u\to \Phi[u]\in \mathbb{R}. \label{eq-10.1.1} \end{equation}
Remark 1. It is important that we consider a whole function as the argument, not its value at some particular point!
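To make Remark 1 concrete, here is a minimal Python sketch in which a functional is literally a function that takes another function as its argument. The example functionals $\Phi[u]=\int_0^1 u(t)^2\,dt$ and $\Phi[u]=u(1/2)$, the helper `integrate` (composite trapezoidal rule) and all other names are hypothetical illustrations, not taken from the text.

```python
import math
from typing import Callable

RealFunction = Callable[[float], float]


def integrate(f: RealFunction, a: float, b: float, n: int = 1000) -> float:
    """Approximate the integral of f over [a, b] by the composite trapezoidal rule."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for k in range(1, n):
        total += f(a + k * h)
    return total * h


def energy(u: RealFunction) -> float:
    """Hypothetical functional Phi[u] = int_0^1 u(t)^2 dt: it uses all values of u."""
    return integrate(lambda t: u(t) ** 2, 0.0, 1.0)


def value_at_half(u: RealFunction) -> float:
    """Hypothetical functional Phi[u] = u(1/2): it still receives the whole function u."""
    return u(0.5)


if __name__ == "__main__":
    # The argument is the function math.sin itself, not a number.
    print(energy(math.sin))         # close to 1/2 - sin(2)/4
    print(value_at_half(math.sin))  # sin(1/2)
```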
Example 1.
Definition 2.
Definition 3. A functional $\Phi[u]$ is called linear if \begin{gather} \Phi [u+v]= \Phi[u]+\Phi [v],\label{eq-10.1.2}\\ \Phi [\lambda u]= \lambda \Phi[u] \label{eq-10.1.3} \end{gather} for all functions $u,v$ and scalars $\lambda$.
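Definition 3 can be probed numerically. The following Python sketch tests the identities (\ref{eq-10.1.2}) and (\ref{eq-10.1.3}) on sample functions, using a plain trapezoidal quadrature; note that such a finite test can only refute linearity, never prove it. The functionals `mean_value` ($\int_0^1 u\,dt$) and `energy` ($\int_0^1 u^2\,dt$) and the helper `passes_linearity_test` are hypothetical examples.

```python
import math
from typing import Callable

RealFunction = Callable[[float], float]
Functional = Callable[[RealFunction], float]


def integrate(f: RealFunction, a: float = 0.0, b: float = 1.0, n: int = 1000) -> float:
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for k in range(1, n):
        total += f(a + k * h)
    return total * h


def mean_value(u: RealFunction) -> float:
    """Hypothetical functional Phi[u] = int_0^1 u(t) dt."""
    return integrate(u)


def energy(u: RealFunction) -> float:
    """Hypothetical functional Phi[u] = int_0^1 u(t)^2 dt."""
    return integrate(lambda t: u(t) ** 2)


def passes_linearity_test(phi: Functional, u: RealFunction, v: RealFunction,
                          lam: float, tol: float = 1e-9) -> bool:
    """Check additivity and homogeneity of phi for one sample (u, v, lam) only."""
    additive = abs(phi(lambda t: u(t) + v(t)) - (phi(u) + phi(v))) <= tol
    homogeneous = abs(phi(lambda t: lam * u(t)) - lam * phi(u)) <= tol
    return additive and homogeneous


if __name__ == "__main__":
    print(passes_linearity_test(mean_value, math.sin, math.cos, 3.0))  # True
    print(passes_linearity_test(energy, math.sin, math.cos, 3.0))      # False
```

For `energy` the additivity defect equals $2\int_0^1 \sin t\cos t\,dt=\sin^2 1\ne 0$, which is why the test fails.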
Remark 2. Linear functionals will be crucial in the definition of distributions later.
Exercise 1. Which functionals of Example 1. are linear?
We start from the classical variational problems for a single real-valued function $q(t)$ of $t\in [t_0,t_1]$, and then consider vector-valued functions. This leads us to ODEs (or systems of ODEs) and rightfully belongs to an advanced ODE course.
Let us consider the functional \begin{equation} S[q]= \int_{I} L(q(t),\dot{q}(t),t)\,dt \label{eq-10.1.4} \end{equation} where, in the tradition of Lagrangian mechanics, we interpret $t\in I=[t_0,t_1]$ as time, $q(t)$ as a coordinate, and $\dot{q}(t):=q'_t(t)$ as a velocity.
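As an illustration of (\ref{eq-10.1.4}), here is a minimal Python sketch that evaluates $S[q]$ numerically. It assumes the Lagrangian is passed as a function `L(q, qdot, t)` and that the velocity $\dot q$ is supplied explicitly rather than obtained by numerical differentiation; the harmonic-oscillator Lagrangian $L=\frac12\dot q^2-\frac12 q^2$ is a hypothetical choice made only for the example.

```python
import math
from typing import Callable

Path = Callable[[float], float]
Lagrangian = Callable[[float, float, float], float]


def action(L: Lagrangian, q: Path, qdot: Path, t0: float, t1: float, n: int = 2000) -> float:
    """S[q] = int_{t0}^{t1} L(q(t), qdot(t), t) dt via the composite trapezoidal rule."""
    h = (t1 - t0) / n

    def integrand(t: float) -> float:
        return L(q(t), qdot(t), t)

    total = 0.5 * (integrand(t0) + integrand(t1))
    for k in range(1, n):
        total += integrand(t0 + k * h)
    return total * h


def oscillator(q: float, qdot: float, t: float) -> float:
    """Hypothetical Lagrangian: kinetic minus potential energy, qdot^2/2 - q^2/2."""
    return 0.5 * qdot ** 2 - 0.5 * q ** 2


if __name__ == "__main__":
    # q(t) = sin t on [0, pi]: S = (1/2) int_0^pi cos(2t) dt = 0.
    print(action(oscillator, math.sin, math.cos, 0.0, math.pi))
```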
Let us consider $q+\delta q$, where $\delta q$ is a small function. We do not formalize this notion: simply, $\delta q=\varepsilon \varphi$ with a fixed $\varphi$ and $\varepsilon\to 0$ is considered small. We call $\delta q$ a variation of $q$; what matters is that we change the function as a whole object. Let us consider
\begin{multline}
\delta S:=S[q+\delta q]-S[q]=
\int_I\Bigl(L(q+\delta q,\dot{q} + \delta \dot{q},t)
-L(q,\dot{q},t)\Bigr)\,dt\\
\approx \int_I \Bigl(\frac{\partial L}{\partial q}\delta q + \frac{\partial L}{\partial \dot{q}}\delta \dot{q}\Bigr)\,dt\qquad
\label{eq-10.1.5}
\end{multline}
where we calculated the linear part of the expression in parentheses, with $\delta\dot{q}:=\frac{d}{dt}\delta q$; if $\delta q=\varepsilon \varphi$ and all functions are sufficiently smooth, then $\approx$ means equality modulo $o(\varepsilon)$ as $\varepsilon\to 0$.
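The approximation (\ref{eq-10.1.5}) can be checked numerically: for $\delta q=\varepsilon\varphi$ the difference quotient $(S[q+\varepsilon\varphi]-S[q])/\varepsilon$ should approach $\int_I\bigl(\frac{\partial L}{\partial q}\varphi+\frac{\partial L}{\partial\dot q}\dot\varphi\bigr)\,dt$ as $\varepsilon\to0$. The Python sketch below assumes the hypothetical Lagrangian $L=\frac12\dot q^2-\frac12q^2$ (so $\partial L/\partial q=-q$, $\partial L/\partial\dot q=\dot q$), $q(t)=t^2$ and $\varphi(t)=\sin\pi t$ on $I=[0,1]$. For this quadratic $L$ the gap equals $\varepsilon\int_I\bigl(\frac12\dot\varphi^2-\frac12\varphi^2\bigr)\,dt=\varepsilon\,\frac{\pi^2-1}{4}$, so it shrinks linearly in $\varepsilon$.

```python
import math
from typing import Callable

Fn = Callable[[float], float]


def integrate(f: Fn, a: float, b: float, n: int = 2000) -> float:
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for k in range(1, n):
        total += f(a + k * h)
    return total * h


def L(q: float, qdot: float) -> float:
    """Hypothetical Lagrangian qdot^2/2 - q^2/2 (no explicit dependence on t)."""
    return 0.5 * qdot ** 2 - 0.5 * q ** 2


def q(t: float) -> float:      # path q(t) = t^2
    return t * t


def qdot(t: float) -> float:
    return 2.0 * t


def phi(t: float) -> float:    # direction of the variation
    return math.sin(math.pi * t)


def phidot(t: float) -> float:
    return math.pi * math.cos(math.pi * t)


def S(eps: float) -> float:
    """S[q + eps*phi] on I = [0, 1]."""
    return integrate(lambda t: L(q(t) + eps * phi(t), qdot(t) + eps * phidot(t)), 0.0, 1.0)


# Linear part from (10.1.5) with dL/dq = -q and dL/d(qdot) = qdot.
linear_part = integrate(lambda t: -q(t) * phi(t) + qdot(t) * phidot(t), 0.0, 1.0)

for eps in (1e-1, 1e-2, 1e-3):
    quotient = (S(eps) - S(0.0)) / eps
    # Gap is eps * int(phidot^2/2 - phi^2/2) dt, up to quadrature and rounding error.
    print(f"eps={eps:g}  quotient={quotient:.6f}  gap={abs(quotient - linear_part):.2e}")
```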
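Definition 4. The linear part $\delta S$ of the increment $S[q+\delta q]-S[q]$, that is, the right-hand expression of (\ref{eq-10.1.5}), is called the (first) variation of the functional $S$.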
Assumption 1. All functions are sufficiently smooth.
Under this assumption, we can integrate the right-hand expression of (\ref{eq-10.1.5}) by parts: \begin{multline} \delta S:= \int_I \Bigl(\frac{\partial L}{\partial q}\delta q + \frac{\partial L}{\partial \dot{q}}\delta \dot{q}\Bigr)\,dt\\ = \int_I\Bigl(\frac{\partial L}{\partial q} - \frac{d}{dt} \frac{\partial L}{\partial \dot{q}} \Bigr)\delta q \,dt + \Bigl(\frac{\partial L}{\partial \dot{q}}\,\delta q \Bigr)\Bigr|_{t=t_0}^{t=t_1}.\qquad \label{eq-10.1.6} \end{multline}
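The integration by parts behind (\ref{eq-10.1.6}), namely $\int_I p\,\delta\dot q\,dt=\bigl(p\,\delta q\bigr)\bigr|_{t_0}^{t_1}-\int_I\dot p\,\delta q\,dt$ with $p=\partial L/\partial\dot q$, can be sanity-checked numerically, including the sign of the boundary term. In this hypothetical Python sketch $p(t)=2t$ (this is $\partial L/\partial\dot q=\dot q$ for $L=\frac12\dot q^2$ along $q=t^2$) and $\delta q(t)=\cos t$ on $[0,1]$, chosen so that $\delta q$ does not vanish at the endpoints.

```python
import math
from typing import Callable

Fn = Callable[[float], float]


def integrate(f: Fn, a: float, b: float, n: int = 2000) -> float:
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for k in range(1, n):
        total += f(a + k * h)
    return total * h


t0, t1 = 0.0, 1.0


def p(t: float) -> float:      # p = dL/d(qdot) along q(t) = t^2 for L = qdot^2/2
    return 2.0 * t


def pdot(t: float) -> float:   # dp/dt
    return 2.0


def dq(t: float) -> float:     # variation delta q, nonzero at both endpoints
    return math.cos(t)


def dqdot(t: float) -> float:  # its time derivative
    return -math.sin(t)


lhs = integrate(lambda t: p(t) * dqdot(t), t0, t1)
boundary = p(t1) * dq(t1) - p(t0) * dq(t0)  # the boundary term enters with a plus sign
rhs = boundary - integrate(lambda t: pdot(t) * dq(t), t0, t1)

print(lhs, rhs)  # both approximate 2*cos(1) - 2*sin(1)
```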
Definition 5. If $\delta S=0$ for all admissible variations $\delta q$ we call $q$ a stationary point or extremal of functional $S$.
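Remark 3. Let us consider functions $q$ with prescribed values at the endpoints, $q(t_0)=q_0$ and $q(t_1)=q_1$. Then admissible variations must satisfy \begin{equation} \delta q(t_0)=\delta q(t_1)=0, \label{eq-10.1.7} \end{equation} and the boundary term in (\ref{eq-10.1.6}) vanishes.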
In this framework \begin{equation} \delta S= \int_I\Bigl(\frac{\partial L}{\partial q} - \frac{d}{dt} \frac{\partial L}{\partial \dot{q}} \Bigr)\delta q \,dt . \label{eq-10.1.8} \end{equation}
Lemma 1. Let $f$ be a continuous function in $I$. If $\int_I f(t)\varphi(t)\,dt=0$ for all sufficiently smooth $\varphi$ such that $\varphi(t_0)=\varphi(t_1)=0$, then $f=0$ in $I$.
Proof. Indeed, let us assume that $f(\bar{t})> 0$ at some point $\bar{t}\in I$ (the case $f(\bar{t})< 0$ is analyzed in the same way); by continuity we may assume that $\bar t$ is an interior point of $I$. Then $f(t)>0$ in some neighborhood $\mathcal{V}\subset (t_0,t_1)$ of $\bar{t}$. Consider a function $\varphi(t)$ which vanishes outside $\mathcal{V}$, with $\varphi\ge 0$ in $\mathcal{V}$ and $\varphi(\bar{t})>0$. Then $\varphi(t_0)=\varphi(t_1)=0$, the product $f(t)\varphi(t)$ has the same properties, and $\int_I f(t)\varphi(t)\, dt>0$. Contradiction!
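For instance, if $\mathcal{V}=(a,b)$ with $t_0<a<\bar t<b<t_1$, one possible choice of $\varphi$ in this proof (an illustration only; any function with the listed properties works) is the standard bump function
\begin{equation*}
\varphi(t)=\begin{cases} \exp\Bigl(-\dfrac{1}{(t-a)(b-t)}\Bigr), & a<t<b,\\[4pt] 0, & \text{otherwise},\end{cases}
\end{equation*}
which is infinitely differentiable on $\mathbb{R}$, positive exactly on $(a,b)$, and therefore vanishes at $t_0$ and $t_1$.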
As a corollary we arrive at
Theorem 1. Let us consider the functional (\ref{eq-10.1.4}) and consider as admissible all $\delta q$ satisfying (\ref{eq-10.1.7}). Then $q$ is a stationary point of $S$ if and only if it satisfies the Euler-Lagrange equation \begin{equation} \frac{\delta S}{\delta q}:= \frac{\partial L}{\partial q} - \frac{d}{dt} \left(\frac{\partial L}{\partial \dot{q}}\right) =0. \label{eq-10.1.9} \end{equation}
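Two standard instances of (\ref{eq-10.1.9}), given here as illustrations: the Newtonian Lagrangian $L=\frac{m}{2}\dot q^2-V(q)$ with a constant $m>0$ and a smooth potential $V$, and the arc-length Lagrangian $L=\sqrt{1+\dot q^2}$:
\begin{gather*}
L=\frac{m}{2}\dot q^2-V(q):\qquad \frac{\partial L}{\partial q}=-V'(q),\quad \frac{\partial L}{\partial\dot q}=m\dot q \implies m\ddot q=-V'(q);\\
L=\sqrt{1+\dot q^2}:\qquad \frac{\partial L}{\partial q}=0,\quad \frac{\partial L}{\partial \dot q}=\frac{\dot q}{\sqrt{1+\dot q^2}} \implies \frac{\dot q}{\sqrt{1+\dot q^2}}=\const\implies \dot q=\const .
\end{gather*}
In the first case we recover Newton's second law; in the second case the extremals are straight lines $q=c_1t+c_2$, since $x\mapsto x/\sqrt{1+x^2}$ is injective.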
Remark 4.
Definition 6. If $S[q]\ge S[q+\delta q]$ for all small admissible variations $\delta q$, we call $q$ a local maximum of the functional $S$. If $S[q]\le S[q+\delta q]$ for all small admissible variations $\delta q$, we call $q$ a local minimum of the functional $S$.
Here again we do not specify what a small admissible variation is.
Theorem 2. If $q$ is a local extremum (that is, either a local minimum or a local maximum) of $S$ and the variation exists, then $q$ is a stationary point.
Proof.
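Consider the case of a minimum (the case of a maximum is analyzed in the same way). Let $\delta q =\varepsilon \varphi$ with an admissible $\varphi$. Then $S[q+\delta q]- S [q]=\varepsilon (\delta S)(\varphi) +o(\varepsilon)$. Assume that $(\delta S)(\varphi)\ne 0$ and set $\sigma:=\frac{1}{2}|(\delta S)(\varphi)|>0$. Choosing the sign of $\varepsilon$ opposite to the sign of $(\delta S)(\varphi)$, we make $\varepsilon (\delta S)(\varphi)= -2\sigma |\varepsilon|$. Meanwhile, for sufficiently small $|\varepsilon|$ we have $|o(\varepsilon)|\le \sigma|\varepsilon|$, so $S [q+\delta q]- S [q]\le -\sigma |\varepsilon|<0$ and $q$ is not a local minimum. Therefore $(\delta S)(\varphi)=0$ for all admissible $\varphi$, i.e. $q$ is a stationary point.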
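The sign argument in this proof can be observed numerically. In the following hypothetical Python sketch $L=\frac12\dot q^2$ on $I=[0,1]$, with endpoint values $q(0)=0$, $q(1)=1$ and the admissible direction $\varphi(t)=\sin\pi t$. For $q(t)=t$ (which solves the Euler-Lagrange equation $\ddot q=0$) the increment $S[q+\varepsilon\varphi]-S[q]=\frac{\varepsilon^2\pi^2}{4}$ is nonnegative for both signs of $\varepsilon$; for $q(t)=t^2$ one has $(\delta S)(\varphi)=-4/\pi<0$, so choosing $\varepsilon>0$, as in the proof, makes the increment negative.

```python
import math
from typing import Callable

Fn = Callable[[float], float]


def integrate(f: Fn, a: float = 0.0, b: float = 1.0, n: int = 2000) -> float:
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for k in range(1, n):
        total += f(a + k * h)
    return total * h


def phidot(t: float) -> float:
    """Derivative of the admissible direction phi(t) = sin(pi t), phi(0) = phi(1) = 0."""
    return math.pi * math.cos(math.pi * t)


def increment(qdot: Fn, eps: float) -> float:
    """S[q + eps*phi] - S[q] for the hypothetical L = qdot^2 / 2 (only qdot enters)."""
    def S(e: float) -> float:
        return integrate(lambda t: 0.5 * (qdot(t) + e * phidot(t)) ** 2)
    return S(eps) - S(0.0)


def line_qdot(t: float) -> float:
    """Velocity of q(t) = t, an extremal (qddot = 0)."""
    return 1.0


def parabola_qdot(t: float) -> float:
    """Velocity of q(t) = t^2: same endpoint values, but not an extremal."""
    return 2.0 * t


for eps in (0.01, -0.01):
    print(eps, increment(line_qdot, eps), increment(parabola_qdot, eps))
```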
Remark 5. We consider neither sufficient conditions for extrema nor second variations (analogous to second differentials). In some cases they will be obvious.