$\renewcommand{\Re}{\operatorname{Re}}$ $\renewcommand{\Im}{\operatorname{Im}}$ $\newcommand{\erf}{\operatorname{erf}}$ $\newcommand{\dag}{\dagger}$ $\newcommand{\const}{\mathrm{const}}$ $\newcommand{\arcsinh}{\operatorname{arcsinh}}$
Definition 1. A functional is a map from some space of functions (or a subset of such a space) $\mathsf{H}$ to $\mathbb{R}$ (or $\mathbb{C}$): \begin{equation} \Phi: \mathsf{H}\ni u\mapsto \Phi[u]\in \mathbb{R}. \label{eq-10.1.1} \end{equation}
Remark 1. It is important that the argument is the function as a whole, not its value at some particular point.
Example 1.
Definition 2.
Definition 3. A functional $\Phi[u]$ is called linear if \begin{gather} \Phi [u+v]= \Phi[u]+\Phi [v],\label{eq-10.1.2}\\ \Phi [\lambda u]= \lambda \Phi[u] \label{eq-10.1.3} \end{gather} for all functions $u$, $v$ and scalars $\lambda$.
Remark 2. Linear functionals will be crucial in the definition of distributions later.
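For instance, assuming that $\mathsf{H}$ consists of continuous functions on an interval $[a,b]$ (the interval, the weight $w$ and the point $c$ below are chosen here purely for illustration), the functionals \begin{equation*} \Phi_1[u]=\int_a^b w(x)u(x)\,dx,\qquad \Phi_2[u]=u(c),\quad c\in [a,b], \end{equation*} satisfy both (\ref{eq-10.1.2}) and (\ref{eq-10.1.3}) and are therefore linear, while $\Phi_3[u]=\int_a^b u(x)^2\,dx$ is not: $\Phi_3[\lambda u]=\lambda^2\Phi_3[u]$, which differs from $\lambda\Phi_3[u]$ unless $\lambda\in\{0,1\}$ or $\Phi_3[u]=0$.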
Exercise 1. Which functionals of Example 1 are linear?
We start from the classical variational problems: first a single real-valued function $q(t)$ of $t\in [t_0,t_1]$, then vector-valued functions. This leads to ODEs (or systems of ODEs) and rightfully belongs to an advanced ODE course.
Let us consider the functional \begin{equation} S[q]= \int_{I} L(q(t),\dot{q}(t),t)\,dt \label{eq-10.1.4} \end{equation} where, in the tradition of Lagrangian mechanics, we interpret $t\in I=[t_0,t_1]$ as time, $q(t)$ as a coordinate, and $\dot{q}(t):=q'_t(t)$ as a velocity.
Let us consider $q+\delta q$ where $\delta q $ is a "small" function. We do not formalize this notion; we simply take $\delta q=\varepsilon \varphi$ with fixed $\varphi$ and $\varepsilon\to 0$. We call $\delta q$ a variation of $q$; the important point is that we change the function as a whole object. Let us consider
\begin{multline} \delta S:=S[q+\delta q]-S[q]= \int_I\Bigl(L(q+\delta q,\dot{q} + \delta \dot{q},t) -L(q,\dot{q},t)\Bigr)\,dt\\ \approx \int_I \Bigl(\frac{\partial L}{\partial q}\delta q + \frac{\partial L}{\partial \dot{q}}\delta \dot{q}\Bigr)\,dt\qquad \label{eq-10.1.5} \end{multline} where we calculated the linear part of the expression in the parentheses; if $\delta q=\varepsilon \varphi$ and all functions are sufficiently smooth, then $\approx$ means "equal modulo $o(\varepsilon)$ as $\varepsilon\to 0$".
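To see what the linear part looks like in a concrete case, take as an illustrative assumption $L=\frac{1}{2}\dot{q}^2$ (this Lagrangian is not taken from the text above). With $\delta q=\varepsilon\varphi$, \begin{equation*} S[q+\varepsilon\varphi]-S[q]= \int_I\Bigl(\tfrac{1}{2}(\dot{q}+\varepsilon\dot{\varphi})^2-\tfrac{1}{2}\dot{q}^2\Bigr)\,dt =\varepsilon\int_I \dot{q}\,\dot{\varphi}\,dt+\frac{\varepsilon^2}{2}\int_I\dot{\varphi}^2\,dt. \end{equation*} The first term is the linear part, in agreement with (\ref{eq-10.1.5}) because here $\frac{\partial L}{\partial q}=0$ and $\frac{\partial L}{\partial \dot{q}}=\dot{q}$; the second term is $O(\varepsilon^2)$ and hence $o(\varepsilon)$.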
Definition 4.
Assumption 1. All functions are sufficiently smooth.
Under this assumption, we can integrate the second term on the right-hand side of (\ref{eq-10.1.5}) by parts, $\int_I \frac{\partial L}{\partial \dot{q}}\delta \dot{q}\,dt = \frac{\partial L}{\partial \dot{q}}\delta q \Bigr|_{t=t_0}^{t=t_1} - \int_I \frac{d}{dt}\frac{\partial L}{\partial \dot{q}}\,\delta q\,dt$, which gives \begin{gather} \delta S= \int_I \Bigl(\frac{\partial L}{\partial q}\delta q + \frac{\partial L}{\partial \dot{q}}\delta \dot{q}\Bigr)\,dt = \int_I\Bigl(\frac{\partial L}{\partial q}- \frac{d}{dt} \frac{\partial L}{\partial \dot{q}} \Bigr)\delta q \,dt + \Bigl( \frac{\partial L}{\partial \dot{q}} \Bigr)\delta q \Bigr|_{t=t_0}^{t=t_1}.\qquad \label{eq-10.1.6} \end{gather}
Definition 5. If $\delta S=0$ for all admissible variations $\delta q$, we call $q$ a stationary point or extremal of the functional $S$.
Remark 3.
In this framework \begin{equation} \delta S= \int_I\Bigl(\frac{\partial L}{\partial q} - \frac{d}{dt} \frac{\partial L}{\partial \dot{q}} \Bigr)\delta q \,dt . \label{eq-10.1.8} \end{equation}
Lemma 1. Let $f$ be a continuous function in $I$. If $\int_I f(t)\varphi(t)\,dt=0$ for all $\varphi$ such that $\varphi(t_0)=\varphi(t_1)=0$ then $f=0$ in $I$.
Proof. Indeed, let us assume that $f(\bar{t})> 0$ at some point $\bar{t}\in I$ (the case $f(\bar{t})< 0$ is analyzed in the same way). Since $f$ is continuous, we may assume that $\bar{t}$ is an interior point of $I$ and that $f(t)>0$ in some vicinity $\mathcal{V}\subset (t_0,t_1)$ of $\bar{t}$. Consider a continuous function $\varphi(t)$ which is $0$ outside of $\mathcal{V}$, with $\varphi\ge 0$ in $\mathcal{V}$ and $\varphi(\bar{t})>0$. Then $f(t)\varphi(t)$ has the same properties and $\int_I f(t)\varphi(t)\, dt>0$. Contradiction!
As a corollary we arrive at
Theorem 2. Let us consider the functional (\ref{eq-10.1.4}) and consider as admissible all $\delta q$ satisfying (\ref{eq-10.1.7}). Then $q$ is a stationary point of $S$ if and only if it satisfies the Euler-Lagrange equation \begin{equation} \frac{\delta S}{\delta q}:= \frac{\partial L}{\partial q} - \frac{d}{dt} \left(\frac{\partial L}{\partial \dot{q}}\right) =0. \label{eq-10.1.9} \end{equation}
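One possible choice of such $\varphi$, given here only as a sketch: if $\bar{t}$ is an interior point of $I$ and $\delta>0$ is so small that $(\bar{t}-\delta,\bar{t}+\delta)\subset \mathcal{V}$, take \begin{equation*} \varphi(t)=\begin{cases} \bigl(\delta^2-(t-\bar{t})^2\bigr)^2, & |t-\bar{t}|<\delta,\\ 0, & |t-\bar{t}|\ge \delta. \end{cases} \end{equation*} This $\varphi$ is continuously differentiable, nonnegative, positive at $\bar{t}$, and vanishes at $t_0$ and $t_1$.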
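As an illustration, assume the Lagrangian of a particle of mass $m$ in a potential $V$ (both $m$ and $V$ are introduced here only for this example): $L=\frac{m}{2}\dot{q}^2-V(q)$. Then $\frac{\partial L}{\partial q}=-V'(q)$ and $\frac{\partial L}{\partial \dot{q}}=m\dot{q}$, so (\ref{eq-10.1.9}) becomes \begin{equation*} -V'(q)-\frac{d}{dt}\bigl(m\dot{q}\bigr)=0 \iff m\ddot{q}=-V'(q), \end{equation*} which is Newton's second law.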
Remark 4.
Equation (\ref{eq-10.1.9}) is a second-order ODE. Indeed, \begin{gather*} \frac{d}{dt} \frac{\partial L}{\partial \dot{q}}= \left(\frac{\partial ^2L}{\partial \dot{q}\partial t} +\frac{\partial ^2L}{\partial \dot{q}\partial q}\dot{q} + \frac{\partial ^2L}{\partial \dot{q}^2}\ddot{q}\right). \end{gather*}
If $\frac{\partial L}{\partial q}=0$ then (\ref{eq-10.1.9}) integrates to \begin{equation} \frac{\partial L}{\partial \dot{q}}=C. \label{eq-10.1.10} \end{equation}
For solutions of (\ref{eq-10.1.9}) the following equality holds: \begin{equation} \frac{d}{dt} \left(\frac{\partial L}{\partial \dot{q}}\dot{q}-L\right)=-\frac{\partial L}{\partial t}. \label{eq-10.1.11} \end{equation} The proof will be provided for vector-valued $\mathbf{q}(t)$.
In particular, if $\frac{\partial L}{\partial t}=0$ ($L$ does not depend explicitly on $t$), then \begin{equation} \frac{d}{dt} \left(\frac{\partial L}{\partial \dot{q}}\dot{q}-L\right)=0\implies \frac{\partial L}{\partial \dot{q}}\dot{q}-L=C. \label{eq-10.1.12} \end{equation}
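For example, for the Lagrangian $L=\frac{m}{2}\dot{q}^2-V(q)$ of a particle of mass $m$ in a potential $V$ (an assumption made only for this illustration), \begin{equation*} \frac{\partial L}{\partial \dot{q}}\dot{q}-L= m\dot{q}^2-\frac{m}{2}\dot{q}^2+V(q)=\frac{m}{2}\dot{q}^2+V(q), \end{equation*} so in this case (\ref{eq-10.1.12}) expresses the conservation of energy.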
Definition 6. If $S[q]\ge S[q+\delta q]$ for all small admissible variations $\delta q$, we call $q$ a local maximum of the functional $S$. If $S[q]\le S[q+\delta q]$ for all small admissible variations $\delta q$, we call $q$ a local minimum of the functional $S$.
Here again we do not specify what a small admissible variation is.
Theorem 3. If $q$ is a local extremum (that is, either a local minimum or a local maximum) of $S$ and the variation exists, then $q$ is a stationary point.
Proof. Consider the case of a minimum. Let $\delta q =\varepsilon \varphi$. Then $S[q+\delta q]- S [q]=\varepsilon (\delta S)(\varphi) +o(\varepsilon)$. If $\pm (\delta S)(\varphi) > 0$ for some $\varphi$, then choosing $\mp \varepsilon > 0$ we make $\varepsilon (\delta S)(\varphi)\le -2\sigma |\varepsilon|$ with some $\sigma>0$. Meanwhile, for sufficiently small $|\varepsilon|$ the term "$o(\varepsilon)$" is smaller than $\sigma|\varepsilon|$ in absolute value, so $S [q+\delta q]- S [q]\le -\sigma |\varepsilon|<0$ and $q$ is not a local minimum.
Remark 5.
Example 2. Find the curve of fastest descent (the brachistochrone) from a fixed point $A$ to a fixed point $B$.
Solution. Let the $y$-axis be directed downward. Assuming that the speed at $A(0,0)$ is $0$, we conclude from conservation of energy that the speed at the point $(x,y)$ is $\sqrt{2gy}$, and therefore the time of descent from $A$ to $B(a, h)$ is \begin{gather*} T[y]=\int_0^a \frac {\sqrt{1+y'^2}\,dx}{\sqrt{2gy}}. \end{gather*}
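The intermediate step can be sketched as follows, with $x$ playing the role of $t$ and $y$ the role of $q$; the names $C$ and $D$ for the constants are assumptions made here. The integrand $L=\sqrt{1+y'^2}/\sqrt{2gy}$ does not depend explicitly on $x$, so (\ref{eq-10.1.12}) gives \begin{equation*} \frac{\partial L}{\partial y'}y'-L= \frac{y'^2}{\sqrt{2gy}\sqrt{1+y'^2}}-\frac{\sqrt{1+y'^2}}{\sqrt{2gy}} =-\frac{1}{\sqrt{2gy}\sqrt{1+y'^2}}=C, \end{equation*} that is, \begin{equation*} y\bigl(1+y'^2\bigr)=D,\qquad D=\frac{1}{2gC^2}. \end{equation*}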
This equation has a solution
\begin{align*}
x= &r\bigl(\varphi - \sin(\varphi)\bigr),\\
y=&r\bigl(1-\cos(\varphi)\bigr)
\end{align*}
with $r=D/2$.
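A quick check, assuming that $D$ is the constant in the first integral $y(1+y'^2)=D$ which (\ref{eq-10.1.12}) yields for this integrand: along the curve $\frac{dy}{dx}=\frac{dy/d\varphi}{dx/d\varphi}=\frac{\sin\varphi}{1-\cos\varphi}$, hence \begin{equation*} 1+y'^2=\frac{(1-\cos\varphi)^2+\sin^2\varphi}{(1-\cos\varphi)^2}=\frac{2}{1-\cos\varphi},\qquad y\bigl(1+y'^2\bigr)=2r, \end{equation*} which equals $D$ exactly when $r=D/2$.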
So the brachistochrone is an inverted cycloid: the curve traced by a point on a circle as it rolls along a straight line without slipping.