Notes on Convexity from Epigraphical Operations

$$ \newcommand{\bzero}{\mathbf{0}} \newcommand{\ba}{\mathbf{a}}\newcommand{\bA}{\mathbf{A}} \newcommand{\bb}{\mathbf{b}}\newcommand{\bB}{\mathbf{B}} \newcommand{\bc}{\mathbf{c}}\newcommand{\bC}{\mathbf{C}} \newcommand{\bd}{\mathbf{d}}\newcommand{\bD}{\mathbf{D}} \newcommand{\be}{\mathbf{e}}\newcommand{\bE}{\mathbf{E}} \newcommand{\bff}{\mathbf{f}}\newcommand{\bF}{\mathbf{F}} \newcommand{\bg}{\mathbf{g}}\newcommand{\bG}{\mathbf{G}} \newcommand{\bh}{\mathbf{h}}\newcommand{\bH}{\mathbf{H}} \newcommand{\bi}{\mathbf{i}}\newcommand{\bI}{\mathbf{I}} \newcommand{\bj}{\mathbf{j}}\newcommand{\bJ}{\mathbf{J}} \newcommand{\bk}{\mathbf{k}}\newcommand{\bK}{\mathbf{K}} \newcommand{\bl}{\mathbf{l}}\newcommand{\bL}{\mathbf{L}} \newcommand{\bm}{\mathbf{m}}\newcommand{\bM}{\mathbf{M}} \newcommand{\bn}{\mathbf{n}}\newcommand{\bN}{\mathbf{N}} \newcommand{\bo}{\mathbf{o}}\newcommand{\bO}{\mathbf{O}} \newcommand{\bp}{\mathbf{p}}\newcommand{\bP}{\mathbf{P}} \newcommand{\bq}{\mathbf{q}}\newcommand{\bQ}{\mathbf{Q}} \newcommand{\br}{\mathbf{r}}\newcommand{\bR}{\mathbf{R}} \newcommand{\bs}{\mathbf{s}}\newcommand{\bS}{\mathbf{S}} \newcommand{\bt}{\mathbf{t}}\newcommand{\bT}{\mathbf{T}} \newcommand{\bu}{\mathbf{u}}\newcommand{\bU}{\mathbf{U}} \newcommand{\bv}{\mathbf{v}}\newcommand{\bV}{\mathbf{V}} \newcommand{\bw}{\mathbf{w}}\newcommand{\bW}{\mathbf{W}} \newcommand{\indep}{\perp \!\!\! \perp} \newcommand{\bx}{\mathbf{x}}\newcommand{\bX}{\mathbf{X}} \newcommand{\by}{\mathbf{y}}\newcommand{\bY}{\mathbf{Y}} \newcommand{\bz}{\mathbf{z}}\newcommand{\bZ}{\mathbf{Z}} \newcommand{\balpha}{\boldsymbol{\alpha}}\newcommand{\bAlpha}{\boldsymbol{\Alpha}} \newcommand{\bbeta}{\boldsymbol{\beta}}\newcommand{\bBeta}{\boldsymbol{\Beta}} \newcommand{\bgamma}{\boldsymbol{\gamma}}\newcommand{\bGamma}{\boldsymbol{\Gamma}} \newcommand{\bdelta}{\boldsymbol{\delta}}\newcommand{\bDelta}{\boldsymbol{\Delta}} \newcommand{\bepsilon}{\boldsymbol{\epsilon}}\newcommand{\bEpsilon}{\boldsymbol{\Epsilon}} \newcommand{\bzeta}{\boldsymbol{\zeta}}\newcommand{\bZeta}{\boldsymbol{\Zeta}} \newcommand{\beeta}{\boldsymbol{\eta}}\newcommand{\bEta}{\boldsymbol{\Eta}} % \beta already taken \newcommand{\btheta}{\boldsymbol{\theta}}\newcommand{\bTheta}{\boldsymbol{\Theta}} \newcommand{\biota}{\boldsymbol{\iota}}\newcommand{\bIota}{\boldsymbol{\Iota}} \newcommand{\bkappa}{\boldsymbol{\kappa}}\newcommand{\bKappa}{\boldsymbol{\Kappa}} \newcommand{\blambda}{\boldsymbol{\lambda}}\newcommand{\bLambda}{\boldsymbol{\Lambda}} \newcommand{\bmu}{\boldsymbol{\mu}}\newcommand{\bMu}{\boldsymbol{\Mu}} \newcommand{\bnu}{\boldsymbol{\nu}}\newcommand{\bNu}{\boldsymbol{\Nu}} \newcommand{\bxi}{\boldsymbol{\xi}}\newcommand{\bXi}{\boldsymbol{\Xi}} \newcommand{\bomikron}{\boldsymbol{\omikron}}\newcommand{\bOmikron}{\boldsymbol{\Omikron}} \newcommand{\bpi}{\boldsymbol{\pi}}\newcommand{\bPi}{\boldsymbol{\Pi}} \newcommand{\brho}{\boldsymbol{\rho}}\newcommand{\bRho}{\boldsymbol{\Rho}} \newcommand{\bsigma}{\boldsymbol{\sigma}}\newcommand{\bSigma}{\boldsymbol{\Sigma}} \newcommand{\btau}{\boldsymbol{\tau}}\newcommand{\bTau}{\boldsymbol{\Tau}} \newcommand{\bypsilon}{\boldsymbol{\ypsilon}}\newcommand{\bYpsilon}{\boldsymbol{\Ypsilon}} \newcommand{\bphi}{\boldsymbol{\phi}}\newcommand{\bPhi}{\boldsymbol{\Phi}} \newcommand{\bchi}{\boldsymbol{\chi}}\newcommand{\bChi}{\boldsymbol{\Chi}} \newcommand{\bpsi}{\boldsymbol{\psi}}\newcommand{\bPsi}{\boldsymbol{\Psi}} \newcommand{\bomega}{\boldsymbol{\omega}}\newcommand{\bOmega}{\boldsymbol{\Omega}} \newcommand{\nA}{\mathbb{A}} \newcommand{\nB}{\mathbb{B}} \newcommand{\nC}{\mathbb{C}} \newcommand{\nD}{\mathbb{D}} \newcommand{\nE}{\mathbb{E}} \newcommand{\nF}{\mathbb{F}} \newcommand{\nG}{\mathbb{G}} \newcommand{\nH}{\mathbb{H}} \newcommand{\nI}{\mathbb{I}} \newcommand{\nJ}{\mathbb{J}} \newcommand{\nK}{\mathbb{K}} \newcommand{\nL}{\mathbb{L}} \newcommand{\nM}{\mathbb{M}} \newcommand{\nN}{\mathbb{N}} \newcommand{\nO}{\mathbb{O}} \newcommand{\nP}{\mathbb{P}} \newcommand{\nQ}{\mathbb{Q}} \newcommand{\nR}{\mathbb{R}} \newcommand{\nS}{\mathbb{S}} \newcommand{\nT}{\mathbb{T}} \newcommand{\nU}{\mathbb{U}} \newcommand{\nV}{\mathbb{V}} \newcommand{\nW}{\mathbb{W}} \newcommand{\nX}{\mathbb{X}} \newcommand{\nY}{\mathbb{Y}} \newcommand{\nZ}{\mathbb{Z}} \newcommand{\cA}{\mathcal{A}} \newcommand{\cB}{\mathcal{B}} \newcommand{\cC}{\mathcal{C}} \newcommand{\cD}{\mathcal{D}} \newcommand{\cE}{\mathcal{E}} \newcommand{\cF}{\mathcal{F}} \newcommand{\cG}{\mathcal{G}} \newcommand{\cH}{\mathcal{H}} \newcommand{\cI}{\mathcal{I}} \newcommand{\cJ}{\mathcal{J}} \newcommand{\cK}{\mathcal{K}} \newcommand{\cL}{\mathcal{L}} \newcommand{\cM}{\mathcal{M}} \newcommand{\cN}{\mathcal{N}} \newcommand{\cO}{\mathcal{O}} \newcommand{\cP}{\mathcal{P}} \newcommand{\cQ}{\mathcal{Q}} \newcommand{\cR}{\mathcal{R}} \newcommand{\cS}{\mathcal{S}} \newcommand{\cT}{\mathcal{T}} \newcommand{\cU}{\mathcal{U}} \newcommand{\cV}{\mathcal{V}} \newcommand{\cW}{\mathcal{W}} \newcommand{\cX}{\mathcal{X}} \newcommand{\cY}{\mathcal{Y}} \newcommand{\cZ}{\mathcal{Z}} \DeclareMathOperator*{\argmax}{argmax~} \DeclareMathOperator*{\argmin}{argmin~} \DeclareMathOperator*{\Tr}{Tr} \DeclareMathOperator*{\Bias}{Bias} \DeclareMathOperator*{\Var}{Var} \newcommand{\Perp}{\perp\!\!\! \perp} \let\dsad\d \renewcommand\d{\, \mathrm{d}} \newcommand{\R}{\mathbb{R}} \newcommand{\N}{\mathbb{N}} \newcommand{\E}{\mathbb{E}} \newcommand{\Eb}[1]{\mathbb{E} \left[ #1 \right]} \newcommand{\F}{\mathcal{F}} \newcommand{\X}{\mathcal{X}} \newcommand{\vocab}[1]{\textbf{\color{blue} #1}} \newcommand{\norm}[1]{\left\lVert#1\right\rVert} \newcommand{\inner}[2]{ \left \langle #1, #2 \right \rangle } \newcommand{\ibra}[1]{\llbracket #1 \rrbracket} \newenvironment{nalign}{ \begin{equation} \begin{aligned} }{ \end{aligned} \end{equation} \ignorespacesafterend } \newcommand{\icol}[1]{ \left(\begin{smallmatrix}#1\end{smallmatrix}\right)% } \newcommand{\cc}[1]{\overline{#1}} \newcommand{\tr}[1]{\text{tr} \left( #1 \right)} \newcommand{\aff}{\text{aff} \,} \newcommand{\ri}{\text{ri} \, } \newcommand{\rb}{\text{rb} \, } \newcommand{\cl}{\text{cl} \, } \newcommand{\conv}{\text{conv} \, } \newcommand{\irow}[1]{ \begin{smallmatrix}(#1)\end{smallmatrix}% } \newcommand\abs[1]{\left \lvert #1 \right \rvert} $$

Motivation

The main motivation here is to delve deeper into convexity-preserving operations. It so happens that all subsequently introduced operations involve epigraphs. There is close connection between conjugacy and subgradients, and these operations also take part in Fenchel duality. As such, it is instructive to build some intuition of the basics before visiting the methods.

Remarks on simple convexity-preserving operations

There are a few commonly used criteria to construct convex functions as a composition of simpler ones. First, we summarize the ones that do not involve differentiability.

Claim 1 (Some convexity-preserving operations not involving conexity) If $f: \R^n \to \R$, then the following constructions hold:

$f(\by) := \sup_x g_{\bx}(\by)$, if all $g_x(\cdot)$ are convex.
$f(\by) := \lambda_1 g(\by) + \lambda_2 h(\by)$ for $\lambda_1, \lambda_2 > 0$ and $g,h$ convex.
$f(\by) := \lambda g(\by)$, $\lambda > 0$ and $g $ convex.
$f(\by) := \phi(g(\by))$, where $\phi$ is non-decreasing and convex, and $g$ is convex.

Proof. For the (1), note that the supremum of the fuctions is equivalent to intersection of epigraphs of the functions. Since the epigraphs are closed (for lower semi-continuous functions), so is their intersections, and thus the supremum is convex. Properties (2-4) are easy to verify.

As an example of the last operation, think of Jensen’s inequality, or taking $\phi(\bx) := -\log(\bx)$.

We now want to extend this set with initially obscure-looking operations, which will turn out to be very useful in later analysis.

Lower-envelope

Given a function, its epigraph is defined as all the points “above the graph”. We can establish a direction from sets to functions by defining a lower envelope as follows.

Definition 1. (Lower evelope) Given a set $S \subset \R^n \times \R$, its lower envelope $E_S(\bx)$ is defined as

\[\begin{equation} \begin{split} E_S(\bx) := \inf \{\mu \in (-\infty, +\infty] \mid (\bx, \mu) \in S \}. \end{split} \end{equation}\]

We can then establish that the lower envelope is convex. To do this, we first show that the epigraph of the lower nevelope has the natural form, from which it will immediately follow that the epigrah of the function is convex when the set $S$ is convex.

Proposition 1 (Lower envelope’s epigraph) (For a closed $S$ probably?) $\text{epi}(E_s(\cdot)) = S + ({ \bzero } \times \R_+)$.

Proof. Denote $R = S + ({ \bzero } \times \R_+)$.
``$\supset$’’: Take $\br \in R$, then $\br = \bs + (\bzero, c), c > 0$. Thus $\br \in \text{epi} E_s$.

``$\subset$’’: Take $(\be, e) \in \text{epi} (E_s) $. If $E_S(\be)$ is finite, then there is a $c \in \R$ s.t. $(\be, c) \in S$. Since $S$ is closed, $(\be, c) \in S$, but then $c$ can be scaled to $e$ if $c < e$, and thus the inclusion holds. If $\text{epi}(E_S)$ is not finite, then either $E_S(\bx) = +\infty$, in which case the point is not in domain and thus the inclusion holds, or $E_S(\bx) = -\infty$, in which case and point is in the epigraph, so the inclusion also holds.

With this in mind, we can easily show that the lower envelope is convex if the set $S$ is convex.

Proposition 2 (Lower envelope of a convex set is convex) The lower evenlope $E_S(\cdot)$ of a set $S \subset \R^n \times \R$ is convex if $S$ is convex.

Proof. It is sufficient to show that the epigraph of the lower envelope is convex. Consider $\bu, \bv \in \text{epi}(E_s)$, then by the Proposition (1), we can write these as $\bu = \bs_1 + (\bzero, c_1), \bv = \bs_2 + (\bzero, c_2)$, where $\bs_1, \bs_2 \in S, c_1, c_2 \in \R_+$. Then for $\lambda \in (0,1)$:

\[\begin{equation} \begin{split} \lambda \bu + (1-\lambda) \bv &= \lambda (\bs_1, (\bzero, c_1)) + (1 - \lambda)(\bs_2, (\bzero), c_2))\\ &= \underbrace{(\lambda \bs_1 + (1-\lambda) \bs_2)}_{\in S} + \underbrace{(\bzero, \lambda c_1 + (1-\lambda) c_2)}_{\in \{\bzero \} \times \R_+}. \end{split} \end{equation}\]

which completes the proof.

Alternative Proof (I think?). Since a set is convex iff it is convex on every line segment, the convexity of the epigraph can be shown directly through definition.

Infimal-projection

One function that will repeatedly turn up and will serve to proving other epigraphical properties, is the infimal-projection.

Definition (Infimal projection) For a $g(\bx, \by): \R^n \times \R^m \to \R$, a function $f(\bx) := \inf_{\by} g(\bx, \by)$ is called the infimal projection of $g$.

We can then the infimal projection is convex when the $g$ is.

Proposition 3 (Infimal projection of a convex function is convex) If $g(\cdot, \cdot)$ is convex, then $f(\bx) = \inf_{\by} g(\bx, \by)$ is convex.

Proof. Take $\lambda \in (0, 1)$, and $\bx_1, \bx_2 \in \text{dom } f$. Then

\[\begin{equation} \begin{split} \lambda f(\bx_1) + (1-\lambda) f(\bx_2) &= \lambda \inf_{\by_1} g(\bx_1, \by_1) + (1-\lambda) \inf_{\by_2} g(\bx_2, \by_2) \\ &= \inf_{\by_1, \by_2} \lambda g(\bx_1, \by_1) + (1-\lambda) g(\bx_2, \by_2) \\ & \geq \inf_{\by_1, \by_2} g(\lambda \bx_1 + (1-\lambda) \bx_2, \lambda \by_1 + (1-\lambda) \by_2) \\ & \geq \inf_{\by} g( \lambda \bx_1 + (1-\lambda) \bx_2, \by ) \\ &= f(\lambda \bx_1 + (1-\lambda)\bx_2). \end{split} \end{equation}\]

Infimal convolution

One building block of new convex functions that has a nice geometric interpretation is the infimal convolution. Using this approach we will derive the famous Huber loss – a juxtaposition of smooth quadratic function and non-smooth “robust” absolute function.

Definition (Infimal convolution) Given functions $f,g: \R^n \to \R$, we define the infimal convolution of these two functions denoted by $(f \square g)$, and defined as

\[\begin{equation} \begin{split} (f \square g)(\bx) := \inf_{\by} f(\by) + g(\bx - \by). \end{split} \end{equation}\]

We show some nice properties of the infimal convolution, which also gives an intuitive geometric view.

Proposition 4 (Properties of infimal convolution) For functions $f,g$ defined as above, the following hold:

$(f \square g) = (g \square f)$;
$(f \square g)$ is the lower-envelope of the Minkowski sum of the epigraphs, i.e. $(f \square g)(\bx) = \inf {\mu \in \R \mid (\bx, \mu) \in \text{epi } + \text{epi } g }$;
(This needs attention) $\text{epi }(f \square g) = \text{epi } f + \text{epi } g$.
If $f,g$ are convex, then so is $(f \square g)$.

Proof. (1) follows by a change of variables. (2) TODO (3) TOOD (4) Two ways to prove this:

Define $g(\bx, \by) := f(\by)+ g(\bx - \by)$. This $g$ can be shown to be convex, and then using the Proposition 3 it follows that it is convex.
Because of (2), we know that the lower-envelope is convex when the both functions are convex. Then the result follows.

Huber loss

Turns out that a popular (and differentiable!) Huber loss, that combines L1 and L2 penalties, can be derived using infimal convolution.

Example 1 For $f(x) = \frac{1}{2\delta} x^2, g(x) = \abs{x}, \delta > 0$, the Huber loss is exactly

\[\begin{equation} \begin{split} (f \square g)(x) = \begin{cases} \frac{1}{2\delta} x^2 & \text{if } \abs{x} \leq \delta \\ \abs{x} - \frac{\delta}{2} & \text{otherwise}. \end{cases} \end{split} \end{equation}\]

Proof. $(f \square g)(x) = \inf_w \abs{w} + \frac{1}{2\delta} (x - w)^2$. Then using the subdifferential rules (see Projected and Proximal Methods) we solve for

\[\begin{equation} \begin{split} \partial(\underbrace{\abs{w} + \frac{1}{2\delta} (x - w)^2}_{=: l(x)}) \in 0 \end{split} \end{equation}\]

as follows:

Case 1: if $w \neq 0$: then the minimization is differentiable, and we get

\[\begin{equation} \begin{split} & l'(x) = \text{sign}(w) - \frac{1}{\delta} (x - w) \overset{!}{=} 0 \\ \Rightarrow \quad & w = x - \delta \text{sign}(w) , \end{split} \end{equation}\]

Cases 1.1,1.2: break down the objective into $ w > 0$ and $ w< 0$, and get that for $w > 0 \Rightarrow x > \delta$, and for $w < 0 \Rightarrow x < \delta$.

Case 2: if $w = 0$, then by Fermat’s rule we set the subgradient to be in zero:

\[\begin{equation} \begin{split} & \partial(\abs{w} + \frac{1}{2\delta} (x - w)^2) \in 0 \\ \Rightarrow \quad & [-1, 1] + \frac{1}{\delta} (-x) \in 0 \\ \Rightarrow \quad & [-\delta, \delta] + (-x) \in 0 \\ \Rightarrow \quad & x \in [- \delta, \delta], \end{split} \end{equation}\]

Putting these all together we can notice that for $w < 0$ and $w > 0$, the objective is identical, and so can be written concisely. (TODO: show this!)

Closed Convex Envelope

A concept that will become useful when dealing with optimizing dual problems with Fenchel’s daulity, is the closed convex envelope. We will see that there is a duality in between convex sets and half-spaces. This idea will be taken further, and a link between convex functions and affine linear minorants will be established.

In order to explain this, we take a step back and look at the pre-requisites before delving deeper.

Closed half spaces

Interlude on duality between sets and half-spaces

We define the convex closed half-spaces as follows.

Definition (Closed half-space) Given a directional vector $\bv \in \R^n$ and the “bias” $\beta \in \R$, we define the closed half-space as $H_{\bv, \beta} := { \bx \in \R^n \mid \inner{\bx}{\bv} \leq \beta }$

Then the following follows.

Proposition 1 (Half-spaces are convex) $H_{\bv, \beta}$ is convex.

Proof. Trivial

A duality result that claims equivalence between sets and half-spaces, can be established.

Theorem 2 (Duality between sets and closed half-spaces) Any closed convex set $\text{cl } C$ can be equivalently written as

\[\begin{equation} \begin{split} \text{cl } C = \cap_{ \{ \bv, \beta \mid H_{\bv, \beta} \text{ containts cl } C \} } H_{\bv, \beta}. \end{split} \end{equation}\]

Proof (idea). ``$\subset$’’ TODO: finish when revising Separation Theorems.

It will be useful to be more precise and distinguish between the type of half-spaces as follows.

Defintion 3 (Types of half-spaces) We distinguish between three types of half-spaces:

(Upper-facing) $H^{-1}_{\bv, \beta} := { (\bx, \mu) \in \R^n \times \R \mid \inner{\bx}{\bv} - \mu \leq \beta }$;
(Lower-facing) $H^{+1}_{\bv, \beta} := { (\bx, \mu) \in \R^n \times \R \mid \inner{\bx}{\bv} + \mu \leq \beta }$;
(Vertical) $H^{0}_{\bv, \beta} := { (\bx, \mu) \in \R^n \times \R \mid \inner{\bx}{\bv} \leq \beta }$,

where the subscript of $H$ indicates the multiplicative constant of the second element $\mu$.

Note that we can rewrite $H^{s}_{\bv, \beta}$, with $s \in {-1, 0, 1}$, as

\[\begin{equation} \begin{split} H^{s}_{\bv, \beta}&= \{ (\bx, \mu) \in \R^n \times \R \mid \inner{\bx}{\bv} - \mu \leq \beta \} \\ &= \{ (\bx, \mu) \in \R^n \times \R \mid \inner{\icol{\bx\\\mu}}{\icol{\bv \\s}} \leq \beta \}, \end{split} \end{equation}\]

which shows the side towards the directional vector faces.

Duality between closed functions and linear minorants

Now that we have established an equivalence between convex sets and half-spaces, we can establish a connection in the function space between convex functions and linear functions that “lie below” the function everywhere. We begin from the relevant notation and then show this result.

Definition 4 (Minorant, Linear minorant, Affine minorant) A minorant $g$ of a function $f$ is such a function that satisfies $g(\bx) \leq f(\bx)$ globally. Linear minorant is such a minorant that has the form $g_l(\bx) = \inner{\bv}{\bx}$, while an affine minorant has the form of $g_a(\bx) = \inner{\bv}{\bx} + \beta$.

One concept that is and will be especially relevant is the closed convex envelope. It is basically a smallest change to the original function that makes it convex.

Definition 5 (Closed convex envelope) For a function $f: \R^n \to (-\infty, +\infty]$, $\text{cl conv }f $ is defined as a function that satisfies $\text{epi (cl conv } f) = \text{cl conv (epi }f)$.

We can then show that closed convex envelope always exists.

Proposition 6 (Existance of closed convex envelope) $\text{cl conv f}$ always exists.

Proof. Take lower envelope of the closed convex epigraph of $f$, i.e. $E_{ \text{cl conv (epi } f)}$, then apply Proposition 1, and find that the epigraph of this function is $\text{cl conv (epi } f) + ({\bzero} \times \R_+)$.

With this in mind, we establish the main relationship..

Theorem 5 For proper function $f : \R^n \to (-\infty, \infty]$, it holds that it is (proper and has an affine minorant) if and only if $\text{cl conv } f$ is proper.

Proof (Idea): ``$\Rightarrow$’’ If $f$ is proper and has an affine minorant, then $\text{cl conv }f $ is also proper (sinec the parameters of the half-spaces are finite).

``$\Leftarrow$’’: Assume $\text{cl conv } f$ is proper. Because of this, $\text{cl conv } f$ is its own (potentially non-affine) proper minorant. To show that an affine minorant exists, take the epigraph of $\text{cl conv} f$, and note that because $\text{cl conv} f$ is proper, it has no negatively infinitely values, and thus all the points live in $(-\infty, +\infty]$, and the dual of the sets can be invoked, and the epigraph rewritten in terms of intersection of closed half-spaces. Then write down the half-spaces in terms of three types: upper,lower and vertical half-spaces.

We then proceed to eliminate the vertical and upper-facing half-spaces, and only maintain the lower-facing ones. This then shows that a minorizer exists. (TODO)

Alternative Proof. Note that $\text{cl conv } f$ is convex, and as such we can find a subgradient at each point that minorizes the function.