Notes on Fenchel Conjugate and Duality

Motivation

In this post I cover some important properties of Fenchel conjugate. There are at least two important and interesting applications of it:

  1. Immediate application: computing closed convex envelope of a function. Using Fenchel conjugate, one can non-trivially convexify any function (when the closed convex hull of the epigraph of the original function does not span the whole space).

  2. Application important down the line: some optimization problems can be formulated in their dual representation. Turns out that Fenchel conjugate plays a key part in it.

Here I focus on the (1) point by first looking at the properties of the Fenchel conjugate, present some inuition for it, and then look at the relationship between Fenchel conjugate and convexifying functions.

Fenchel conjugate definition

We begin by defining the Fenchel conjugate.

Definition 1 (Fenchel conjugate) For a function $f : \R^n \to \R$, we denote by $f^* : \R^n \to (-\infty, +\infty], \by \mapsto \sup_{\by \in \R^n} \inner{\bx}{\by} - f(\bx)$.

Interpretation of Fenchel conjugate

At the first sight, the Fenchel conjugate might seem a bit bizzare and unmotivated. Before tying it its practical applications, it will be useful to have an intuitive feel of it.

There are two ways that I like to think about the Fenchel conjugate. One of them is the geometric view, and the other the economical one (thanks to this answer on math stackexchange.

Geometric view in 2D

Fenchel conjugate takes in a slope and returns the maximum distance between the line and the function. If the function is bounded from above, or never outgrows the linear function, the distance is infinite.

One way to construct the conjugate is a bruteforce one. For each possible slope, we can draw the linear function with that slope, and move the function in such a way, that it only touches the function and does not intersect it. Then, the distance that we moved the function will exactly be the supremum. Since we move the function in the opposite direction, we need to account for the fact that the Fenchel conjugate will be negative this distance.

Figure 1. The Fenchel conjugate computes how much the hyperplane with a given perpendicular vector needs to be lowered in order for it to just touch the graph.

Another way to construct the conjugate is a purely geometric one. At each point in the graph, one can draw an affine function that maximally minorizes the function (i.e. take the affine minorant that is the closest to the considered point). Then one can determine the value of the conjugate by finding the point where minorant intersects the $y$-axis.

Economical view

Based on the answer on math stackexchange, we can interpret the Fenchel conjugate as the total profit of a shop that sells products for a constant price $y$, but making a different number of products $x$ has a cost given by $f(x)$. Formally, we can write the the maximal possible profit in this case as

\[\begin{equation} \begin{split} \text{profit}(\text{price-per-item} = y) = \sup_x xy - f(x). \end{split} \end{equation}\]

So, ignoring the datails about non-positive number of items and non-integer values, we can interpret the Fenchel conjugate as the maximum profit a shop would make if the cost of production of items is dictated by the function $f(\cdot)$.

Establishing a connection between Fenchel conjugate and closed convex envelope

As it will turn out, $\text{cl conv } f = f^{**}$. Before showing this, we make it explicit that the Fenchel conjugate is convex. For that, we make use of the fact that the pointwise-supremum of convex functions is convex.

Lemma 2 (Supremum of convex functions is convex) Suppose $g_i$ are closed convex functions, then

\[\begin{equation} \begin{split} f(\bx) := \sup_i g_i(\bx) \end{split} \end{equation}\]

is closed and convex.

Proof. Note that a function is convex iff its epigraph is convex. Then

\[\begin{equation} \begin{split} \text{epi } f &= \{ (\bx, \mu) \in \R^n \times \R \mid \sup_i g_i(\bx) \leq \mu \} \\ &= \{ (\bx, \mu) \in \R^n \times \R \mid \forall i \, g_i(\bx) \leq \mu \} \\ &= \cap_{i} \{ (\bx, \mu) \in \R^n \times \R \mid g_i(\bx) \leq \mu \} \\ &= \cap_{i} \text{epi } g_i, \end{split} \end{equation}\]

which is closed and convex, and thus $f(\cdot)$ is closed and convex.

Using the above fact, we can show that the Fenchel conjugate is convex regardless of convexity of the function.

Proposition 2 (Fenchel conjugate is convex) $f^*$ is convex and closed.

Proof. Note that $f^*(\by)= \sup_\bx \inner{\bx}{\by} - f(\bx)$ is a pointwise supremum of closed and convex functions in $\by$. Thus, by applying Lemma 2, taking a supremum over functions convex in $\by$ is convex and closed.

Lemma 3 (inf sup is as large as sup inf) For $f : \R^n \times \R \to \R$ it holds that

\[\begin{equation} \begin{split} \sup_{y \in \R} \inf_{\bx \in \R^n} f(\bx, y) \leq \inf_{\bx \in \R^n} \sup_{y \in \R} f(\bx, y). \end{split} \end{equation}\]

(and vice versa by permuting the arguments).

Proof. Note that $\forall \bar \bx, \bar y$ it holds that

\[\begin{equation} \begin{split} f(\bar \bx, \bar y) \leq \sup_y f(\bar \bx, y), \end{split} \end{equation}\]

then take inf wrt $\bx$:

\[\begin{equation} \begin{split} \inf_{\bx} f(\bx, \bar y) \leq \inf_\bx \sup_y f(\bar \bx, y), \end{split} \end{equation}\]

and finally take the sup wrt y to get the result.

\[\begin{equation} \begin{split} \sup_y \inf_\bx f(\bx, y) \leq \inf_\bx \sup_y f(\bx, y). \end{split} \end{equation}\]

We will also make use of the facts that $f(\bx) \leq g(\bx) \, \forall \bx \Rightarrow f^*(\bx) \geq g^*(\bx)$, and that the double conjugate of a affine function is the same affine function.

Proposition 4 (Double conjugate of an affine function is affine) If $h: \R^n \to \R, \bx \mapsto \inner{\bv}{\bx} + \bb$ for $\bv, \bb \in \R^n$, then $h^{**}(\bx) = h(\bx)$.

Proof. Straightforward computation.

With this in mind, we can show the main result: a direct connection double conjugancy and closed convex envelope.

Theorem 5 (Double conjugate is closed convex envelope) If $f : \R^n \to \R$, and $\text{cl conv }f$ is proper, then

\[\begin{equation} \begin{split} \text{cl conv } f = f^{**}. \end{split} \end{equation}\]

Proof. We first show that $\forall \bx$ it holds that $f^{**}(\bx) \geq f(\bx)$ (where by construction, we know that $f(\bx) \geq \text{cl conv }f$, because $\text{cl conv } f$ is a pointwise supremum of affine minorants). Then, we show that actually $f^{**}(\bx) \leq \text{cl conv } f(\bx)$, which shows the equality between the two.

(1): To show $f^{**} \leq f$: Write down the definition, use the fact that sup inf is less than inf sup, and get the relation. This then implies that $f^{* *} \leq \text{cl conv }f$, since the closed convex envelope is by construction lower (or equal) everywhere.

(2): To show that $f^{* *} \geq \text{cl conv } f$: Since the RHS is the pointwise-supremum of affine minorants, consider any one of the minorants $h$. We know that $f \geq h \Rightarrow f^{*} \leq h^{*} \Rightarrow f^{**} \geq h$, which holds for all affine minorants of $f$. Because $\text{cl conv }f$ is a supremum over affine minorants, all of which lie below the double conjugate, this shows that

\[\begin{equation} \begin{split} f^{**} \leq \text{cl conv} f, \end{split} \end{equation}\]

as desired.

Question: But when is the closed envelope proper?

Establishing a connection between the subdifferential and Fenchel conjugate

As it turns out, there is a nice connection between the of subdifferential and subdifferential of the conjugate: $(\partial f)^{-1} = \partial (f^{*})$.

TODO: give a proof.

Note that by Fermat’s rule (covered here), it holds that $\bx^* = \argmin_\bx f(\bx)$ iff $\bzero \in \partial f(\bx^*)$. By using the inverse relation, we can find the set of minimizers as $X^* = \partial(f^*)(\bzero)$.

Relating duality of sets to duality in optimization (based on (missing reference))

This is based on an excellent answer on Mathexchange (missing reference). We know from the duality result between sets and half-spaces that every closed convex set can be equivalently expressed as an intersection of closed half-spaces. This equivalent representation can be thought to define a dual space in which that convex set lives.

But we are here interested in actually charcterizing the dual space for functions, not sets. From the duality result between functions and affine minorants, we know that every proper closed convex function can be equivalently written as pointwise supremum over affine minorants. Concretely, for a convex function $f$, it holds that $\text{cl conv } f = f^{**}$, and $f \geq f^{**}$ everywhere.

One way to characterize this dual space, is by defining a function that for each slope gives a constant vector with which the line with that slope should be decreased in order to minorize the function. Since we are dealing with a convex functions, we know that every points in the domain can be minorized, and so for a slope $\bm^*$ that lives in this dual representation space, we can find such constant $\balpha$ as

\[\begin{equation} \begin{split} & \inner{\bm}{\bx} - \balpha \leq f(\bx) \quad \forall \bx \\ \Rightarrow \quad & \inner{\bm}{\bx} -f(\bx) \leq \balpha \quad \forall \bx \\ \Rightarrow \quad & \sup_\bx \inner{\bm}{\bx} -f(\bx) \leq \balpha, \end{split} \end{equation}\]

where of course $\balpha$ is exactly the definition of the conjugate, i.e. $f^*(\bm) := \balpha$.

With this in mind, since the representation of the convex function is equivalent to its dual, (TODO: show that it actually is :) ) it is no surprise that dual of the dual recovers the function in its original representation.

Dual problem in optimization

The main point of duality in optimization, is to maximize the conjugate of the objective. For non-convex functions, we know that it convexifying the function through bi-conjugate leads to minorization of the function, i.e. $f \geq f^{**}$. So the bi-conjugate of the optimized objective lower bounds the true objective.

We assume the optimization problem to have the following form:

Assumption 6 Throughout the notes, assume that we are interested in the minimization problem for convex, proper $f : \R^n \to (-\infty, +\infty]$.

Since some problems are hard to optimize directly, we introudce an alternative viewpoint using the perturbation framework. The idea here is to perturb the function in a way that preserves convexity. This is a design choice, and from my understanding, depends on a problem. This is encoded in the value function. We make these explicit as follows.

Definition 7 (Perturbed objective function) A problem-specific perturbed objective function is defined as a convex $f: \R^n \times \R^m \to (-\infty, +\infty]$ in such a way that $f(\bx, \bzero) = f(\bx)$.

I do not yet have intuition what good choices of perturbed objectives are. But if we take it for granted, we can start executing our goal: to minimize the objective, we can maximize the dual. A key definition for this is the value function.

Definition 8 (Value function) A value function $p: \R^m \to (-\infty, +\infty]$ is defined as

\[\begin{equation} \begin{split} p(\by) := \inf_\bx f(\bx, \by), \end{split} \end{equation}\]

Note that due to epigraphical operations that preserve convexity, if $f$ is convex, then the value function is the infimal projection of the function $f$, which is convex.

The dual problem problem is evaluating $p(\bzero)$, which is the quantity corresponding to the original minimization problem.

Definition 9 (Primal problem) The primal problem is to evaluate $p(\bzero) = \inf_\bx f(\bx, \bzero) = \inf_\bx f(\bx)$ (for $\bzero \in \R^m$), the original problem.

Finally, in accordance to our goal, we define the dual problem as evaluating $p^{**}(\bzero), \bzero \in \R^m$, defined as follows.

Definition 10 (Dual problem) The dual problem is to evaluate $q(\bzero):= p^{**}(\bzero), \bzero \in \R^m$.

We do not define the function $q(\cdot)$ further, but are only concerned with it at $\bzero$. From an intuitive standpoint everythere should be clear that $p(\bzero) \leq q(\bzero)$. We can unravel these terms and define them in terms of the original function $f$ as follows.

Remark (Useful quantities pertraining to optimization in duality) It holds that

  1. $q(\bzero) = p^{**}(\bzero) = \sup_\by -p^*(\by)$;
  2. $p^*(\by) = f^*(0, \by)$;
  3. $q(\bzero) = \sup_\by -f^*(0, \by)$.

Proof.

(1) By definition.

(2) Note that

\[\begin{equation} \begin{split} p^*(\by) &= \sup_\bx \inner{\bx}{\by} - p(\bx) = \sup_\bx \inner{\bx}{\by} - \inf_\bz f(\bx, \bz) \\ &= \sup_{\bx, \bz} \inner{\bx}{\by} - f(\bx, \bz) = \sup_{\bx, \bz} \inner{\icol{\bx\\\bz}}{\icol{\by\\\bzero}} - f(\bx,\bz) \\ &= f^*(\by, \bzero). \end{split} \end{equation}\]

(3) Follows from 1 and 2.

Question: It makes sense that the dual probablem should be more interesting than the primal one when the function $f$ is not convex. But when the problem is convex, we trivially get $p(\bzero) = q(\bzero)$. I fail to see how this setting could be helpful to us. I can only see two advantages:

  1. Lower-bounding the objective since optimizing the perturbed problem might be easier.
  2. The number of variables in the dual space is the same as the number of perturbations we make. Certainly this can be the optimization easier in some cases. But how to arrive at this?

Fenchel-Rockerfellar duality

One neat example to illustrate the application of duality in optimization is the Fenchel-Rockerfellar duality. The main theorem is as follows.

Theorem (Fenchel-Rockerfellar duality) For proper and convex $h : \R^m \to (-\infty, +\infty], g: \R^n \to (-\infty, +\infty]$ and linear map $A: \R^m \to \R^n$, the original optimization problem

\[\begin{equation} \begin{split} \inf_{\bx \in \R^m} h(A\bx) + g(\bx), \end{split} \end{equation}\]

with a perturbation in the form of

\[\begin{equation} \begin{split} f(\bx, \bu) := h(A\bx + \bu) + g(\bx), \end{split} \end{equation}\]

has the primal optimization problem (with $\bzero \in \text{dom }p$)

\[\begin{equation} \begin{split} p(\bzero) = \inf_{\bx \in \R^m} f(\bx, \bzero) = \inf_{\bx \in \R^m} h(A\bx) + g(\bx), \end{split} \end{equation}\]

and the dual problem in the form

\[\begin{equation} \begin{split} q(\bzero) = p^{**}(\bzero) = \sup_{\by \in \R^n} -h^*(\by) - g(-A^* \by). \end{split} \end{equation}\]

Proof.

Note that \(\begin{equation} \begin{split} p^*(\by) = f^*(\bzero, \by), \end{split} \end{equation}\)

and

\[\begin{equation} \begin{split} f^*(\bzero, \by) &= \sup_{\ba, \bb} \inner{\bb}{\by} - h(A \ba + \bb) - g(\ba) \\ &= \sup_{\ba, \bc} \inner{\bc - A \ba}{\by} - h(\bc) - g(\ba) \qquad \text{(change of vars }\bc := A\ba + \bb \text{, so } \bb = \bc - A\ba) \\ &= \sup_{\bc} \inner{\bc}{\by} - h(\bc) + \sup_{\ba} - \inner{A\ba}{\by} - g(\ba) \\ &= h^*(\by) + \sup_\ba \inner{\ba}{-A^* \by} - g(\ba) \\ &= h^*(\by) + g^*(-A^*\by), \end{split} \end{equation}\]

and so \(\begin{equation} \begin{split} p^{**}(\bzero) = \sup_\by -p^*(\by) = \sup_\by -f^*(0, \by) = \sup_\by -h^*(\by) - g^*(-A^*\by), \end{split} \end{equation}\)

as desired.

Applying Fenchel-Rockarfellar duality: ROF denoising

Take the following optimization problem

\[\begin{equation} \begin{split} \min_{\bd \in \R^{m \times n}} \norm{\bx - \bd}_F^2 + \lambda \norm{D \bd}_{F, 1}, \end{split} \end{equation}\]

turns out (not shown here), that using the same kinds of algebraic maniupulations as we did in the proof of Fenchel-Rockarfellar duality, we can show that the problem above is equivalent to one where the regularization term can be replaced by a $\norm{\cdot}_{F,\infty}$ norm, which allows for easy optimization through projected gradients.




Enjoy Reading This Article?

Here are some more articles you might like to read next:

  • Notes on "Domain Adaptation by Using Causal Inference to Predict Invariant Conditional Distributions" (2018)
  • Notes on "Attention Is All You Need" (2017)
  • Notes on Conditional Gradient Method
  • Comparing Non-Smooth Optimization through Subgradients on Regularized Logistic Regression Problem
  • Notes on Optimizing Non-Differentiable (but Convex) Functions: Subgradients, and Projected and Proximal Methods