Diffstat (limited to 'docs/solving.tex')
-rw-r--r--  docs/solving.tex  767
1 files changed, 0 insertions, 767 deletions
diff --git a/docs/solving.tex b/docs/solving.tex
deleted file mode 100644
index ee8a3b8..0000000
--- a/docs/solving.tex
+++ /dev/null
@@ -1,767 +0,0 @@
-%!TEX root = ceres-solver.tex
-\chapter{Solving}
-Effective use of Ceres requires some familiarity with the basic components of a nonlinear least squares solver, so before we describe how to configure the solver, we will begin by taking a brief look at how some of the core optimization algorithms in Ceres work and the various linear solvers and preconditioners that power it.
-
-\section{Trust Region Methods}
-\label{sec:trust-region}
-Let $x \in \mathbb{R}^{n}$ be an $n$-dimensional vector of variables, and
-$ F(x) = \left[f_1(x), \hdots, f_{m}(x) \right]^{\top}$ be an $m$-dimensional function of $x$. We are interested in solving the following optimization problem~\footnote{At the level of the non-linear solver, the block and residual structure is not relevant, therefore our discussion here is in terms of an optimization problem defined over a state vector of size $n$.},
-\begin{equation}
- \arg \min_x \frac{1}{2}\|F(x)\|^2\ .
- \label{eq:nonlinsq}
-\end{equation}
-Here, the Jacobian $J(x)$ of $F(x)$ is an $m\times n$ matrix, where $J_{ij}(x) = \partial_j f_i(x)$ and the gradient vector $g(x) = \nabla \frac{1}{2}\|F(x)\|^2 = J(x)^\top F(x)$. Since the efficient global optimization of~\eqref{eq:nonlinsq} for general $F(x)$ is an intractable problem, we will have to settle for finding a local minimum.
-
-The general strategy when solving non-linear optimization problems is to solve a sequence of approximations to the original problem~\cite{nocedal2000numerical}. At each iteration, the approximation is solved to determine a correction $\Delta x$ to the vector $x$. For non-linear least squares, an approximation can be constructed by using the linearization $F(x+\Delta x) \approx F(x) + J(x)\Delta x$, which leads to the following linear least squares problem:
-\begin{equation}
- \min_{\Delta x} \frac{1}{2}\|J(x)\Delta x + F(x)\|^2
- \label{eq:linearapprox}
-\end{equation}
-Unfortunately, na\"ively solving a sequence of these problems and
-updating $x \leftarrow x+ \Delta x$ leads to an algorithm that may not
-converge. To get a convergent algorithm, we need to control the size
-of the step $\Delta x$. And this is where the idea of a trust-region
-comes in. Algorithm~\ref{alg:trust-region} describes the basic trust-region loop for non-linear least squares problems.
-
-\begin{algorithm}
-\caption{The basic trust-region algorithm.\label{alg:trust-region}}
-\begin{algorithmic}
-\REQUIRE Initial point $x$ and a trust region radius $\mu$.
-\LOOP
-\STATE{Solve $\arg \min_{\Delta x} \frac{1}{2}\|J(x)\Delta x + F(x)\|^2$ s.t. $\|D(x)\Delta x\|^2 \le \mu$}
-\STATE{$\rho = \frac{\displaystyle \|F(x + \Delta x)\|^2 - \|F(x)\|^2}{\displaystyle \|J(x)\Delta x + F(x)\|^2 - \|F(x)\|^2}$}
-\IF {$\rho > \epsilon$}
-\STATE{$x = x + \Delta x$}
-\ENDIF
-\IF {$\rho > \eta_1$}
-\STATE{$\mu = 2 * \mu$}
-\ELSE
-\IF {$\rho < \eta_2$}
-\STATE {$\mu = 0.5 * \mu$}
-\ENDIF
-\ENDIF
-\ENDLOOP
-\end{algorithmic}
-\end{algorithm}
-
-Here, $\mu$ is the trust region radius, $D(x)$ is some matrix used to define a metric on the domain of $F(x)$ and $\rho$ measures the quality of the step $\Delta x$, i.e., how well the linear model predicted the decrease in the value of the non-linear objective. The idea is to increase or decrease the radius of the trust region depending on how well the linearization predicts the behavior of the non-linear objective, which in turn is reflected in the value of $\rho$.
-
-The key computational step in a trust-region algorithm is the solution of the constrained optimization problem
-\begin{align}
- \arg\min_{\Delta x}& \frac{1}{2}\|J(x)\Delta x + F(x)\|^2 \\
- \text{such that}&\quad \|D(x)\Delta x\|^2 \le \mu
-\label{eq:trp}
-\end{align}
-
-There are a number of different ways of solving this problem, each giving rise to a different concrete trust-region algorithm. Currently, Ceres implements two trust-region algorithms: Levenberg-Marquardt and Dogleg.
-
-\subsection{Levenberg-Marquardt}
-The Levenberg-Marquardt algorithm~\cite{levenberg1944method, marquardt1963algorithm} is the most popular algorithm for solving non-linear least squares problems. It was also the first trust region algorithm to be developed~\cite{levenberg1944method,marquardt1963algorithm}. Ceres implements an exact step~\cite{madsen2004methods} and an inexact step variant of the Levenberg-Marquardt algorithm~\cite{wright1985inexact,nash1990assessing}.
-
-It can be shown that the solution to~\eqref{eq:trp} can be obtained by solving an unconstrained optimization problem of the form
-\begin{align}
- \arg\min_{\Delta x}& \frac{1}{2}\|J(x)\Delta x + F(x)\|^2 +\lambda \|D(x)\Delta x\|^2
-\end{align}
-where $\lambda$ is a Lagrange multiplier that is inversely related to $\mu$. In Ceres, we solve for
-\begin{align}
- \arg\min_{\Delta x}& \frac{1}{2}\|J(x)\Delta x + F(x)\|^2 + \frac{1}{\mu} \|D(x)\Delta x\|^2
-\label{eq:lsqr}
-\end{align}
-The matrix $D(x)$ is a non-negative diagonal matrix, typically the square root of the diagonal of the matrix $J(x)^\top J(x)$.
-
-Before going further, let us make some notational simplifications. We will assume that the matrix $\frac{1}{\sqrt{\mu}} D$ has been concatenated at the bottom of the matrix $J$, and similarly a vector of zeros has been added to the bottom of the vector $f$, and the rest of our discussion will be in terms of $J$ and $f$, \ie the linear least squares problem.
-\begin{align}
- \min_{\Delta x} \frac{1}{2} \|J(x)\Delta x + f(x)\|^2 .
- \label{eq:simple}
-\end{align}
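-Written out, and ignoring the constant scaling of the regularization term, the augmented Jacobian and residual in~\eqref{eq:simple} are
-\begin{align}
- J \leftarrow \left[
- \begin{matrix} J(x) \\ \frac{1}{\sqrt{\mu}} D(x)
- \end{matrix}
- \right], \quad
- f \leftarrow \left[
- \begin{matrix} f(x) \\ 0
- \end{matrix}
- \right].
-\end{align}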
-For all but the smallest problems, the solution of~\eqref{eq:simple} in each iteration of the Levenberg-Marquardt algorithm is the dominant computational cost in Ceres. Ceres provides a number of different options for solving~\eqref{eq:simple}. There are two major classes of methods: factorization methods and iterative methods.
-
-The factorization methods are based on computing an exact solution of~\eqref{eq:lsqr} using a Cholesky or a QR factorization and lead to an exact step Levenberg-Marquardt algorithm. But it is not clear if an exact solution of~\eqref{eq:lsqr} is necessary at each step of the LM algorithm to solve~\eqref{eq:nonlinsq}. In fact, we have already seen evidence that this may not be the case, as~\eqref{eq:lsqr} is itself a regularized version of~\eqref{eq:linearapprox}. Indeed, it is possible to construct non-linear optimization algorithms in which the linearized problem is solved approximately. These algorithms are known as inexact Newton or truncated Newton methods~\cite{nocedal2000numerical}.
-
-An inexact Newton method requires two ingredients. First, a cheap method for approximately solving systems of linear equations. Typically an iterative linear solver like the Conjugate Gradients method is used for this purpose~\cite{nocedal2000numerical}. Second, a termination rule for the iterative solver. A typical termination rule is of the form
-\begin{equation}
- \|H(x) \Delta x + g(x)\| \leq \eta_k \|g(x)\|. \label{eq:inexact}
-\end{equation}
-Here, $k$ indicates the Levenberg-Marquardt iteration number and $0 < \eta_k <1$ is known as the forcing sequence. Wright \& Holt \cite{wright1985inexact} prove that a truncated Levenberg-Marquardt algorithm that uses an inexact Newton step based on~\eqref{eq:inexact} converges for any sequence $\eta_k \leq \eta_0 < 1$ and the rate of convergence depends on the choice of the forcing sequence $\eta_k$.
-
-Ceres supports both exact and inexact step solution strategies. When the user chooses a factorization based linear solver, the exact step Levenberg-Marquardt algorithm is used. When the user chooses an iterative linear solver, the inexact step Levenberg-Marquardt algorithm is used.
-
-\subsection{Dogleg}
-\label{sec:dogleg}
-Another strategy for solving the trust region problem~\eqref{eq:trp} was introduced by M. J. D. Powell. The key idea there is to compute two vectors
-\begin{align}
- \Delta x^{\text{Gauss-Newton}} &= \arg \min_{\Delta x}\frac{1}{2} \|J(x)\Delta x + f(x)\|^2.\\
- \Delta x^{\text{Cauchy}} &= -\frac{\|g(x)\|^2}{\|J(x)g(x)\|^2}g(x).
-\end{align}
-Note that the vector $\Delta x^{\text{Gauss-Newton}}$ is the solution
-to~\eqref{eq:linearapprox} and $\Delta x^{\text{Cauchy}}$ is the
-vector that minimizes the linear approximation if we restrict
-ourselves to moving along the direction of the gradient. Dogleg methods find a vector $\Delta
-x^{\text{Gauss-Newton}}$ and $\Delta x^{\text{Cauchy}}$ that solves
-the trust region problem. Ceres supports two
-variants.
-
-\texttt{TRADITIONAL\_DOGLEG}, as described by Powell,
-constructs two line segments using the Gauss-Newton and Cauchy vectors
-and finds the point farthest along this dogleg-shaped path
-(hence the name) that is contained in the
-trust region. For more details on the exact reasoning and computations, please see Madsen et al.~\cite{madsen2004methods}.
-
- \texttt{SUBSPACE\_DOGLEG} is a more sophisticated method
-that considers the entire two dimensional subspace spanned by these
-two vectors and finds the point that minimizes the trust region
-problem in this subspace~\cite{byrd1988approximate}.
-
-The key advantage of Dogleg over Levenberg-Marquardt is that if the step computation for a particular choice of $\mu$ does not result in sufficient decrease in the value of the objective function, Levenberg-Marquardt solves the linear approximation from scratch with a smaller value of $\mu$. Dogleg, on the other hand, only needs to compute the interpolation between the Gauss-Newton and the Cauchy vectors, as neither of them depends on the value of $\mu$.
-
-The Dogleg method can only be used with the exact factorization based linear solvers.
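-
-For example, a minimal sketch of selecting the subspace variant through \texttt{Solver::Options} looks as follows; \texttt{options} is assumed to be the options object that is later passed to the solver.
-\begin{minted}{c++}
-ceres::Solver::Options options;
-// Use a dogleg step instead of Levenberg-Marquardt.
-options.trust_region_strategy_type = ceres::DOGLEG;
-options.dogleg_type = ceres::SUBSPACE_DOGLEG;
-// Dogleg requires an exact, factorization based linear solver.
-options.linear_solver_type = ceres::DENSE_QR;
-\end{minted}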
-
-\subsection{Inner Iterations}
-\label{sec:inner}
-Some non-linear least squares problems have additional structure in
-the way the parameter blocks interact, such that it is beneficial to modify
-the way the trust region step is computed. For example, consider the
-following regression problem
-
-\begin{equation}
- y = a_1 e^{b_1 x} + a_2 e^{b_2 x^2 + c_1}
-\end{equation}
-
-Given a set of pairs $\{(x_i, y_i)\}$, the user wishes to estimate
-$a_1, a_2, b_1, b_2$, and $c_1$.
-
-Notice that the right-hand side is linear in $a_1$ and $a_2$,
-and given any value for $b_1$, $b_2$ and $c_1$, it is possible to use
-linear regression to estimate the optimal values of $a_1$ and
-$a_2$. Indeed, it is possible to analytically eliminate the variables
-$a_1$ and $a_2$ from the problem entirely. Problems like these are
-known as separable least squares problems, and the most famous algorithm
-for solving them is the Variable Projection algorithm invented by
-Golub \& Pereyra~\cite{golub-pereyra-73}.
-
-Similar structure can be found in the matrix factorization with
-missing data problem. There the corresponding algorithm is
-known as Wiberg's algorithm~\cite{wiberg}.
-
-Ruhe \& Wedin present an analysis of
-various algorithms for solving separable non-linear least
-squares problems and refer to {\em Variable Projection} as
-Algorithm I in their paper~\cite{ruhe-wedin}.
-
-Implementing Variable Projection is tedious and expensive. Ruhe \&
-Wedin present a simpler algorithm with comparable convergence
-properties, which they call Algorithm II. Algorithm II performs an
-additional optimization step to estimate $a_1$ and $a_2$ exactly after
-computing a successful Newton step.
-
-
-This idea can be generalized to cases where the residual is not
-linear in $a_1$ and $a_2$, i.e.,
-
-\begin{equation}
- y = f_1(a_1, e^{b_1 x}) + f_2(a_2, e^{b_2 x^2 + c_1})
-\end{equation}
-
-In this case, we solve for the trust region step for the full problem,
-and then use it as the starting point to further optimize just $a_1$
-and $a_2$. For the linear case, this amounts to doing a single linear
-least squares solve. For non-linear problems, any method for solving
-the $a_1$ and $a_2$ optimization problems will do. The only constraint
-on $a_1$ and $a_2$ (if they are two different parameter blocks) is that
-they do not co-occur in a residual block.
-
-This idea can be further generalized, by not just optimizing $(a_1,
-a_2)$, but decomposing the graph corresponding to the Hessian matrix's
-sparsity structure into a collection of non-overlapping independent sets
-and optimizing each of them.
-
-Setting \texttt{Solver::Options::use\_inner\_iterations} to \texttt{true}
-enables
-the use of this non-linear generalization of Ruhe \& Wedin's Algorithm
-II. This makes each iteration more expensive, but in return the solver
-displays better convergence behavior per iteration.
-
-Setting \texttt{Solver::Options::num\_threads} to the maximum number
-possible is highly recommended.
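-
-A minimal sketch of enabling inner iterations follows; the thread count is illustrative and should be adapted to the machine.
-\begin{minted}{c++}
-ceres::Solver::Options options;
-// Further refine independent subsets of parameter blocks after each
-// successful trust region step.
-options.use_inner_iterations = true;
-// Inner iterations benefit from multi-threaded Jacobian evaluation.
-options.num_threads = 4;
-\end{minted}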
-
-\subsection{Non-monotonic Steps}
-\label{sec:non-monotonic}
-Note that the basic trust-region algorithm described in
-Algorithm~\ref{alg:trust-region} is a descent algorithm, in that it
-only accepts a point if it strictly reduces the value of the objective
-function.
-
-Relaxing this requirement allows the algorithm to be more
-efficient in the long term at the cost of some local increase
-in the value of the objective function.
-
-This is because allowing for non-decreasing objective function
-values in a principled manner allows the algorithm to ``jump over
-boulders'', as the method is not restricted to moving through narrow
-valleys, while still preserving its convergence properties.
-
-Setting \texttt{Solver::Options::use\_nonmonotonic\_steps} to \texttt{true}
-enables the non-monotonic trust region algorithm as described by
-Conn, Gould \& Toint in~\cite{conn2000trust}.
-
-Even though the value of the objective function may be larger
-than the minimum value encountered over the course of the
-optimization, the final parameters returned to the user are the
-ones corresponding to the minimum cost over all iterations.
-
-The option to take non-monotonic steps is available for all trust region
-strategies.
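-
-In terms of configuration, this amounts to the following sketch; the window size shown is simply the documented default.
-\begin{minted}{c++}
-ceres::Solver::Options options;
-options.use_nonmonotonic_steps = true;
-// Size of the window over which the objective is allowed to increase.
-options.max_consecutive_nonmonotonic_steps = 5;
-\end{minted}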
-
-\section{\texttt{LinearSolver}}
-Recall that in both of the trust-region methods described above, the key computational cost is the solution of a linear least squares problem of the form
-\begin{align}
- \min_{\Delta x} \frac{1}{2} \|J(x)\Delta x + f(x)\|^2 .
- \label{eq:simple2}
-\end{align}
-
-
-Let $H(x)= J(x)^\top J(x)$ and $g(x) = -J(x)^\top f(x)$. For notational convenience let us also drop the dependence on $x$. Then it is easy to see that solving~\eqref{eq:simple2} is equivalent to solving the {\em normal equations}
-\begin{align}
-H \Delta x &= g \label{eq:normal}
-\end{align}
-
-Ceres provides a number of different options for solving~\eqref{eq:normal}.
-
-\subsection{\texttt{DENSE\_QR}}
-For small problems (a couple of hundred parameters and a few thousand residuals) with relatively dense Jacobians, \texttt{DENSE\_QR} is the method of choice~\cite{bjorck1996numerical}. Let $J = QR$ be the QR-decomposition of $J$, where $Q$ is an orthonormal matrix and $R$ is an upper triangular matrix~\cite{trefethen1997numerical}. Then it can be shown that the solution to~\eqref{eq:normal} is given by
-\begin{align}
- \Delta x^* = -R^{-1}Q^\top f
-\end{align}
-Ceres uses \texttt{Eigen}'s dense QR factorization routines.
-
-\subsection{\texttt{DENSE\_NORMAL\_CHOLESKY} \& \texttt{SPARSE\_NORMAL\_CHOLESKY}}
-Large non-linear least squares problems are usually sparse. In such cases, using a dense QR factorization is inefficient. Let $H = R^\top R$ be the Cholesky factorization of the normal equations, where $R$ is an upper triangular matrix; then the solution to~\eqref{eq:normal} is given by
-\begin{equation}
- \Delta x^* = R^{-1} R^{-\top} g.
-\end{equation}
-The observant reader will note that the $R$ in the Cholesky
-factorization of $H$ is the same upper triangular matrix $R$ in the QR
-factorization of $J$. Since $Q$ is an orthonormal matrix, $J=QR$
-implies that $J^\top J = R^\top Q^\top Q R = R^\top R$. There are two variants of Cholesky factorization -- sparse and
-dense.
-
-\texttt{DENSE\_NORMAL\_CHOLESKY}, as the name implies, performs a dense
-Cholesky factorization of the normal equations. Ceres uses
-\texttt{Eigen}'s dense LDLT factorization routines.
-
-\texttt{SPARSE\_NORMAL\_CHOLESKY}, as the name implies, performs a
-sparse Cholesky factorization of the normal equations. This leads to
-substantial savings in time and memory for large sparse
-problems. Ceres uses the sparse Cholesky factorization routines in Professor Tim Davis' \texttt{SuiteSparse} or
-\texttt{CXSparse} packages~\cite{chen2006acs}.
-
-\subsection{\texttt{DENSE\_SCHUR} \& \texttt{SPARSE\_SCHUR}}
-While it is possible to use \texttt{SPARSE\_NORMAL\_CHOLESKY} to solve bundle adjustment problems, bundle adjustment problems have a special structure, and a more efficient scheme for solving~\eqref{eq:normal} can be constructed.
-
-Suppose that the SfM problem consists of $p$ cameras and $q$ points and the variable vector $x$ has the block structure $x = [y_{1},\hdots,y_{p},z_{1},\hdots,z_{q}]$, where $y$ and $z$ correspond to camera and point parameters, respectively. Further, let the camera blocks be of size $c$ and the point blocks be of size $s$ (for most problems $c$ = $6$--$9$ and $s = 3$). Ceres does not impose any constancy requirement on these block sizes, but choosing them to be constant simplifies the exposition.
-
-A key characteristic of the bundle adjustment problem is that there is no term $f_{i}$ that includes two or more point blocks. This in turn implies that the matrix $H$ is of the form
-\begin{equation}
- H = \left[
- \begin{matrix} B & E\\ E^\top & C
- \end{matrix}
- \right]\ ,
-\label{eq:hblock}
-\end{equation}
-where $B \in \reals^{pc\times pc}$ is a block sparse matrix with $p$ blocks of size $c\times c$ and $C \in \reals^{qs\times qs}$ is a block diagonal matrix with $q$ blocks of size $s\times s$. $E \in \reals^{pc\times qs}$ is a general block sparse matrix, with a block of size $c\times s$ for each observation. Let us now block partition $\Delta x = [\Delta y,\Delta z]$ and $g=[v,w]$ to restate~\eqref{eq:normal} as the block structured linear system
-\begin{equation}
- \left[
- \begin{matrix} B & E\\ E^\top & C
- \end{matrix}
- \right]\left[
- \begin{matrix} \Delta y \\ \Delta z
- \end{matrix}
- \right]
- =
- \left[
- \begin{matrix} v\\ w
- \end{matrix}
- \right]\ ,
-\label{eq:linear2}
-\end{equation}
-and apply Gaussian elimination to it. As we noted above, $C$ is a block diagonal matrix, with small diagonal blocks of size $s\times s$.
-Thus, calculating the inverse of $C$ by inverting each of these blocks is cheap. This allows us to eliminate $\Delta z$ by observing that $\Delta z = C^{-1}(w - E^\top \Delta y)$, giving us
-\begin{equation}
- \left[B - EC^{-1}E^\top\right] \Delta y = v - EC^{-1}w\ . \label{eq:schur}
-\end{equation}
-The matrix
-\begin{equation}
-S = B - EC^{-1}E^\top\ ,
-\end{equation}
-is the Schur complement of $C$ in $H$. It is also known as the {\em reduced camera matrix}, because the only variables participating in~\eqref{eq:schur} are the ones corresponding to the cameras. $S \in \reals^{pc\times pc}$ is a block structured symmetric positive definite matrix, with blocks of size $c\times c$. The block $S_{ij}$ corresponding to the pair of images $i$ and $j$ is non-zero if and only if the two images observe at least one common point.
-
-Now, \eqref{eq:linear2}~can be solved by first forming $S$, solving for $\Delta y$, and then back-substituting $\Delta y$ to obtain the value of $\Delta z$.
-Thus, the solution of what was an $n\times n$, $n=pc+qs$ linear system is reduced to the inversion of the block diagonal matrix $C$, a few matrix-matrix and matrix-vector multiplies, and the solution of block sparse $pc\times pc$ linear system~\eqref{eq:schur}. For almost all problems, the number of cameras is much smaller than the number of points, $p \ll q$, thus solving~\eqref{eq:schur} is significantly cheaper than solving~\eqref{eq:linear2}. This is the {\em Schur complement trick}~\cite{brown-58}.
-
-This still leaves open the question of solving~\eqref{eq:schur}. The
-method of choice for solving symmetric positive definite systems
-exactly is via the Cholesky
-factorization~\cite{trefethen1997numerical} and depending upon the
-structure of the matrix, there are, in general, two options. The first
-is direct factorization, where we store and factor $S$ as a dense
-matrix~\cite{trefethen1997numerical}. This method has $O(p^2)$ space complexity and $O(p^3)$ time
-complexity and is only practical for problems with up to a few hundred
-cameras. Ceres implements this strategy as the \texttt{DENSE\_SCHUR} solver.
-
-
-But $S$ is typically a fairly sparse matrix, as most images
-only see a small fraction of the scene. This leads us to the second
-option: sparse direct methods. These methods store $S$ as a sparse
-matrix, use row and column re-ordering algorithms to maximize the
-sparsity of the Cholesky decomposition, and focus their compute effort
-on the non-zero part of the factorization~\cite{chen2006acs}.
-Sparse direct methods, depending on the exact sparsity structure of the Schur complement,
-allow bundle adjustment algorithms to significantly scale up over those based on dense
-factorization. Ceres implements this strategy as the \texttt{SPARSE\_SCHUR} solver.
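-
-In terms of the solver API, choosing between the two is a one line change to \texttt{Solver::Options}; a minimal sketch:
-\begin{minted}{c++}
-ceres::Solver::Options options;
-// Dense Schur complement; practical up to a few hundred cameras.
-options.linear_solver_type = ceres::DENSE_SCHUR;
-// For larger problems with a sparse reduced camera matrix, use
-// SPARSE_SCHUR instead:
-//   options.linear_solver_type = ceres::SPARSE_SCHUR;
-\end{minted}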
-
-\subsection{\texttt{CGNR}}
-For general sparse problems, if the problem is too large for \texttt{CHOLMOD} or a sparse linear algebra library is not linked into Ceres, another option is the \texttt{CGNR} solver. This solver uses the Conjugate Gradients solver on the {\em normal equations}, but without forming the normal equations explicitly. It exploits the relation
-\begin{align}
- H x = J^\top J x = J^\top(J x)
-\end{align}
-When the user chooses \texttt{CGNR} as the linear solver, Ceres automatically switches from the exact step algorithm to an inexact step algorithm.
-
-\subsection{\texttt{ITERATIVE\_SCHUR}}
-Another option for bundle adjustment problems is to apply Preconditioned Conjugate Gradients (PCG) to the reduced camera matrix $S$ instead of $H$. One reason to do this is that $S$ is a much smaller matrix than $H$, but more importantly, it can be shown that $\kappa(S)\leq \kappa(H)$. Ceres implements PCG on $S$ as the \texttt{ITERATIVE\_SCHUR} solver. When the user chooses \texttt{ITERATIVE\_SCHUR} as the linear solver, Ceres automatically switches from the exact step algorithm to an inexact step algorithm.
-
-The cost of forming and storing the Schur complement $S$ can be prohibitive for large problems. Indeed, for an inexact Newton solver that computes $S$ and runs PCG on it, almost all of its time is spent in constructing $S$; the time spent inside the PCG algorithm is negligible in comparison. Because PCG only needs access to $S$ via its product with a vector, one way to evaluate $Sx$ is to observe that
-\begin{align}
- x_1 &= E^\top x \notag \\
- x_2 &= C^{-1} x_1 \notag\\
- x_3 &= Ex_2 \notag\\
- x_4 &= Bx \notag\\
- Sx &= x_4 - x_3\ .\label{eq:schurtrick1}
-\end{align}
-Thus, we can run PCG on $S$ with the same computational effort per iteration as PCG on $H$, while reaping the benefits of a more powerful preconditioner. In fact, we do not even need to compute $H$, \eqref{eq:schurtrick1} can be implemented using just the columns of $J$.
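-
-The following sketch illustrates the implicit product~\eqref{eq:schurtrick1} using dense \texttt{Eigen} matrices; the function name and the dense representation are purely illustrative and are not part of the Ceres API, which operates on block sparse matrices instead.
-\begin{minted}{c++}
-#include "Eigen/Dense"
-
-// Computes S * x = (B - E C^{-1} E^T) x without ever forming S.
-// B, E and C_inv stand in for the block matrices in the text.
-Eigen::VectorXd SchurProduct(const Eigen::MatrixXd& B,
-                             const Eigen::MatrixXd& E,
-                             const Eigen::MatrixXd& C_inv,
-                             const Eigen::VectorXd& x) {
-  const Eigen::VectorXd x1 = E.transpose() * x;
-  const Eigen::VectorXd x2 = C_inv * x1;
-  const Eigen::VectorXd x3 = E * x2;
-  const Eigen::VectorXd x4 = B * x;
-  return x4 - x3;
-}
-\end{minted}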
-
-Equation~\eqref{eq:schurtrick1} is closely related to {\em Domain Decomposition methods} for solving large linear systems that arise in structural engineering and partial differential equations. In the language of Domain Decomposition, each point in a bundle adjustment problem is a domain, and the cameras form the interface between these domains. The iterative solution of the Schur complement then falls within the sub-category of techniques known as Iterative Sub-structuring~\cite{saad2003iterative,mathew2008domain}.
-
-\section{Preconditioner}
-The convergence rate of Conjugate Gradients for solving~\eqref{eq:normal} depends on the distribution of eigenvalues of $H$~\cite{saad2003iterative}. A useful upper bound is $\sqrt{\kappa(H)}$, where $\kappa(H)$ is the condition number of the matrix $H$. For most bundle adjustment problems, $\kappa(H)$ is high and a direct application of Conjugate Gradients to~\eqref{eq:normal} results in extremely poor performance.
-
-The solution to this problem is to replace~\eqref{eq:normal} with a {\em preconditioned} system. Given a linear system $Ax = b$ and a preconditioner $M$, the preconditioned system is given by $M^{-1}Ax = M^{-1}b$. The resulting algorithm is known as the Preconditioned Conjugate Gradients (PCG) algorithm, and its worst case complexity now depends on the condition number of the {\em preconditioned} matrix $\kappa(M^{-1}A)$.
-
-The computational cost of using a preconditioner $M$ is the cost of computing $M$ and evaluating the product $M^{-1}y$ for arbitrary vectors $y$. Thus, there are two competing factors to consider: how much of $H$'s structure is captured by $M$ so that the condition number $\kappa(M^{-1}H)$ is low, and the computational cost of constructing and using $M$. The ideal preconditioner would be one for which $\kappa(M^{-1}H) = 1$. $M = H$ achieves this, but it is not a practical choice, as applying this preconditioner would require solving a linear system equivalent to the unpreconditioned problem. It is usually the case that the more information $M$ has about $H$, the more expensive it is to use. For example, Incomplete Cholesky factorization based preconditioners have much better convergence behavior than the Jacobi preconditioner, but are also much more expensive.
-
-
-The simplest of all preconditioners is the diagonal or Jacobi preconditioner, \ie, $M=\operatorname{diag}(H)$, which for block structured matrices like $H$ can be generalized to the block Jacobi preconditioner.
-
-For \texttt{ITERATIVE\_SCHUR} there are two obvious choices for block diagonal preconditioners for $S$: the block diagonal of the matrix $B$~\cite{mandel1990block} and the block diagonal of $S$, \ie the block Jacobi preconditioner for $S$. Ceres implements both of these preconditioners and refers to them as \texttt{JACOBI} and \texttt{SCHUR\_JACOBI} respectively.
-
-For bundle adjustment problems arising in reconstruction from community photo collections, more effective preconditioners can be constructed by analyzing and exploiting the camera-point visibility structure of the scene~\cite{kushal2012}. Ceres implements the two visibility based preconditioners described by Kushal \& Agarwal as \texttt{CLUSTER\_JACOBI} and \texttt{CLUSTER\_TRIDIAGONAL}. These are fairly new preconditioners and Ceres' implementation of them is in its early stages and is not as mature as the other preconditioners described above.
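-
-Selecting a preconditioner is again a single option; a minimal sketch:
-\begin{minted}{c++}
-ceres::Solver::Options options;
-options.linear_solver_type = ceres::ITERATIVE_SCHUR;
-// Precondition the reduced camera matrix with its block diagonal.
-options.preconditioner_type = ceres::SCHUR_JACOBI;
-\end{minted}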
-
-\section{Ordering}
-\label{sec:ordering}
-The order in which variables are eliminated in a linear solver can
-have a significant impact on the efficiency and accuracy of the
-method. For example, when doing sparse Cholesky factorization, there are
-matrices for which a good ordering will give a Cholesky factor with
-$O(n)$ storage, whereas a bad ordering will result in a completely
-dense factor.
-
-Ceres allows the user to provide varying amounts of hints to the
-solver about the variable elimination ordering to use. This can range
-from no hints, where the solver is free to decide the best ordering
-based on the user's choices like the linear solver being used, to an
-exact order in which the variables should be eliminated, and a variety
-of possibilities in between.
-
-Instances of the \texttt{ParameterBlockOrdering} class are used to communicate this
-information to Ceres.
-
-Formally, an ordering is an ordered partitioning of the parameter
-blocks. Each parameter block belongs to exactly one group, and
-each group has a unique integer associated with it that determines
-its order in the set of groups. We call these groups {\em elimination
- groups}.
-
-Given such an ordering, Ceres ensures that the parameter blocks in the
-lowest numbered elimination group are eliminated first, and then the
-parameter blocks in the next lowest numbered elimination group and so
-on. Within each elimination group, Ceres is free to order the
-parameter blocks as it chooses. For example, consider the linear system
-
-\begin{align}
-x + y &= 3\\
- 2x + 3y &= 7
-\end{align}
-
-There are two ways in which it can be solved: first eliminating $x$
-from the two equations, solving for $y$, and then back substituting
-for $x$; or first eliminating $y$, solving for $x$ and back substituting
-for $y$. The user can construct three orderings here.
-
-\begin{enumerate}
-\item $\{0: x\}, \{1: y\}$: Eliminate $x$ first.
-\item $\{0: y\}, \{1: x\}$: Eliminate $y$ first.
-\item $\{0: x, y\}$: Solver gets to decide the elimination order.
-\end{enumerate}
-
-Thus, to have Ceres determine the ordering automatically using
-heuristics, put all the variables in the same elimination group. The
-identity of the group does not matter. This is the same as not
-specifying an ordering at all. To control the ordering for every
-variable, create an elimination group per variable, ordering them in
-the desired order.
-
-If the user is using one of the Schur solvers (\texttt{DENSE\_SCHUR},
-\texttt{SPARSE\_SCHUR},\ \texttt{ITERATIVE\_SCHUR}) and chooses to
-specify an ordering, it must have one important property. The lowest
-numbered elimination group must form an independent set in the graph
-corresponding to the Hessian, or in other words, no two parameter
-blocks in the first elimination group should co-occur in the same
-residual block. For the best performance, this elimination group
-should be as large as possible. For standard bundle adjustment
-problems, this corresponds to the first elimination group containing
-all the 3D points, and the second containing all the camera
-parameter blocks.
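-
-A sketch of how such an ordering might be specified follows; \texttt{points} and \texttt{cameras} stand for user-owned containers of parameter block pointers and are not part of the Ceres API, \texttt{options} is the \texttt{Solver::Options} instance, and the ownership semantics of \texttt{linear\_solver\_ordering} should be checked against the version of Ceres in use.
-\begin{minted}{c++}
-ceres::ParameterBlockOrdering* ordering =
-    new ceres::ParameterBlockOrdering;
-// Group 0 is eliminated first and must be an independent set: the points.
-for (size_t i = 0; i < points.size(); ++i) {
-  ordering->AddElementToGroup(points[i], 0);
-}
-// The camera parameter blocks form the next elimination group.
-for (size_t i = 0; i < cameras.size(); ++i) {
-  ordering->AddElementToGroup(cameras[i], 1);
-}
-options.linear_solver_ordering = ordering;
-\end{minted}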
-
-If the user leaves the choice to Ceres, then the solver uses an
-approximate maximum independent set algorithm to identify the first
-elimination group~\cite{li2007miqr}.
-
-\section{\texttt{Solver::Options}}
-
-\texttt{Solver::Options} controls the overall behavior of the
-solver. We list the various settings and their default values below.
-
-\begin{enumerate}
-
-\item{\texttt{trust\_region\_strategy\_type }}
- (\texttt{LEVENBERG\_MARQUARDT}) The trust region step computation
- algorithm used by Ceres. Currently \texttt{LEVENBERG\_MARQUARDT }
- and \texttt{DOGLEG} are the two valid choices.
-
-\item{\texttt{dogleg\_type}} (\texttt{TRADITIONAL\_DOGLEG}) Ceres
-  supports two different dogleg strategies: the
-  \texttt{TRADITIONAL\_DOGLEG} method by Powell and the
-  \texttt{SUBSPACE\_DOGLEG} method described by Byrd et
-  al.~\cite{byrd1988approximate}. See Section~\ref{sec:dogleg} for more details.
-
-\item{\texttt{use\_nonmonotonic\_steps}} (\texttt{false})
-Relax the requirement that the trust-region algorithm take strictly
-decreasing steps. See Section~\ref{sec:non-monotonic} for more details.
-
-\item{\texttt{max\_consecutive\_nonmonotonic\_steps}} (5)
-The window size used by the step selection algorithm to accept
-non-monotonic steps.
-
-\item{\texttt{max\_num\_iterations }}(\texttt{50}) Maximum number of
- iterations for Levenberg-Marquardt.
-
-\item{\texttt{max\_solver\_time\_in\_seconds }} ($10^9$) Maximum
- amount of time for which the solver should run.
-
-\item{\texttt{num\_threads }}(\texttt{1}) Number of threads used by
- Ceres to evaluate the Jacobian.
-
-\item{\texttt{initial\_trust\_region\_radius } ($10^4$)} The size of
- the initial trust region. When the \texttt{LEVENBERG\_MARQUARDT}
- strategy is used, the reciprocal of this number is the initial
- regularization parameter.
-
-\item{\texttt{max\_trust\_region\_radius } ($10^{16}$)} The trust
- region radius is not allowed to grow beyond this value.
-
-\item{\texttt{min\_trust\_region\_radius } ($10^{-32}$)} The solver
- terminates, when the trust region becomes smaller than this value.
-
-\item{\texttt{min\_relative\_decrease }}($10^{-3}$) Lower threshold
-  for relative decrease before a Levenberg-Marquardt step is accepted.
-
-\item{\texttt{lm\_min\_diagonal } ($10^{-6}$)} The
-  \texttt{LEVENBERG\_MARQUARDT} strategy uses a diagonal matrix to
-  regularize the trust region step. This is the lower bound on the
-  values of this diagonal matrix.
-
-\item{\texttt{lm\_max\_diagonal } ($10^{32}$)} The
-  \texttt{LEVENBERG\_MARQUARDT} strategy uses a diagonal matrix to
-  regularize the trust region step. This is the upper bound on the
-  values of this diagonal matrix.
-
-\item{\texttt{max\_num\_consecutive\_invalid\_steps } (5)} The step
- returned by a trust region strategy can sometimes be numerically
- invalid, usually because of conditioning issues. Instead of crashing
- or stopping the optimization, the optimizer can go ahead and try
- solving with a smaller trust region/better conditioned problem. This
- parameter sets the number of consecutive retries before the
- minimizer gives up.
-
-\item{\texttt{function\_tolerance }}($10^{-6}$) Solver terminates if
-\begin{align}
-\frac{|\Delta \text{cost}|}{\text{cost}} < \texttt{function\_tolerance}
-\end{align}
-where, $\Delta \text{cost}$ is the change in objective function value
-(up or down) in the current iteration of Levenberg-Marquardt.
-
-\item{\texttt{gradient\_tolerance }} Solver terminates if
-\begin{equation}
- \frac{\|g(x)\|_\infty}{\|g(x_0)\|_\infty} < \texttt{gradient\_tolerance}
-\end{equation}
-where $\|\cdot\|_\infty$ refers to the max norm, and $x_0$ is the vector of initial parameter values.
-
-\item{\texttt{parameter\_tolerance }}($10^{-8}$) Solver terminates if
-\begin{equation}
- \frac{\|\Delta x\|}{\|x\| + \texttt{parameter\_tolerance}} < \texttt{parameter\_tolerance}
-\end{equation}
-where $\Delta x$ is the step computed by the linear solver in the current iteration of Levenberg-Marquardt.
-
-\item{\texttt{linear\_solver\_type
-    }}(\texttt{SPARSE\_NORMAL\_CHOLESKY}/\texttt{DENSE\_QR}) Type of
-  linear solver used to compute the solution to the linear least
-  squares problem in each iteration of the Levenberg-Marquardt
-  algorithm. If Ceres is built with \texttt{SuiteSparse} linked in, then the
-  default is \texttt{SPARSE\_NORMAL\_CHOLESKY}; otherwise it is
-  \texttt{DENSE\_QR}.
-
-\item{\texttt{preconditioner\_type }}(\texttt{JACOBI}) The
-  preconditioner used by the iterative linear solver. The default is
-  the block Jacobi preconditioner. Valid values are (in increasing
-  order of complexity) \texttt{IDENTITY}, \texttt{JACOBI},
-  \texttt{SCHUR\_JACOBI}, \texttt{CLUSTER\_JACOBI} and
-  \texttt{CLUSTER\_TRIDIAGONAL}.
-
-\item{\texttt{sparse\_linear\_algebra\_library }
- (\texttt{SUITE\_SPARSE})} Ceres supports the use of two sparse
- linear algebra libraries, \texttt{SuiteSparse}, which is enabled by
- setting this parameter to \texttt{SUITE\_SPARSE} and
- \texttt{CXSparse}, which can be selected by setting this parameter
- to $\texttt{CX\_SPARSE}$. \texttt{SuiteSparse} is a sophisticated
- and complex sparse linear algebra library and should be used in
- general. If your needs/platforms prevent you from using
- \texttt{SuiteSparse}, consider using \texttt{CXSparse}, which is a
- much smaller, easier to build library. As can be expected, its
- performance on large problems is not comparable to that of
- \texttt{SuiteSparse}.
-
-
-\item{\texttt{num\_linear\_solver\_threads }}(\texttt{1}) Number of
- threads used by the linear solver.
-
-\item{\texttt{use\_inner\_iterations} (\texttt{false}) } Use a
-  non-linear version of a simplified variable projection
-  algorithm. Essentially this amounts to doing a further optimization
-  on each Newton/Trust region step using a coordinate descent
-  algorithm. For more details, see the discussion in Section~\ref{sec:inner}.
-
-\item{\texttt{inner\_iteration\_ordering} (\texttt{NULL})} If
-  \texttt{Solver::Options::use\_inner\_iterations} is \texttt{true}, then the user
-  has two choices.
-
-\begin{enumerate}
-\item Let the solver heuristically decide which parameter blocks to
- optimize in each inner iteration. To do this, set
- \texttt{inner\_iteration\_ordering} to {\texttt{NULL}}.
-
-\item Specify a collection of ordered independent sets. The lower
-  numbered groups are optimized before the higher numbered groups during
- the inner optimization phase. Each group must be an independent set.
-\end{enumerate}
-
-\item{\texttt{linear\_solver\_ordering} (\texttt{NULL})} An instance
- of the ordering object informs the solver about the desired order in
- which parameter blocks should be eliminated by the linear
- solvers. See section~\ref{sec:ordering} for more details.
-
- If \texttt{NULL}, the solver is free to choose an ordering that it
- thinks is best. Note: currently, this option only has an effect on
-  the Schur type solvers; support for the
-  \texttt{SPARSE\_NORMAL\_CHOLESKY} solver is forthcoming.
-
-\item{\texttt{use\_block\_amd } (\texttt{true})} By virtue of the
- modeling layer in Ceres being block oriented, all the matrices used
- by Ceres are also block oriented. When doing sparse direct
- factorization of these matrices, the fill-reducing ordering
- algorithms can either be run on the block or the scalar form of
- these matrices. Running it on the block form exposes more of the
- super-nodal structure of the matrix to the Cholesky factorization
- routines. This leads to substantial gains in factorization
-  performance. Setting this parameter to \texttt{true} enables the use of a
-  block oriented Approximate Minimum Degree ordering
-  algorithm. Setting it to \texttt{false} uses a scalar AMD
-  algorithm. This option only makes sense when using
- \texttt{sparse\_linear\_algebra\_library = SUITE\_SPARSE} as it uses
- the \texttt{AMD} package that is part of \texttt{SuiteSparse}.
-
-\item{\texttt{linear\_solver\_min\_num\_iterations }}(\texttt{1})
- Minimum number of iterations used by the linear solver. This only
- makes sense when the linear solver is an iterative solver, e.g.,
- \texttt{ITERATIVE\_SCHUR}.
-
-\item{\texttt{linear\_solver\_max\_num\_iterations }}(\texttt{500})
-  Maximum number of iterations used by the linear solver. This only
- makes sense when the linear solver is an iterative solver, e.g.,
- \texttt{ITERATIVE\_SCHUR}.
-
-\item{\texttt{eta }} ($10^{-1}$)
- Forcing sequence parameter. The truncated Newton solver uses this
- number to control the relative accuracy with which the Newton step is
- computed. This constant is passed to ConjugateGradientsSolver which
- uses it to terminate the iterations when
-\begin{equation}
-\frac{Q_i - Q_{i-1}}{Q_i} < \frac{\eta}{i}
-\end{equation}
-
-\item{\texttt{jacobi\_scaling }}(\texttt{true}) \texttt{true} means
- that the Jacobian is scaled by the norm of its columns before being
- passed to the linear solver. This improves the numerical
- conditioning of the normal equations.
-
-\item{\texttt{logging\_type }}(\texttt{PER\_MINIMIZER\_ITERATION})
-  Valid values are \texttt{SILENT} and \texttt{PER\_MINIMIZER\_ITERATION}.
-
-
-\item{\texttt{minimizer\_progress\_to\_stdout }}(\texttt{false})
-By default the Minimizer progress is logged to \texttt{STDERR}
-depending on the \texttt{vlog} level. If this flag is
-set to \texttt{true}, and \texttt{logging\_type } is not \texttt{SILENT}, the
-logging output is sent to \texttt{STDOUT}.
-
-\item{\texttt{return\_initial\_residuals }}(\texttt{false})
-\item{\texttt{return\_final\_residuals }}(\texttt{false})
-If true, the vectors \texttt{Solver::Summary::initial\_residuals } and
-\texttt{Solver::Summary::final\_residuals } are filled with the
-residuals before and after the optimization. The entries of these
-vectors are in the order in which ResidualBlocks were added to the
-Problem object.
-
-\item{\texttt{return\_initial\_gradient }}(\texttt{false})
-\item{\texttt{return\_final\_gradient }}(\texttt{false})
-If true, the vectors \texttt{Solver::Summary::initial\_gradient } and
-\texttt{Solver::Summary::final\_gradient } are filled with the
-gradient before and after the optimization. The entries of these
-vectors are in the order in which ParameterBlocks were added to the
-Problem object.
-
-Since \texttt{AddResidualBlock } adds ParameterBlocks to the
-\texttt{Problem } automatically if they do not already exist, if you
-wish to have explicit control over the ordering of the vectors, then
-use \texttt{Problem::AddParameterBlock } to explicitly add the
-ParameterBlocks in the order desired.
-
-\item{\texttt{return\_initial\_jacobian }}(\texttt{false})
-\item{\texttt{return\_final\_jacobian }}(\texttt{false})
-If true, the Jacobian matrices before and after the optimization are
-returned in \texttt{Solver::Summary::initial\_jacobian } and
-\texttt{Solver::Summary::final\_jacobian } respectively.
-
-The rows of these matrices are in the same order in which the
-ResidualBlocks were added to the Problem object. The columns are in
-the same order in which the ParameterBlocks were added to the Problem
-object.
-
-Since \texttt{AddResidualBlock } adds ParameterBlocks to the
-\texttt{Problem } automatically if they do not already exist, if you
-wish to have explicit control over the column ordering of the matrix,
-then use \texttt{Problem::AddParameterBlock } to explicitly add the
-ParameterBlocks in the order desired.
-
-The Jacobian matrices are stored as compressed row sparse
-matrices. Please see \texttt{ceres/crs\_matrix.h } for more details of
-the format.
-
-\item{\texttt{lsqp\_iterations\_to\_dump }} List of iterations at
- which the optimizer should dump the linear least squares problem to
- disk. Useful for testing and benchmarking. If empty (default), no
- problems are dumped.
-
-\item{\texttt{lsqp\_dump\_directory }} (\texttt{/tmp})
- If \texttt{lsqp\_iterations\_to\_dump} is non-empty, then this
- setting determines the directory to which the files containing the
-  linear least squares problems are written.
-
-
-\item{\texttt{lsqp\_dump\_format }}(\texttt{TEXTFILE}) The format in
- which linear least squares problems should be logged
-when \texttt{lsqp\_iterations\_to\_dump} is non-empty. There are three options
-\begin{itemize}
-\item{\texttt{CONSOLE }} prints the linear least squares problem in a human readable format
- to \texttt{stderr}. The Jacobian is printed as a dense matrix. The vectors
- $D$, $x$ and $f$ are printed as dense vectors. This should only be used
- for small problems.
-\item{\texttt{PROTOBUF }}
- Write out the linear least squares problem to the directory
- pointed to by \texttt{lsqp\_dump\_directory} as a protocol
- buffer. \texttt{linear\_least\_squares\_problems.h/cc} contains routines for
- loading these problems. For details on the on disk format used,
- see \texttt{matrix.proto}. The files are named
- \texttt{lm\_iteration\_???.lsqp}. This requires that
- \texttt{protobuf} be linked into Ceres Solver.
-\item{\texttt{TEXTFILE }}
- Write out the linear least squares problem to the directory
- pointed to by \texttt{lsqp\_dump\_directory} as text files
- which can be read into \texttt{MATLAB/Octave}. The Jacobian is dumped as a
- text file containing $(i,j,s)$ triplets, the vectors $D$, $x$ and $f$ are
- dumped as text files containing a list of their values.
-
- A \texttt{MATLAB/Octave} script called \texttt{lm\_iteration\_???.m} is also output,
- which can be used to parse and load the problem into memory.
-\end{itemize}
-
-
-
-\item{\texttt{check\_gradients }}(\texttt{false})
- Check all Jacobians computed by each residual block with finite
- differences. This is expensive since it involves computing the
- derivative by normal means (e.g. user specified, autodiff,
- etc), then also computing it using finite differences. The
- results are compared, and if they differ substantially, details
- are printed to the log.
-
-\item{\texttt{gradient\_check\_relative\_precision }} ($10^{-8}$)
-  Relative precision to check for in the gradient checker. If the
-  relative difference between an element of the user-provided Jacobian
-  and its finite difference estimate exceeds
-  this number, then the Jacobian for that cost term is dumped.
-
-\item{\texttt{numeric\_derivative\_relative\_step\_size }} ($10^{-6}$)
- Relative shift used for taking numeric derivatives. For finite
- differencing, each dimension is evaluated at slightly shifted
- values, \eg for forward differences, the numerical derivative is
-
-\begin{align}
- \delta &= \texttt{numeric\_derivative\_relative\_step\_size}\\
- \Delta f &= \frac{f((1 + \delta) x) - f(x)}{\delta x}
-\end{align}
-
-The finite differencing is done along each dimension. The
-reason to use a relative (rather than absolute) step size is
-that this way, numeric differentiation works for functions where
-the arguments are typically large (e.g. $10^9$) and when the
-values are small (e.g. $10^{-5}$). It is possible to construct
-"torture cases" which break this finite difference heuristic,
-but they do not come up often in practice.
-
-\item{\texttt{callbacks }}
- Callbacks that are executed at the end of each iteration of the
-\texttt{Minimizer}. They are executed in the order that they are
-specified in this vector. By default, parameter blocks are
-updated only at the end of the optimization, \ie when the
-\texttt{Minimizer} terminates. This behavior is controlled by
-\texttt{update\_state\_every\_iteration}. If the user wishes to have access
-to the updated parameter blocks when their callbacks are
-executed, then set \texttt{update\_state\_every\_iteration} to \texttt{true}.
-
-The solver does NOT take ownership of these pointers.
-
-\item{\texttt{update\_state\_every\_iteration }}(\texttt{false})
-Normally the parameter blocks are only updated when the solver
-terminates. Setting this to \texttt{true} updates them in every iteration. This
-setting is useful when building an interactive application using Ceres
-and using an \texttt{IterationCallback}.
-
-\item{\texttt{solver\_log}} If non-empty, a summary of the execution of the solver is
-  recorded to this file. This file is used for recording and analyzing
-  Ceres' performance. Currently, only the iteration number, total
- time and the objective function value are logged. The format of this
- file is expected to change over time as the performance evaluation
- framework is fleshed out.
-\end{enumerate}
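-
-To make the list above concrete, the following sketch shows one typical way of configuring these options and invoking the solver; it assumes a \texttt{ceres::Problem} named \texttt{problem} that has already been populated with residual blocks.
-\begin{minted}{c++}
-#include <iostream>
-#include "ceres/ceres.h"
-
-void SolveProblem(ceres::Problem* problem) {
-  ceres::Solver::Options options;
-  options.linear_solver_type = ceres::SPARSE_NORMAL_CHOLESKY;
-  options.max_num_iterations = 100;
-  options.minimizer_progress_to_stdout = true;
-
-  ceres::Solver::Summary summary;
-  ceres::Solve(options, problem, &summary);
-  std::cout << summary.BriefReport() << "\n";
-}
-\end{minted}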
-
-\section{\texttt{Solver::Summary}}
-TBD