## Pseudogradient (Quasi-Newton) optimization techniques

Download Optimization.NET (.NET 4.5.1)

Both the **Davidon-Fletcher-Powell (DFP)** and **Broyden-Fletcher-Goldfarb-Shanno (BFGS)** algorithms belong to the group of pseudogradient (a.k.a. quasi-Newton) optimization algorithms. Pseudogradient methods are derived from gradient-based optimization. These methods begin the search along a gradient line and use gradient information to build a quadratic fit to the model function.
All of the algorithms involve a line search given by the following equation:

The line search in this context means finding the minimum of a one-dimensional function defined by the gradient direction. For gradient search

x_{k+1} = x_{k} - aH_{k}∇y(x_{k})   (1)

**H**_{k} is **I**, the unit matrix, and a is the parameter of the gradient line. For Newton's method **H**_{k} is the inverse of the Hessian matrix, **H**^{-1}, and a is one. Newton's method works very well for quadratic functions; for general nonlinear functions it moves methodically toward the optimum, but the computational effort required to compute the inverse of the Hessian matrix at each iteration is usually excessive compared to other methods. Consequently, it is considered an inefficient middle-game procedure for most problems. For quasi-Newton methods **H**_{k} is a series of matrices beginning with the unit matrix, **I**, and ending with the inverse of the Hessian matrix, **H**^{-1}.
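The line search itself is easy to sketch. The Python below is a minimal illustration, not the library's implementation: the quadratic test function y, its gradient, the starting point, and the bracketing interval [0, 1] are all hypothetical assumptions. It performs one gradient step x_{1} = x_{0} - a∇y(x_{0}), choosing a by golden-section search:

```python
import numpy as np

def line_search_golden(phi, a_lo=0.0, a_hi=1.0, tol=1e-8):
    """Golden-section search for the minimizer of the 1-D function phi on [a_lo, a_hi]."""
    inv_phi = (np.sqrt(5.0) - 1.0) / 2.0      # reciprocal of the golden ratio
    c = a_hi - inv_phi * (a_hi - a_lo)
    d = a_lo + inv_phi * (a_hi - a_lo)
    while a_hi - a_lo > tol:
        if phi(c) < phi(d):
            a_hi, d = d, c                    # keep the left sub-interval
            c = a_hi - inv_phi * (a_hi - a_lo)
        else:
            a_lo, c = c, d                    # keep the right sub-interval
            d = a_lo + inv_phi * (a_hi - a_lo)
    return (a_lo + a_hi) / 2.0

# Hypothetical test function y(x) = 2*x1^2 + x2^2 and its gradient
def y(x):
    return 2.0 * x[0]**2 + x[1]**2

def grad_y(x):
    return np.array([4.0 * x[0], 2.0 * x[1]])

# One gradient step: minimize y along the line x0 - a*grad_y(x0)
x0 = np.array([1.0, 1.0])
g0 = grad_y(x0)
a = line_search_golden(lambda t: y(x0 - t * g0))
x1 = x0 - a * g0
```

For this test function the one-dimensional minimum along the gradient line is at a = 5/18, and y(x_{1}) < y(x_{0}).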

The quasi-Newton search at each iteration is along the line through the current point **x**_{k}:

x_{k+1} = x_{k} - a_{k+1}H_{k}∇y(x_{k})   (2)

where a_{k+1} is the parameter of the line through **x**_{k}, and **H**_{k} is given by the following equation:

H_{k} = H_{k-1} + A_{k} + B_{k}   (3)

The matrices **A**_{k} and **B**_{k} are given by the following equations, where d_{k-1} = x_{k} - x_{k-1} is the step just taken and g_{k-1} = ∇y(x_{k}) - ∇y(x_{k-1}) is the corresponding change in the gradient:

A_{k} = (d_{k-1}d_{k-1}^{T}) / (d_{k-1}^{T}g_{k-1})   (4)

B_{k} = -(H_{k-1}g_{k-1}g_{k-1}^{T}H_{k-1}) / (g_{k-1}^{T}H_{k-1}g_{k-1})   (5)

The algorithm begins with a search along the gradient line from the starting point **x**_{0}, as given by the following equation, obtained from equation (2) with k = 0:

x_{1} = x_{0} - a_{1}H_{0}∇y(x_{0})   (6)

where **H**_{0}
= **I** is the unit matrix. The algorithm continues using equation
(2) until a stopping criterion is met. However, for a quadratic function
with *n* independent variables the method converges to the optimum
after *n* iterations (quadratic termination) if exact line searches
are used.
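The full DFP loop is short enough to sketch. The Python below is a minimal illustration under stated assumptions: the quadratic y(x) = 0.5·xᵀQx - bᵀx, the matrix Q, the vector b, and the starting point are all hypothetical, and the exact line search uses the closed-form step available for a quadratic:

```python
import numpy as np

# Hypothetical quadratic test problem: y(x) = 0.5 x^T Q x - b^T x, grad y = Qx - b
Q = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
grad_y = lambda x: Q @ x - b

def dfp(x0, n_iter):
    """DFP iteration: x_{k+1} = x_k - a_{k+1} H_k grad y(x_k), H updated per eq. (3)."""
    x = x0.astype(float)
    H = np.eye(len(x))                        # H_0 = I, as in eq. (6)
    for _ in range(n_iter):
        g = grad_y(x)
        p = -H @ g                            # search direction
        a = -(p @ g) / (p @ Q @ p)            # exact line search (quadratic case)
        x_new = x + a * p
        d = x_new - x                         # step taken
        dg = grad_y(x_new) - g                # change in the gradient
        A = np.outer(d, d) / (d @ dg)         # rank-one correction A_k
        B = -(H @ np.outer(dg, dg) @ H) / (dg @ H @ dg)  # rank-one correction B_k
        H = H + A + B                         # eq. (3)
        x = x_new
    return x, H

x_star, H_final = dfp(np.array([0.0, 0.0]), n_iter=2)
```

With exact line searches this two-variable quadratic terminates after n = 2 iterations: x_star equals Q⁻¹b and H_final equals the inverse Hessian Q⁻¹, illustrating the quadratic-termination property.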

The matrices **A**_{k} and **B**_{k} have been constructed so that their sums have the specific properties shown below:

A_{1} + A_{2} + ... + A_{n} = H^{-1}   (7)

B_{1} + B_{2} + ... + B_{n} = -I   (8)

The sum of the *n* matrices **A**_{k} generates the inverse of the Hessian matrix, **H**^{-1}, so that equation (3) becomes the same as Newton's method, equation (1), at the end of *n* iterations. The sum of the matrices **B**_{k} generates the negative of the unit matrix **I** at the end of *n* iterations, canceling the first step of the algorithm where **I** was used for **H**_{0} in equation (6).
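These two sum properties can be checked numerically. The sketch below is a hypothetical example (the quadratic y(x) = 0.5·xᵀQx, so that ∇y(x) = Qx, with an illustrative Q and starting point); it accumulates the **A**_{k} and **B**_{k} matrices separately over n = 2 DFP iterations with exact line searches:

```python
import numpy as np

# Hypothetical quadratic: y(x) = 0.5 x^T Q x, grad y = Q x, minimum at the origin
Q = np.array([[2.0, 0.5], [0.5, 1.0]])
grad_y = lambda x: Q @ x

x = np.array([1.0, -1.0])
H = np.eye(2)                                 # H_0 = I
sum_A = np.zeros((2, 2))
sum_B = np.zeros((2, 2))
for _ in range(2):                            # n = 2 iterations
    g = grad_y(x)
    p = -H @ g
    a = -(p @ g) / (p @ Q @ p)                # exact line search (quadratic case)
    x_new = x + a * p
    d = x_new - x
    dg = grad_y(x_new) - g
    A = np.outer(d, d) / (d @ dg)
    B = -(H @ np.outer(dg, dg) @ H) / (dg @ H @ dg)
    sum_A += A                                # accumulates toward H^{-1}
    sum_B += B                                # accumulates toward -I
    H = H + A + B
    x = x_new
```

After the n-th iteration sum_A matches the inverse Hessian Q⁻¹ and sum_B matches -I, so H_n = I + Q⁻¹ - I = Q⁻¹.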

A further refinement of the DFP algorithm is the BFGS method, which has been proven to have better convergence properties. The BFGS matrix update formula is:

H_{k+1} = H_{k} + [1 + (g_{k}^{T}H_{k}g_{k})/(d_{k}^{T}g_{k})] (d_{k}d_{k}^{T})/(d_{k}^{T}g_{k}) - (d_{k}g_{k}^{T}H_{k} + H_{k}g_{k}d_{k}^{T})/(d_{k}^{T}g_{k})   (9)

where d_{k} = **x**_{k+1} - **x**_{k} and g_{k} = ∇y(**x**_{k+1}) - ∇y(**x**_{k}).

This equation
is used in place of equation (3) in the algorithm given by equation
(2). The procedure is the same in that a search along the gradient
line from starting point **x**_{0} is conducted initially
according to equation (6). Then the Hessian matrix is updated using
equation (9), and for quadratic functions the method arrives at the
minimum after *n* iterations.
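The BFGS loop differs from DFP only in the update formula. A minimal Python illustration, assuming the standard inverse-Hessian form of the BFGS update (the quadratic test problem Q, b, and the starting point are again hypothetical):

```python
import numpy as np

def bfgs_update(H, d, g):
    """BFGS inverse-Hessian update, with d = x_{k+1} - x_k
    and g = grad y(x_{k+1}) - grad y(x_k)."""
    dg = d @ g
    term1 = (1.0 + (g @ H @ g) / dg) * np.outer(d, d) / dg
    term2 = (np.outer(d, g) @ H + H @ np.outer(g, d)) / dg
    return H + term1 - term2

# Hypothetical quadratic test problem: y(x) = 0.5 x^T Q x - b^T x
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
grad_y = lambda x: Q @ x - b

x = np.array([0.0, 0.0])
H = np.eye(2)                                 # H_0 = I, as in eq. (6)
for _ in range(2):                            # n = 2 iterations
    g = grad_y(x)
    p = -H @ g
    a = -(p @ g) / (p @ Q @ p)                # exact line search (quadratic case)
    x_new = x + a * p
    H = bfgs_update(H, x_new - x, grad_y(x_new) - g)
    x = x_new
```

As with DFP, exact line searches on a two-variable quadratic reach the minimum after n = 2 iterations, with H equal to the inverse Hessian Q⁻¹.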