## Pseudogradient (Quasi-Newton) optimization techniques

Download Optimization.NET (.NET 4.5.1)

Both the **Davidon-Fletcher-Powell (DFP)** and **Broyden-Fletcher-Goldfarb-Shanno (BFGS)** algorithms belong to the group of pseudogradient (also known as quasi-Newton) optimization algorithms. Pseudogradient methods are derived from gradient-based optimization. These methods begin the search along a gradient line and use gradient information to build a quadratic fit to the model function. All of the algorithms involve a line search given by the following equation:

x_{k+1} = x_{k} - αH_{k}∇y(x_{k})    (1)

The line search in this context means finding a minimum of a one-dimensional function defined by the gradient direction. For the gradient search, **H**_{k} is **I**, the unit matrix, and α is the parameter of the gradient line. For Newton's method, **H**_{k} is the inverse of the Hessian matrix, **H**^{-1}, and α is one.

Newton's method works very well for quadratic functions. For general nonlinear functions, however, although Newton's method moves methodically toward the optimum, the computational effort required to compute the inverse of the Hessian matrix at each iteration is usually excessive compared to other methods. Consequently, it is considered an inefficient middle-game procedure for most problems. For quasi-Newton methods, **H**_{k} is a series of matrices beginning with the unit matrix, **I**, and ending with the inverse of the Hessian matrix, **H**^{-1}.
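To make the line search of equation (1) concrete, here is a minimal NumPy sketch of one steepest-descent step (**H**_{k} = **I**) on a made-up quadratic test function. The grid-based "exact" line search is purely illustrative (real implementations use a method such as Brent's) and none of this is the Optimization.NET library's actual code:

```python
import numpy as np

def grid_line_search(y, x, direction):
    """Crude 'exact' line search: minimize y(x + a*direction) over a grid.
    Illustrative only; production code would use e.g. Brent's method."""
    alphas = np.linspace(0.0, 2.0, 2001)
    values = [y(x + a * direction) for a in alphas]
    return alphas[int(np.argmin(values))]

# Hypothetical quadratic test function y(x) = 0.5 x^T Q x - b^T x
Q = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
y = lambda x: 0.5 * x @ Q @ x - b @ x
grad = lambda x: Q @ x - b

x0 = np.zeros(2)
d = -grad(x0)                       # H_k = I: steepest-descent direction
alpha = grid_line_search(y, x0, d)  # the one-dimensional minimization
x1 = x0 + alpha * d                 # one step of equation (1)
```

For this quadratic the step length can also be computed in closed form, which the later examples exploit.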

The DFP algorithm has the following form of equation (1) for minimizing the function y(**x**):

x_{k+1} = x_{k} - α_{k+1}H_{k}∇y(x_{k})    (2)

where a_{k+1} is the parameter of the line through **x**_{k}, and **H**_{k} is given by the following equation:

H_{k} = H_{k-1} + A_{k} + B_{k}    (3)

The matrices **A**_{k} and **B**_{k} are given by the following equations.

A_{k} = (d_{k-1}d_{k-1}^{T}) / (d_{k-1}^{T}g_{k-1})    (4)

B_{k} = -(H_{k-1}g_{k-1}g_{k-1}^{T}H_{k-1}) / (g_{k-1}^{T}H_{k-1}g_{k-1})    (5)

where d_{k-1} = **x**_{k} - **x**_{k-1} and g_{k-1} = ∇y(**x**_{k}) - ∇y(**x**_{k-1}).

The algorithm begins with a search along the gradient line from the starting point **x**_{0}, as given by the following equation obtained from equation (2) with k = 0.

x_{1} = x_{0} - α_{1}H_{0}∇y(x_{0})    (6)

where **H**_{0} = **I** is the unit matrix. The algorithm continues using equation (2) until a stopping criterion is met. However, for a quadratic function with *n* independent variables the method converges to the optimum after *n* iterations (quadratic termination) if exact line searches are used.
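The full DFP iteration of equations (2)-(6) can be sketched in NumPy for a quadratic function, where the exact line search has a closed form. The matrix Q and vector b below are made-up test data, not anything from the Optimization.NET library:

```python
import numpy as np

def dfp_quadratic(Q, b, x0):
    """DFP (equations 2-6) on y(x) = 0.5 x^T Q x - b^T x, where the exact
    line search has a closed form. Returns the iterate and the final H
    after n steps; quadratic termination predicts x = Q^{-1} b."""
    n = len(b)
    x = x0.astype(float)
    H = np.eye(n)                          # H_0 = I, as in equation (6)
    for _ in range(n):
        g = Q @ x - b                      # gradient of the quadratic
        p = -H @ g                         # search direction, equation (2)
        alpha = -(g @ p) / (p @ Q @ p)     # exact line search step length
        x_new = x + alpha * p
        d = x_new - x                      # d_k
        dg = (Q @ x_new - b) - g           # g_k, the change in gradient
        A = np.outer(d, d) / (d @ dg)                    # equation (4)
        B = -np.outer(H @ dg, H @ dg) / (dg @ H @ dg)    # equation (5)
        H = H + A + B                      # equation (3)
        x = x_new
    return x, H

Q = np.array([[4.0, 1.0], [1.0, 3.0]])     # hypothetical test problem
b = np.array([1.0, 2.0])
x_star, H_final = dfp_quadratic(Q, b, np.zeros(2))
# x_star matches np.linalg.solve(Q, b); H_final matches np.linalg.inv(Q)
```

After n = 2 iterations the iterate sits at the minimum and **H** has become the inverse Hessian, illustrating quadratic termination.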

The matrices **A**_{k} and **B**_{k} have been constructed so their sums would have the specific properties shown below.

Σ_{k=1}^{n} A_{k} = H^{-1}    (7)

Σ_{k=1}^{n} B_{k} = -I    (8)

The sum of the *n* matrices **A**_{k} generates the inverse of the Hessian matrix, **H**^{-1}, so that equation (3) is the same as Newton's method, equation (1), at the end of *n* iterations. The sum of the matrices **B**_{k} generates the negative of the unit matrix **I** at the end of *n* iterations, cancelling the first step of the algorithm, when **I** was used for **H**_{0} in equation (6).
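These two properties can be checked numerically by accumulating the **A**_{k} and **B**_{k} matrices separately while running DFP with exact line searches on a small made-up quadratic (again a hypothetical example, not library code):

```python
import numpy as np

# Accumulate the A_k and B_k of equations (4) and (5) separately while
# running DFP on the quadratic y(x) = 0.5 x^T Q x - b^T x.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])    # hypothetical test problem
b = np.array([1.0, 1.0])
n = len(b)

x, H = np.zeros(n), np.eye(n)
sum_A, sum_B = np.zeros((n, n)), np.zeros((n, n))
for _ in range(n):
    grad = Q @ x - b
    p = -H @ grad                          # search direction
    alpha = -(grad @ p) / (p @ Q @ p)      # exact line search on a quadratic
    x_new = x + alpha * p
    d = x_new - x
    dg = (Q @ x_new - b) - grad
    A = np.outer(d, d) / (d @ dg)                     # equation (4)
    B = -np.outer(H @ dg, H @ dg) / (dg @ H @ dg)     # equation (5)
    sum_A, sum_B = sum_A + A, sum_B + B
    H, x = H + A + B, x_new

# sum_A equals Q^{-1} (equation 7) and sum_B equals -I (equation 8)
```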

A further refinement of the DFP algorithm is the BFGS method, which has been shown to have better convergence properties. The BFGS matrix update formula is:

H_{k+1} = H_{k} + [1 + (g_{k}^{T}H_{k}g_{k}) / (d_{k}^{T}g_{k})] (d_{k}d_{k}^{T}) / (d_{k}^{T}g_{k}) - (d_{k}g_{k}^{T}H_{k} + H_{k}g_{k}d_{k}^{T}) / (d_{k}^{T}g_{k})    (9)

where d_{k} = **x**_{k+1} - **x**_{k} and g_{k} = ∇y(**x**_{k+1}) - ∇y(**x**_{k}).

This equation is used in place of equation (3) in the algorithm given by equation (2). The procedure is the same in that a search along the gradient line from the starting point **x**_{0} is conducted initially according to equation (6). Then the inverse-Hessian approximation is updated using equation (9), and for quadratic functions the method arrives at the minimum after *n* iterations.
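A NumPy sketch of the BFGS update of equation (9), dropped into the same exact-line-search loop used above, again on a hypothetical quadratic test problem:

```python
import numpy as np

def bfgs_update(H, d, g):
    """BFGS inverse-Hessian update, equation (9), with
    d = x_{k+1} - x_k and g = grad y(x_{k+1}) - grad y(x_k)."""
    dg = d @ g
    Hg = H @ g
    return (H + (1.0 + (g @ Hg) / dg) * np.outer(d, d) / dg
              - (np.outer(d, Hg) + np.outer(Hg, d)) / dg)

def bfgs_quadratic(Q, b, x0):
    """BFGS with exact line searches on y(x) = 0.5 x^T Q x - b^T x."""
    x, H = x0.astype(float), np.eye(len(b))
    for _ in range(len(b)):                # n iterations suffice here
        grad = Q @ x - b
        p = -H @ grad                      # direction, as in equation (2)
        alpha = -(grad @ p) / (p @ Q @ p)  # exact line search step length
        x_new = x + alpha * p
        H = bfgs_update(H, x_new - x, (Q @ x_new - b) - grad)
        x = x_new
    return x

Q = np.array([[2.0, 0.5], [0.5, 1.0]])     # hypothetical test problem
b = np.array([1.0, -1.0])
x_star = bfgs_quadratic(Q, b, np.zeros(2))
# x_star matches np.linalg.solve(Q, b): quadratic termination
```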