Re: Contribution to the optimization toolbox


From: Jaroslav Hajek
Subject: Re: Contribution to the optimization toolbox
Date: Sun, 6 Sep 2009 09:55:02 +0200

On Fri, Sep 4, 2009 at 4:13 PM, Michael Creel <address@hidden> wrote:
> On Fri, Sep 4, 2009 at 2:53 PM, Jaroslav Hajek <address@hidden> wrote:
>>
>> Here is what I get testing your script, using optim-1.0.6 and
>> development octave.
>>
>> ans =
>>
>>   0.131749   0.999000   0.055406   0.970000
>>
>> Since the 5D Rosenbrock function has two local optima, I'm not exactly
>> surprised about the 3% failures for fminunc.
>> I'd rather say it's interesting that bfgsmin is always successful,
>> despite the local optimum.
>
> That's a good point. bfgsmin occasionally does a Hessian reset when it
> runs into trouble, so it sometimes switches to steepest descent. Maybe
> that's the reason.
>

Partially. I discovered the reason why fminunc was so much worse: it
was the factor = 100 setting in fminunc. This was taken from fsolve
and the original MINPACK, and it means that the starting trust region
is 100 times the size of norm (x). While this may be OK for fsolve,
which starts with a fresh, correct Jacobian, it makes little sense for
fminunc, which starts with a Hessian approximation equal to the
identity matrix (and hence often quite wrong), so essentially the
function starts from a 100x bigger random guess.
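
To make the mechanism concrete, here is a minimal sketch of the
MINPACK-style initialization just described (the variable names are
mine, not the actual fminunc source):

    ## illustrative sketch of the MINPACK-style initial trust region
    factor = 0.1;                  # conservative default; was 100
    xn = norm (x0);                # size of the starting point
    delta = factor * max (xn, 1);  # initial radius, guarded against x0 == 0

With factor = 100, the first trial steps are huge relative to the
(identity) Hessian model, so early iterations are wasted shrinking the
region back down.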
I changed the default values to factor = 1 for fsolve and an even more
conservative factor = 0.1 for fminunc. With the new setting, I get from
your script:

ans =

   0.221018   1.000000   0.077295   1.000000

I also tried dim = 10:
ans =

   0.52277   1.00000   0.21817   1.00000

and dim = 50 (this is lengthy):
ans =

   5.63046   0.47000   3.84247   0.45000

with optimset ("GradObj", "on"), I get for dim = 5:
ans =

   0.221018   0.990000   0.036970   1.000000

and with dim = 10:
ans =

   0.538099   0.990000   0.067742   1.000000

so a lot of time is taken by the finite differences. I'm not sure
bfgsmin uses the analytic gradient at all, though; is there any way to
enforce it?
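
For reference, here is a minimal sketch of how the analytic gradient
is supplied on the fminunc side (an illustrative generalized
Rosenbrock, not Michael's actual test script):

    function [f, g] = rosenbrock (x)
      ## generalized Rosenbrock function and its analytic gradient
      n = numel (x);
      d = x(2:n) - x(1:n-1).^2;
      f = sum (100*d.^2 + (1 - x(1:n-1)).^2);
      g = zeros (size (x));
      g(1:n-1) = -400*x(1:n-1).*d - 2*(1 - x(1:n-1));
      g(2:n) += 200*d;
    endfunction

    opts = optimset ("GradObj", "on");
    [x, fval] = fminunc (@rosenbrock, zeros (5, 1), opts);

With "GradObj" off, fminunc falls back to finite differences, which is
where the extra time above goes.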

>>
>> Another interesting thing is that I got the times reversed. Maybe you
>> used a newer version of bfgsmin?
>
> I'm using the latest development version of bfgsmin, which hasn't had
> changes recently. I don't know when optim-1.0.6 was released, but I
> doubt that there is anything different - certainly there have been no
> major changes. I would guess that the changes in the relative timings
> have more to do with differences between the Octave 3.2.3 release
> candidate and the development version. I'm impressed with the timings
> for fminunc using the development version! The fact that an .m file
> implementation can be so fast is very interesting.
>

Maybe there are also changes to fminunc that haven't yet made it into
3.2.x; I'll check. I'm working with optim-1.0.6; I tried the SVN
version, but I can't get it to work: it seems that revision 5818
http://octave.svn.sourceforge.net/viewvc/octave?view=rev&revision=5818
broke things for me. I tried to fix it (including a screwed-up commit
where my debug lines got in), but I haven't been successful yet. I
think this is the only change since 1.0.6.

> Great. But to facilitate testing and to avoid having redundant
> algorithms, it would be nice to have a set of test problems to work
> with. For example, this little test shows that fminunc is getting a
> lot faster with the development version of Octave, and could probably
> be made a little more robust with a little work. I'll have a look at
> the fminunc code and see if I can make some suggestions. If it gets as
> robust as bfgsmin, and is faster, then bfgsmin will probably need to
> be retired.

Well, I'm not sure about that. The point is that bfgsmin also contains
the LBFGS update. fminunc uses a factorized direct BFGS update (i.e.,
it updates the Cholesky factor of the Hessian via cholupdate), so one
step of it is O(N^2) in time and O(N^2) in memory. That matches the
direct updating of the inverse employed by bfgsmin; a possible
advantage of fminunc is that all the hidden intelligence of Octave's
matrix division is employed, including the check for ill-conditioning.
However, for large problems, LBFGS is bound to win; it's asymptotically
superior.
The optimization arsenal provided by Octave & OctaveForge does not need
to be orthogonal, just broad enough.
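
For the curious, here is a minimal sketch of one factorized BFGS step
of the kind described above (the helper name and details are mine, not
the actual fminunc code):

    ## one factorized BFGS update: B = R'*R, modified by two rank-1
    ## up/downdates of the Cholesky factor R.  s = step taken,
    ## y = change in gradient; assumes the curvature condition y'*s > 0.
    function R = bfgs_chol_update (R, s, y)
      Bs = R' * (R * s);                            # B*s without forming B
      R = cholupdate (R, y / sqrt (y' * s), "+");
      R = cholupdate (R, Bs / sqrt (s' * Bs), "-");
    endfunction

Each such step (and the triangular solve p = R \ (R' \ -g) for the
search direction) is O(N^2), whereas LBFGS keeps only the last m (s, y)
pairs and applies the inverse implicitly in O(m*N) time and memory.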

On the whole, I'm now convinced that optimization algorithms are well
suited to being implemented as m-files. Very often in real applications
the overhead of the algorithm itself is negligible, so one doesn't need
to worry too much about performance. And m-files have the advantage
that more people are able to play around with them, tweak and improve
them, and share the improvements.

Another advantage of m-files is their genericity; for instance, fminunc
will compute in single precision if given single-precision inputs. That
may not be a big advantage in itself, as single-precision optimization
is rare, but the general mechanism is the point. For instance, it took
just a few small changes to make fsolve able to solve complex
(holomorphic) nonlinear systems using a complex Newton method.
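
As a small illustration, this is roughly what that looks like from the
user's side (the "ComplexEqn" option is per the current development
sources; treat its availability as an assumption on older releases):

    ## find a root of the holomorphic system f(z) = z^2 + 1 = 0
    f = @(z) z.^2 + 1;
    z = fsolve (f, 1 + 1i, optimset ("ComplexEqn", "on"))
    ## converges to z = i (or to -i from other starting points)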
On the other hand, it's essential to have good building blocks for the
m-files. The choice between high-level code and building blocks,
between m-file and .oct file, is an art.

regards

-- 
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz


