octave-maintainers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [OctDev] An improvement to ranks.m:


From: Carnë Draug
Subject: Re: [OctDev] An improvement to ranks.m:
Date: Wed, 2 May 2012 16:22:56 +0100

On 2 May 2012 05:47, Dave Goel <address@hidden> wrote:
> Hi.
>
> I believe that ranks.m (from
> /usr/share/octave/3.2.4/m/statistics/base/ranks.m in Debian) is very
> inefficient except when the elements of the input matrix are distinct.
>
> For example,  for this input, a=round(10*rand(100123,100));
>
> my alternative finished computation in 1.4s while the current function
> hasn't yet returned an answer in the last 15-20 minutes.
>
> For smaller inputs, I checked carefully, and the outputs do agree.
>
>
> ---
> Additionally, the alternative also implements multiple ranking schemes:
> fractional (default), standard competition ranking, modified competition
> ranking, and finally, ordinal ranking; examples of all of which are seen
> here: http://en.wikipedia.org/wiki/Ranking
>
>
> ----
>
> Though I completely changed the algorithm, I managed to pen the
> alternative as a derivative of the current function. Thus, attaching (a)
> diff; (b) as the complete file.
>
> Both the diff and the complete file are also included right here within
> the body of this email.
> ----
>
> --- /usr/share/octave/3.2.4/m/statistics/base/ranks.m   2010-10-25 
> 16:57:01.000000000 -0400
> +++ ./ranksoctavemy.m   2012-05-02 00:26:34.000000000 -0400
> @@ -1,45 +1,58 @@
>  ## Copyright (C) 1995, 1996, 1997, 1998, 2000, 2002, 2004, 2005, 2006,
> -##               2007, 2008, 2009 Kurt Hornik
> +## 2007, 2008, 2009 Kurt Hornik
> +## Copyright (C) 2012 Dave Goel
>  ##
>  ## This file is part of Octave.
>  ##
>  ## Octave is free software; you can redistribute it and/or modify it
> -## under the terms of the GNU General Public License as published by
> -## the Free Software Foundation; either version 3 of the License, or (at
> +## under the terms of the GNU General Public License as published by the
> +## Free Software Foundation; either version 3 of the License, or (at
>  ## your option) any later version.
>  ##
> -## Octave is distributed in the hope that it will be useful, but
> -## WITHOUT ANY WARRANTY; without even the implied warranty of
> -## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -## General Public License for more details.
> +## Octave is distributed in the hope that it will be useful, but WITHOUT
> +## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +## FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +## for more details.
>  ##
>  ## You should have received a copy of the GNU General Public License
>  ## along with Octave; see the file COPYING.  If not, see
>  ## <http://www.gnu.org/licenses/>.
> -
> +##
>  ## -*- texinfo -*-
> -## @deftypefn {Function File} {} ranks (@var{x}, @var{dim})
> -## Return the ranks of @var{x} along the first non-singleton dimension
> -## adjust for ties.  If the optional argument @var{dim} is
> -## given, operate along this dimension.
> -## @end deftypefn
> +##
> +## @deftypefn {Function File} {} ranks (@var{x}, @var{dim}) Return the
> +## ranks of @var{x} along the first non-singleton dimension adjust for
> +## ties. If the optional argument @var{dim} is given, operate along this
> +## dimension. Third argument @var{rtype} donates the type of ranking: 0
> +## or "fractional" for fractional, which is the default, 1 or
> +## "competition" for competiton ranking, 2 or "modified" for modified
> +## competition ranking, 3 or "ordinal" for ordinal ranking and 4 or
> +## "dense" for dense ranking. When using the third argument, you can
> +## supply us '' for @var{dim} to have it auto-computed. @end deftypefn
> +
> +## This function should return a result identical to GNU OCTAVE's
> +## built-in rank function but is much faster (2.2s vs. 50s when the
> +## input was, for example, 10*rand(100123,100)).
> +## Additionally, we also handle several ranking schemes.
>
> -## Author: KH <address@hidden>
> +## Authors: KH <address@hidden>, Dave Goel <address@hidden
>  ## Description: Compute ranks
>
> -## This code was rather ugly, since it didn't use sort due to the
> -## fact of how to deal with ties. Now it does use sort and its
> -## even uglier!!! At least it handles NDArrays..
>
> -function y = ranks (x, dim)
> +function y = ranksoctavemy (x, dim, rtype)
>
> -  if (nargin != 1 && nargin != 2)
> +  if (nargin < 1);
>     print_usage ();
>   endif
>
> +  if nargin<3||isempty(rtype);
> +    rtype=0;
> +  endif
> +
>   nd = ndims (x);
>   sz = size (x);
> -  if (nargin != 2)
> +  autodimp=(nargin<2)||isempty(dim);
> +  if autodimp;
>     ## Find the first non-singleton dimension.
>     dim  = 1;
>     while (dim < nd + 1 && sz(dim) == 1)
> @@ -66,24 +79,69 @@
>       perm(dim) = 1;
>       x = permute (x, perm);
>     endif
> -    sz = size (x);
> -    infvec = -Inf * ones ([1, sz(2 : end)]);
> -    [xs, xi] = sort (x);
> -    eq_el = find (diff ([xs; infvec]) == 0);
> -    if (isempty (eq_el))
> -      [eq_el, y] = sort (xi);
> -    else
> -      runs = complement (eq_el+1, eq_el);
> -      len = diff (find (diff ([Inf; eq_el; -Inf]) != 1)) + 1;
> -      [eq_el, y] = sort (xi);
> -      for i = 1 : length(runs)
> -       y (xi (runs (i) + [0:(len(i)-1)]) + floor (runs (i) ./ sz(1))
> -          * sz(1)) = eq_el(runs(i)) + (len(i) - 1) / 2;
> -      endfor
> -    endif
> +    sz=size(x);
> +    [sx ids]=sort(x); ## sx is sorted x.
> +
> +    lin=cumsum(ones(size(x)),1);  ## A linearly increasing array.
> +
> +    switch rtype;
> +      case {0,"fractional"};
> +        lin=(_competition(lin,sx,sz)+_modified(lin,sx,sz))/2;
> +      case {1,"competition"};
> +        lin=_competition(lin,sx,sz);
> +      case {2,"modified"};
> +        lin=_modified(lin,sx,sz);
> +      case {3,"ordinal"};
> +        ## lin=lin;
> +      otherwise
> +        rtype
> +        error("Illegal value of rtype specified.");
> +    endswitch
> +    y=NaN(size(lin));
> +    ## If input was a vector, we could have simply done this:
> +    ## y(ids)=lin;
> +    y(_ids(ids,sz))=lin;
> +
>     if (dim != 1)
>       y = permute (y, perm);
>     endif
>   endif
> +endfunction
>
> +function idf=_ids(ids,sz);
> +  oo=ones(sz);
> +  allids=[{ids-1}];
> +  nn=numel(sz);
> +  for ii=2:nn;
> +    allids=[allids,{cumsum(oo,ii)-1}];
> +  endfor
> +  idf=allids{end};
> +  for jj=(nn-1):-1:1;
> +    idf=idf*sz(jj)+allids{jj};
> +  endfor
> +  idf+=1;
>  endfunction
> +
> +
> +function linnew=_competition (lin,sx,sz)
> +  ## Stop increasing lin when sx does not increase. Same as before
> +  ## otherwise.
> +  infvec = -Inf * ones ([1, sz(2 : end)]);
> +  fnewp= find(diff([infvec;sx]));
> +  linnew=zeros(size(lin));
> +  linnew(fnewp)=lin(fnewp);
> +  linnew=cummax(linnew);
> +endfunction
> +
> +
> +function linnew=_modified (lin,sx,sz)
> +  ## Traverse lin backwards. Stop decreasing it when sx doesn't
> +  ## decrease.
> +  infvec = Inf * ones ([1, sz(2 : end)]);
> +  fnewp= find(diff([sx;infvec]));
> +  linnew=Inf(size(lin));
> +  linnew(fnewp)=lin(fnewp);
> +  linnew=flipdim(cummin(flipdim(linnew,1)),1);
> +endfunction
> +
> +
>
>
>  ====================================================
>
> ## Copyright (C) 1995, 1996, 1997, 1998, 2000, 2002, 2004, 2005, 2006,
> ## 2007, 2008, 2009 Kurt Hornik
> ## Copyright (C) 2012 Dave Goel
> ##
> ## This file is part of Octave.
> ##
> ## Octave is free software; you can redistribute it and/or modify it
> ## under the terms of the GNU General Public License as published by the
> ## Free Software Foundation; either version 3 of the License, or (at
> ## your option) any later version.
> ##
> ## Octave is distributed in the hope that it will be useful, but WITHOUT
> ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> ## FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> ## for more details.
> ##
> ## You should have received a copy of the GNU General Public License
> ## along with Octave; see the file COPYING.  If not, see
> ## <http://www.gnu.org/licenses/>.
> ##
> ## -*- texinfo -*-
> ##
> ## @deftypefn {Function File} {} ranks (@var{x}, @var{dim}) Return the
> ## ranks of @var{x} along the first non-singleton dimension adjust for
> ## ties. If the optional argument @var{dim} is given, operate along this
> ## dimension. Third argument @var{rtype} donates the type of ranking: 0
> ## or "fractional" for fractional, which is the default, 1 or
> ## "competition" for competiton ranking, 2 or "modified" for modified
> ## competition ranking, 3 or "ordinal" for ordinal ranking and 4 or
> ## "dense" for dense ranking. When using the third argument, you can
> ## supply us '' for @var{dim} to have it auto-computed. @end deftypefn
>
> ## This function should return a result identical to GNU OCTAVE's
> ## built-in rank function but is much faster (2.2s vs. 50s when the
> ## input was, for example, 10*rand(100123,100)).
> ## Additionally, we also handle several ranking schemes.
>
> ## Authors: KH <address@hidden>, Dave Goel <address@hidden
> ## Description: Compute ranks
>
>
> function y = ranksoctavemy (x, dim, rtype)
>
>  if (nargin < 1);
>    print_usage ();
>  endif
>
>  if nargin<3||isempty(rtype);
>    rtype=0;
>  endif
>
>  nd = ndims (x);
>  sz = size (x);
>  autodimp=(nargin<2)||isempty(dim);
>  if autodimp;
>    ## Find the first non-singleton dimension.
>    dim  = 1;
>    while (dim < nd + 1 && sz(dim) == 1)
>      dim = dim + 1;
>    endwhile
>    if (dim > nd)
>      dim = 1;
>    endif
>  else
>    if (! (isscalar (dim) && dim == round (dim))
>        && dim > 0
>        && dim < (nd + 1))
>      error ("ranks: dim must be an integer and valid dimension");
>    endif
>  endif
>
>  if (sz(dim) == 1)
>    y = ones(sz);
>  else
>    ## The algorithm works only on dim = 1, so permute if necesary.
>    if (dim != 1)
>      perm = [1 : nd];
>      perm(1) = dim;
>      perm(dim) = 1;
>      x = permute (x, perm);
>    endif
>    sz=size(x);
>    [sx ids]=sort(x); ## sx is sorted x.
>
>    lin=cumsum(ones(size(x)),1);  ## A linearly increasing array.
>
>    switch rtype;
>      case {0,"fractional"};
>        lin=(_competition(lin,sx,sz)+_modified(lin,sx,sz))/2;
>      case {1,"competition"};
>        lin=_competition(lin,sx,sz);
>      case {2,"modified"};
>        lin=_modified(lin,sx,sz);
>      case {3,"ordinal"};
>        ## lin=lin;
>      otherwise
>        rtype
>        error("Illegal value of rtype specified.");
>    endswitch
>    y=NaN(size(lin));
>    ## If input was a vector, we could have simply done this:
>    ## y(ids)=lin;
>    y(_ids(ids,sz))=lin;
>
>    if (dim != 1)
>      y = permute (y, perm);
>    endif
>  endif
> endfunction
>
> function idf=_ids(ids,sz);
>  oo=ones(sz);
>  allids=[{ids-1}];
>  nn=numel(sz);
>  for ii=2:nn;
>    allids=[allids,{cumsum(oo,ii)-1}];
>  endfor
>  idf=allids{end};
>  for jj=(nn-1):-1:1;
>    idf=idf*sz(jj)+allids{jj};
>  endfor
>  idf+=1;
> endfunction
>
>
> function linnew=_competition (lin,sx,sz)
>  ## Stop increasing lin when sx does not increase. Same as before
>  ## otherwise.
>  infvec = -Inf * ones ([1, sz(2 : end)]);
>  fnewp= find(diff([infvec;sx]));
>  linnew=zeros(size(lin));
>  linnew(fnewp)=lin(fnewp);
>  linnew=cummax(linnew);
> endfunction
>
>
> function linnew=_modified (lin,sx,sz)
>  ## Traverse lin backwards. Stop decreasing it when sx doesn't
>  ## decrease.
>  infvec = Inf * ones ([1, sz(2 : end)]);
>  fnewp= find(diff([sx;infvec]));
>  linnew=Inf(size(lin));
>  linnew(fnewp)=lin(fnewp);
>  linnew=flipdim(cummin(flipdim(linnew,1)),1);
> endfunction
> ---
>
> Sincerely
> Dave Goel.

Hi Dave

I'm forwarding your e-mail to the octave mailing list (this is the
mailing list for octave forge packages, but ranks is part of octave
core). Also, the best place to submit this would be the patch tracker
on savannah http://savannah.gnu.org/patch/?group=octave

One last note, the diff you have is against the code on 3.2.4 which is
a quite old version. Your patch won't apply since the file has gone
through some modifications already (see the log of the file here
http://hg.savannah.gnu.org/hgweb/octave/log/c38a253723d3/scripts/statistics/base/ranks.m
).

Carnë


reply via email to

[Prev in Thread] Current Thread [Next in Thread]