octave-maintainers

Re: Modifications to hist.m


From: Paul Kienzle
Subject: Re: Modifications to hist.m
Date: Sat, 15 Mar 2003 21:43:01 -0500
User-agent: Mozilla/5.0 (Windows; U; Win 9x 4.90; en-US; rv:1.3a) Gecko/20021212

Andy Adler wrote:

> I propose the following patch to hist.m;
> it results in about 2.5x speedup.

At one point I rewrote hist to not have any
loops.  I was doing a whole lot of really large
histograms (e.g., 100000 values drawn from
a Poisson distribution; I had to rewrite the
Poisson generator too ;-)

Please check it out and tell me if it is any faster
than what you've got.  I think it might be slower
if you have a histogram with a lot of empty bins.
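
One way to make that comparison is sketched below. It is only a sketch: the
helper names count_equal_bins and count_equal_bins_loop are made up for this
example, the input is assumed to hold at least two distinct values, and the
large sample uses rand rather than a Poisson generator. The first helper uses
the sort / compare-neighbours / cumsum idea, the second is a plain loop kept
only as a reference.

```octave
1;  # script-file marker so the functions below can be defined inline

## Hypothetical helper (name is illustrative): loop-free counts for n
## equal-width bins.  Assumes y holds at least two distinct values.
function freq = count_equal_bins (y, n)
  freq = zeros (1, n);
  q = sort (y(:).');
  L = numel (q);
  q = (q - q(1)) / (q(L) - q(1)) / (1 + eps);  # scale into [0,1)
  q = fix (q * n);                             # 0-based bin index per value
  same = (q == [q(2:L), -Inf]);                # next value falls in the same bin
  f = cumsum (same);
  f = f(~same);                                # running totals at the end of each run
  freq(q(~same) + 1) = [f(1), diff(f)] + 1;    # run lengths go to the occupied bins
endfunction

## Hypothetical reference version with an explicit loop (same bin formula).
function freq = count_equal_bins_loop (y, n)
  freq = zeros (1, n);
  lo = min (y);
  hi = max (y);
  for v = y(:).'
    k = fix ((v - lo) / (hi - lo) / (1 + eps) * n) + 1;
    freq(k) += 1;
  endfor
endfunction

## Small deterministic check: bins 2 and 4 are empty.
y = [0, 0.7, 1.1, 4.3, 4.9, 5.2, 9.1, 10];
fast = count_equal_bins (y, 5);
slow = count_equal_bins_loop (y, 5);

## Rough timing on a larger sample (timings vary by machine).
y_big = rand (1, 100000);
tic; fast_big = count_equal_bins (y_big, 50); t_fast = toc;
tic; slow_big = count_equal_bins_loop (y_big, 50); t_slow = toc;
printf ("loop-free: %g s, loop: %g s\n", t_fast, t_slow);
```

Raising the bin count well above the number of occupied bins would be one way
to probe the empty-bin case mentioned above.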

Thanks,

Paul Kienzle
address@hidden

## Copyright (C) 1996, 1997 John W. Eaton
##
## This file is part of Octave.
##
## Octave is free software; you can redistribute it and/or modify it
## under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 2, or (at your option)
## any later version.
##
## Octave is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with Octave; see the file COPYING.  If not, write to the Free
## Software Foundation, 59 Temple Place - Suite 330, Boston, MA
## 02111-1307, USA.

## -*- texinfo -*-
## @deftypefn {Function File} {} hist (@var{y}, @var{x}, @var{norm})
## Produce histogram counts or plots.
##
## With one vector input argument, plot a histogram of the values with
## 10 bins.  The range of the histogram bins is determined by the range
## of the data.
##
## Given a second scalar argument, use that as the number of bins.
##
## Given a second vector argument, use that as the centers of the bins,
## with the width of the bins determined from the adjacent values in
## the vector.
##
## If a third argument is provided, the histogram is normalised such that
## the sum of the bars is equal to @var{norm}.
##
## Extreme values are lumped in the first and last bins.
##
## With two output arguments, produce the values @var{nn} and @var{xx} such
## that @code{bar (@var{xx}, @var{nn})} will plot the histogram.
## @end deftypefn
## @seealso{bar}

## Author: jwe

function [nn, xx] = hist (y, x, norm)

  if (nargin < 1 || nargin > 3)
    usage ("[nn, xx] = hist (y, x, norm)");
  endif

  if (is_vector (y))
    max_val = max (y);
    min_val = min (y);
  else
    error ("hist: first argument must be a vector");
  endif

  if (nargin == 1)
    x = 10;
  endif

  if (is_scalar (x))
    n = x;
    if (n <= 0 || n != fix(n))
      error ("hist: number of bins must be a positive integer");
    endif
    delta = (max_val - min_val) / n / 2;
    x = linspace (min_val+delta, max_val-delta, n);

    freq = zeros(1,n);
    q = sort(y(:).');
    L = length(q);
    if (q(1) == q(L))
      freq(n) = L;
    else
      q = (q - q(1))/(q(L)-q(1))/(1+eps); # set y-range to [0,1)
      q = fix(q*n);                  # split into n bins
      same = ( q == [q(2:L),-Inf] ); # true if neighbours are in the same bin
      q = q(~same);                  # q lists the 'active' bins (0-origin)
      
      f = cumsum(same);              # cumulative histogram
      f = f(~same);
      f = [f(1), diff(f)] + 1;       # cumulative histogram -> histogram
      ## we need to add 1 because the last point in each bin has same == 0
      ## and so was not counted by the cumsum
      
      ## distribute f to the active bins, leaving the remaining bins empty
      freq(q+1) = f;
    endif
  elseif (is_vector (x))
    tmp = sort (x);
    if (any (tmp != x))
      warning ("hist: bin values not sorted on input");
      x = tmp;
    endif
    n = length(x);
    cutoff = ( x(1:n-1) + x(2:n) ) / 2;    # find bin boundaries
    [s, idx] = sort ( [cutoff(:); y(:)] ); # put elements between boundaries
    chist = cumsum(idx>=n);                # running count of data (y starts at index n)
    chist = [chist(idx<n); chist(length(chist))];  # keep totals at boundaries
    freq = [chist(1); diff(chist) ];       # differentiate for histogram
  else
    error ("hist: second argument must be a scalar or a vector");
  endif

  if (nargin == 3)
    ## Normalise the histogram.
    freq = freq / length(y) * norm;
  endif

  if (nargout > 0)
    nn = freq;
    xx = x;
  else
    bar (x, freq);
  endif

endfunction
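
The explicit-centers case can be tried in isolation with a small worked
example. This is a minimal sketch, not part of the patch: the helper name
count_by_centers is hypothetical, and the centers are assumed to be sorted
already. With n centers there are n-1 boundaries, so the data start at
position n of the concatenated vector and the sketch counts entries with
idx >= n.

```octave
1;  # script-file marker so the function below can be defined inline

## Hypothetical helper (name is illustrative): counts per bin for sorted
## bin centers, using the sort / cumsum / diff idea.
function freq = count_by_centers (y, centers)
  n = numel (centers);
  cutoff = (centers(1:n-1) + centers(2:n)) / 2;  # n-1 bin boundaries
  [s, idx] = sort ([cutoff(:); y(:)]);           # merge boundaries and data
  chist = cumsum (idx >= n);                     # data points seen so far
  chist = [chist(idx < n); chist(end)];          # totals at each boundary, then grand total
  freq = [chist(1); diff(chist)].';              # per-bin counts as a row vector
endfunction

y = [0.2, 0.9, 1.4, 2.6, 2.7, 5.0];
centers = [0, 1, 2, 3];                          # boundaries at 0.5, 1.5, 2.5
freq = count_by_centers (y, centers)             # displays 1 2 0 3
```

The value 5.0 lies beyond the last center and is lumped into the last bin,
and the bin centered at 2 stays empty.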
