[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [really patch] Re: HashMap putAll/putAllInternal bug
From: |
Stuart Ballard |
Subject: |
Re: [really patch] Re: HashMap putAll/putAllInternal bug |
Date: |
Mon, 29 Sep 2003 10:57:40 -0400 |
User-agent: |
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3.1) Gecko/20030527 Debian/1.3.1-2 |
Bryce McKinlay wrote:
size() is used here because, obviously, it is generally more efficient
to call it once rather than calling hasNext() many times. I believe that
the current implementation is within spec according to the collections
documentation. If your collections are returning an inaccurate size()
then I'd argue they are not valid implementations of Map.
Sure: as I noted, my argument is that Sun's implementation can handle
such invalid implementations of Map, so people might rely on it, as I did.
I could fix up my implementation of Map to guarantee that size() is
correct, but that would make this operation much *slower* than using
hasNext() would. To get an accurate size() out of my data structure
would require iterating across it fully every time size() is called.
Furthermore, I'd argue that the very existence of a hasNext() method
suggests that Sun didn't intend people to make this "optimization". If
they expected that calling size() and maintaining your own counter would
always be more efficient, why didn't they just leave hasNext() out and
recommend coding that way?
Note too that our current implementations leave the collections in an
invalid state if next() causes a ConcurrentModificationException (or
other RuntimeException) part way through, because they set the size
variable in advance without waiting to ensure that all those elements
could in fact be added. It would be possible to fix this problem without
using hasNext(), and even if my patch isn't accepted I still think we
should at least fix that.
Of course, if there are real applications out there that rely on the way
Sun implements it, then we may have to change to using hasNext(). But we
should consider this carefully. If we must change it, then the
addAll/putAll implementations should change throughout the collections
classes for consistency - not just HashMap/Hashtable. As you noticed, in
some cases this could make things significantly less efficient.
Firstly, there is at least one real application that relies on this :)
If I could see a viable workaround, I'd modify nrdo to provide accurate
size() information so that Classpath wouldn't need to change (although I
still think that it's important to be bug-for-bug compatible with Sun's
implementation, so I'd still be in favor of this change even if I wasn't
personally relying on it). But in the case of my particular data
structure it's impossible to do that without slowing everything down,
and that isn't acceptable for my application.
I think it's perfectly possible to implement all our collections to give
the correct behavior without making things less efficient in the case
where size() is correct - at least, the only difference should be the
cost of repeated calls to hasNext() versus a single call to size(), and
the cost of hasNext() should *always* be small enough that that
difference falls into the realm of micro-optimization.
The example I gave was ArrayList: the optimization is to pre-allocate
space for size() worth of elements using ensureCapacity(), and then copy
elements directly into the backing array. It would be perfectly possible
to implement it something like this, which optimizes for when size() is
correct but is robust if it's not:
int csize = c.size();
ensureCapacity(size() + csize);
for (Iterator i = c.iterator(); csize-- > 0 && i.hasNext(); ) {
// put i.next() directly into the array and increment size,
// exactly as it's done now.
}
while (i.hasNext()) {
add(i.next()); // use the standard add method which will
// grow the array further if needed.
}
In this version, the for() loop checks i.hasNext() as well as csize so
that it's robust against the actual size being smaller than csize. The
while() loop uses a slower path for extra elements in case the actual
size is larger than csize(). And the only additional cost of this
algorithm is extra calls to hasNext(), which should always be cheap.
I don't think there are any ways where size() could be used as an
optimization that aren't susceptible to a similar approach - optimize
for the case where size() is correct, but still be prepared in case it's
not.
Stuart.
--
Stuart Ballard, Senior Web Developer
FASTNET - Web Solutions
(215) 283-2300, ext. 126
www.fast.net
[really patch] Re: HashMap putAll/putAllInternal bug, Stuart Ballard, 2003/09/25