octave-maintainers

Re: FYI: subasgn argument optimization


From: Jaroslav Hajek
Subject: Re: FYI: subasgn argument optimization
Date: Fri, 14 Aug 2009 18:30:07 +0200

On Fri, Aug 14, 2009 at 5:55 PM, John W. Eaton<address@hidden> wrote:
> On 13-Aug-2009, Jaroslav Hajek wrote:
>
> | 2. Since only one physical copy of the object always exists, an error
> | inside subsasgn.m does not revoke changes that have already been made.
> | It is solely the callee's responsibility not to leave the object in an
> | invalid state. This may possibly alter the way code works (though well
> | written code should probably work well). The internal variable
> | optimize_subsasgn_calls is provided to switch off the optimization, if
> | it causes problems. Another option is to declare different output and
> | input arguments in the method.
>
> This seems a bit problematic, since users have reason to expect that
> an error during assignment doesn't alter existing data.
>
> I'm a bit surprised that the situation is as bad as you describe, at
> least for things like structs or cell arrays, which contain arrays of
> octave_value objects, which are themselves reference counted objects.
> So maybe we could do better and still preserve the current semantics
> with regard to errors?
>
> When you do something like
>
>  s1 = struct (...);
>
> you create a map with a collection of octave_value objects.  If
> multiple copies of s1 are made, then the reference count for the
> underlying octave_struct object is incremented.  Unsharing should
> generate a separate copy of the octave_struct object, but when copying
> the elements in the map, we should only have to increment the count
> for those objects.  What I would expect to happen is that we would
> only copy the actual data for those objects if they were actually
> changed.  If that's not happening and we are actually copying
> everything at the first point of unsharing, then maybe that is
> something that should be fixed?

Yes, that is what is happening. I inadvertently used the word "copy"
for both a shallow copy and a physical (deep) copy.
The problem is with large arrays stored as class member fields (for
instance, imagine a triangular matrix class). Suppose a struct x has a
big matrix field a = zeros(1000) that has no shallow copies. Then the
statement

  x.a(1,1) = 1;

directly modifies the stored array (although it did copy in 3.0.x).
If, however, x is an object of a class, then this is what happens:
1. x is shallow-copied and passed to the subsasgn method (say, as xx).
2. subsasgn method invokes the assignment xx.a(1,1) = 1;
2a. xx is unshared; xx.a is a shallow copy of x.a
2b. xx.a is unshared to complete the assignment, this causes a
physical copy of a's data.
3. at this point, x.a and xx.a are almost equal arrays; differing only
in the element 1,1.
4. the subsasgn method finishes and control is given back to caller
5. xx is assigned to x. x.a is released and replaced by xx.a.
6. xx is cleared.
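
The steps above can be mimicked outside Octave. What follows is only a
rough C++ sketch, not Octave's actual implementation: CowArray,
assign_via_method and assign_in_place are hypothetical names, and
std::shared_ptr stands in for the reference counting that octave_value
does internally. It shows that the write inside the "method" has to
make a physical copy because the caller still holds a reference, while
the direct assignment does not.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a reference-counted array (not an Octave class).
struct CowArray {
    std::shared_ptr<std::vector<double>> data;

    explicit CowArray(std::size_t n)
        : data(std::make_shared<std::vector<double>>(n, 0.0)) {}

    // Write one element, deep-copying first if the storage is shared.
    // Returns true when a physical copy had to be made.
    bool set(std::size_t i, double v) {
        assert(i < data->size());
        bool copied = false;
        if (data.use_count() > 1) {
            data = std::make_shared<std::vector<double>>(*data);
            copied = true;
        }
        (*data)[i] = v;
        return copied;
    }
};

// Steps 1-6: the callee works on a shallow copy, so the write must unshare.
bool assign_via_method(CowArray& x) {
    CowArray xx = x;               // step 1: shallow copy, count becomes 2
    bool copied = xx.set(0, 1.0);  // step 2b: unshare, physical copy of the data
    x = xx;                        // step 5: x releases its storage, takes xx's
    return copied;                 // step 6: xx is destroyed on return
}

// Plain struct-style assignment: the storage is unshared, so no copy.
bool assign_in_place(CowArray& x) {
    return x.set(0, 1.0);
}
```

In this model, the optimization amounts to making assign_via_method
behave like assign_in_place by not keeping the caller's reference alive
during the call.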

The point is that the physical copy created at step 2b is almost
always unnecessary, because the value of x.a will be replaced as soon
as the method finishes, unless an error occurs.

There is no way to both have this optimization and preserve the
behavior w.r.t. errors, and this is easy to see. Suppose the subsasgn
method first changes 999999 elements of the matrix, and then an error
occurs when changing the last element. There is no way to restore the
prior values of the 999999 elements, because there is no copy.

So, of course, when working this way, the subsasgn method should be
coded using the paradigm "first check everything, then assign", so
that you never generate an error *after* you have already overwritten
something. This is exactly what the nested assignment code inside
ov-cell and ov-struct now does.
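
To make the paradigm concrete, here is a small C++ sketch. It is not
the code from ov-cell or ov-struct; Update, assign_eagerly and
assign_checked are hypothetical names, and an out-of-range index is
assumed to be the only possible error. The eager version leaves
earlier writes in place when a later index is bad, while the checked
version validates every index before touching the array.

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

using Update = std::pair<std::size_t, double>;  // hypothetical (index, value)

// Writes as it goes: a bad index late in the list raises an error
// after earlier elements have already been overwritten in place.
void assign_eagerly(std::vector<double>& a, const std::vector<Update>& ups) {
    for (const Update& u : ups) {
        if (u.first >= a.size())
            throw std::out_of_range("index out of bounds");
        a[u.first] = u.second;
    }
}

// "First check everything, then assign": all validation happens
// before the first write, so an error leaves the array untouched.
void assign_checked(std::vector<double>& a, const std::vector<Update>& ups) {
    for (const Update& u : ups)
        if (u.first >= a.size())
            throw std::out_of_range("index out of bounds");
    for (const Update& u : ups)
        a[u.first] = u.second;
}
```

The second loop in assign_checked cannot fail under the stated
assumption, which is what makes modifying the data in place safe.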

I believe that well-written code already conforms to this paradigm, or
is very close. But of course, a flag should be provided to disable the
optimization, or it could be off by default; that can still be decided.
Another option is to declare the method with different input and output
variables; this is what one usually does when one wants to use a
copy-and-change approach rather than change-in-place.



-- 
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz


