help-gnu-emacs

Re: Reading portions of large files


From: Lee Sau Dan
Subject: Re: Reading portions of large files
Date: 20 Jan 2003 08:50:31 +0100
User-agent: Gnus/5.0808 (Gnus v5.8.8) Emacs/20.7

>>>>> "Stefan" == "Stefan Monnier <foo@acm.com>" 
>>>>> <monnier+gnu.emacs.help/news/@flint.cs.yale.edu> writes:

    Stefan> Since at least 1 bit of tag is needed, that means that to
    Stefan> get 31bit integers we'd need to move the mark bit
    Stefan> somewhere else.  XEmacs decided to use 3-word cons cells
    Stefan> (and I know they're still regularly wondering whether it
    Stefan> was a good idea).  Another approach is to use a separate
    Stefan> mark-bit array.

I think the separate mark-bit  array would be cleaner.  You don't need
to access  the mark  bits unless  you're doing gc.   Why let  that bit
stick  there in  the  _main_ working  set  all the  time?  Wouldn't  a
separate mark-bit array also improve locality (important for caching)?
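
To make the idea concrete, here is a minimal sketch in C of a mark-bit array kept apart from the object words. Everything in it (HEAP_WORDS, mark_bits, mark_word, is_marked, clear_marks) is a hypothetical name invented for illustration, not anything from the Emacs sources, and it assumes the heap can be indexed by word number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HEAP_WORDS 1024u            /* hypothetical heap size, in words */

/* One mark bit per heap word, stored apart from the words themselves,
   so the mutator never touches this array outside of gc. */
uint8_t mark_bits[HEAP_WORDS / 8];

/* Set the mark bit for heap word i. */
void mark_word(size_t i)
{
    assert(i < HEAP_WORDS);
    mark_bits[i / 8] |= (uint8_t)(1u << (i % 8));
}

/* Return 1 if heap word i is marked, 0 otherwise. */
int is_marked(size_t i)
{
    assert(i < HEAP_WORDS);
    return (mark_bits[i / 8] >> (i % 8)) & 1u;
}

/* Reset every mark bit, e.g. at the start of a collection. */
void clear_marks(void)
{
    memset(mark_bits, 0, sizeof mark_bits);
}
```

With this layout the marks for 1024 words fit in 128 bytes, so a mark or sweep pass reads them from a handful of cache lines, and the object words themselves carry no gc state.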

Then, in theory, the tag bits  can also be kept separately, giving the
full  32 bits to  integers (represented  as machine-native  words).  I
think  we only  need 1  tag bit  in the  separate tag-bit  array.  Its
function is  to indicate whether  the corresponding memory word  is an
integer or not.  If not, then  the remaining tag bits are found in the
word itself.  And integer arithmetic can certainly be faster!
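
A rough sketch in C of what I have in mind follows. All names (heap_words, int_bits, store_int, store_tagged, word_is_int, load_int) are made up for illustration, and the layout of the remaining tag bits inside a non-integer word is left open on purpose.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HEAP_WORDS 1024u            /* hypothetical heap size, in words */

uint32_t heap_words[HEAP_WORDS];    /* the memory words themselves */
uint8_t  int_bits[HEAP_WORDS / 8];  /* side array: bit set = word is a raw integer */

/* Store a full 32-bit integer in word i and flag it as an integer. */
void store_int(size_t i, int32_t v)
{
    assert(i < HEAP_WORDS);
    memcpy(&heap_words[i], &v, sizeof v);   /* copy all 32 bits unchanged */
    int_bits[i / 8] |= (uint8_t)(1u << (i % 8));
}

/* Store a non-integer word; its remaining tag bits live in the word itself. */
void store_tagged(size_t i, uint32_t tagged)
{
    assert(i < HEAP_WORDS);
    heap_words[i] = tagged;
    int_bits[i / 8] &= (uint8_t)~(1u << (i % 8));
}

/* Return 1 if word i holds a raw integer, 0 otherwise. */
int word_is_int(size_t i)
{
    assert(i < HEAP_WORDS);
    return (int_bits[i / 8] >> (i % 8)) & 1u;
}

/* Read word i back as a native integer; no shifting or masking needed. */
int32_t load_int(size_t i)
{
    int32_t v;
    assert(word_is_int(i));
    memcpy(&v, &heap_words[i], sizeof v);
    return v;
}
```

Note that in this sketch every type check and every store touches the side array as well as the word, which is a cost to weigh against the cheaper arithmetic.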

Would this implementation be more efficient, or less so?


    Stefan> Lots of trade offs, a fair bit of coding, even more
    Stefan> testing, ...  Anybody interested is welcome to try it
    Stefan> out.  My opinion is that maybe it would be nice, but since
    Stefan> the only application I'm aware of is "editing files
    Stefan> between 128MB and 1GB on 32bit systems", I don't think
    Stefan> it's worth the trouble.

Yeah.  I share this last point with you.  Text files larger than 128MB
are simply weird.  And for binary files, a real hex editor (or 'xxd',
which I just discovered) is a more appropriate tool, or just 'dd'.


-- 
Lee Sau Dan                     李守敦(Big5)                    ~{@nJX6X~}(HZ) 

E-mail: danlee@informatik.uni-freiburg.de
Home page: http://www.informatik.uni-freiburg.de/~danlee

