Re: speed of octave symbol table code
From: John W. Eaton
Subject: Re: speed of octave symbol table code
Date: Tue, 23 Oct 2007 00:39:12 -0400
On 22-Oct-2007, John W. Eaton wrote:
| On 23-Oct-2007, David Bateman wrote:
|
| | I think changing symtab.h:502 in the following manner for 3.0 will be a win.
| |
| | - symbol_table (unsigned int tab_size = 128,
| | + symbol_table (unsigned int tab_size = 32,
|
| I thought about it some more and I think we should do a simple test
| before making this change. The thing I want to check is whether the
| smaller table size results in significantly more hash collisions that
| result in longer chains in the table. If so, then although you are
| speeding up some symbol table operations, you could be slowing symbol
| lookups, which will affect the speed of evaluating the function
| itself. I think I can do this easily enough by modifying the clear
| function to also write out a symbol table summary and then do a quick
| test of a fairly large number of functions by running "make check" and
| summarizing the results. I'll try that and post the results.
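(For anyone who wants to try the idea outside of Octave, here is a minimal
standalone sketch of the effect being measured.  It is not Octave's symtab
code, and it uses a simple djb2-style hash as a stand-in for the real hash
function, so the numbers it prints will not match the measurements below;
it just shows how the same set of names spreads into chains for different
table sizes.)

#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Simple stand-in hash (djb2-style); Octave's real hash function differs.
static unsigned int
toy_hash (const std::string& s)
{
  unsigned int h = 5381;

  for (std::string::size_type i = 0; i < s.length (); i++)
    h = h * 33 + s[i];

  return h;
}

// Scatter NAMES into TABLE_SIZE chains and report the statistics that
// matter for lookup speed: the longest chain and the number of chains
// left empty.
static void
chain_stats (const std::vector<std::string>& names, unsigned int table_size)
{
  std::vector<int> chain_length (table_size, 0);

  for (std::vector<std::string>::size_type i = 0; i < names.size (); i++)
    chain_length[toy_hash (names[i]) % table_size]++;

  int empty_chains = 0;
  int max_chain_length = 0;

  for (unsigned int i = 0; i < table_size; i++)
    {
      if (chain_length[i] == 0)
        empty_chains++;

      if (chain_length[i] > max_chain_length)
        max_chain_length = chain_length[i];
    }

  std::cout << "table size " << table_size
            << ": max chain " << max_chain_length
            << ", empty chains " << empty_chains
            << ", symbols " << names.size () << std::endl;
}

int
main (void)
{
  std::vector<std::string> names;

  // Fake symbol names; a real test would use the names that appear in a
  // function.
  for (int i = 0; i < 160; i++)
    {
      std::ostringstream os;
      os << "tmp_" << i;
      names.push_back (os.str ());
    }

  chain_stats (names, 128);
  chain_stats (names, 48);
  chain_stats (names, 32);

  return 0;
}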
I made the following change:
Index: src/symtab.cc
===================================================================
RCS file: /cvs/octave/src/symtab.cc,v
retrieving revision 1.133
diff -u -r1.133 symtab.cc
--- src/symtab.cc 12 Oct 2007 21:27:34 -0000 1.133
+++ src/symtab.cc 23 Oct 2007 04:18:17 -0000
@@ -790,6 +790,46 @@
 void
 symbol_table::clear (void)
 {
+  std::ofstream symtab_summary_file ("/home/jwe/symbol.sum",
+                                     std::ios::in|std::ios::out|std::ios::app);
+
+  {
+    int count = 0;
+    int empty_chains = 0;
+    int max_chain_length = 0;
+    int min_chain_length = INT_MAX;
+
+    for (unsigned int i = 0; i < table_size; i++)
+      {
+        int num_this_chain = 0;
+
+        symbol_record *ptr = table[i].next ();
+
+        if (! ptr)
+          {
+            empty_chains++;
+            min_chain_length = 0;
+          }
+
+        while (ptr)
+          {
+            num_this_chain++;
+            ptr = ptr->next ();
+          }
+
+        count += num_this_chain;
+
+        if (num_this_chain > max_chain_length)
+          max_chain_length = num_this_chain;
+
+        if (num_this_chain < min_chain_length)
+          min_chain_length = num_this_chain;
+      }
+
+    symtab_summary_file << max_chain_length << " " << empty_chains
+                        << " " << count << std::endl;
+  }
+
   for (unsigned int i = 0; i < table_size; i++)
     {
       symbol_record *ptr = table[i].next ();
so that each time a symbol_table is cleared, it appends a line to a
file that shows the max chain length, the number of empty chains, and
the total number of symbols in the table. After running "make check"
and processing the resulting file with
sort | uniq -c | sort -n -k 2
I see the following results for table sizes of 128, 48, and 32 (only
showing the first 30 lines of output):
      128                 48                  32
---------------    ---------------    ---------------
   1  6  87  57      24 10  16 162      24 10   0 162
  26  4  48 123      15 10  16 163      15 10   0 163
  24  4  36 162      26  8  17 123      26  8   0 123
  15  4  35 163      14  8  17 120      14  8   0 120
  14  4  50 120       7  8  17 126       7  8   0 126
   7  4  49 122       7  8  17 122       7  8   0 122
   7  4  47 126       5  8  17 124       5  8   0 124
   7  4 109  23       5  8  17 115       4  8   0 129
   5  4  53 115       4  8  17 129       3  8   8  56
   5  4  47 124       3  8  17 114       1  8   0 128
   4  4  46 129       2  8  17 116       1  8   0 127
   3  4  53 114       1  8  21  81       5  7   0 115
   2  4  57 103       1  8  21  80       3  7   0 114
   2  4  53 116       1  8  17 128       2  7   0 116
   2  4  53 110       1  8  17 127       2  7   0 110
   1  4  67  81       2  7  17 110       1  7   7  57
   1  4  67  80       2  7  17 103       1  7   0 108
   1  4  54 108       1  7  19  57       1  7   0 107
   1  4  54 107       1  7  17 108    4147  6   5  61
   1  4  46 128       1  7  17 107     114  6   5  60
   1  4  46 127      18  6  17 100      18  6   0 100
   1  4 105  26       9  6  20  56       9  6   4  56
4147  3  80  61       7  6  18  92       7  6   1  92
 800  3 102  33       5  6  17  97       5  6   0  97
 192  3 103  31       4  6  18  93       4  6   1  93
 179  3  79  68       3  6  25  56       2  6   0 103
 156  3  87  51       1  6  18  96       1  6   2  87
 149  3 109  22    4147  5  23  61       1  6   1  96
 114  3  81  60     179  5  22  68     800  5  13  33
  48  3 102  31     156  5  25  51     179  5   4  68
The columns under each heading are frequency (the count prepended by
"uniq -c", i.e., how many cleared tables produced that combination),
max chain length, number of empty chains, and number of symbols.
Note that with a table size of 32, the number of empty chains is often
zero once the number of symbols reaches about 100. Increasing the table
size by 50% (to 48) gives us more free chains, but the hash function
doesn't seem to spread the symbols out to fill the additional chains,
and the maximum chain length (which affects lookup speed) ends up being
about the same. So it doesn't seem that 48 is much better than 32.

A chain length of N means that finding a symbol may require as many as
N string compares after computing the hash value, so a chain length of
10 could mean a hit on performance when evaluating a function. Overall,
chain lengths will increase if the table is smaller since there are
fewer chains, but how often do people write functions that have more
than 200 or so symbols in them? If that happens frequently, then maybe
we should stay at 128, even though there is a penalty for clearing
larger, mostly empty tables. Otherwise, I would say it is OK to cut the
table size to 32.
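To make the lookup cost concrete, here is a minimal sketch of a chained
lookup (again, not the actual symtab code): the hash value selects the
head of a chain, and finding a name means walking that chain and
comparing strings until a match.

#include <iostream>
#include <string>

struct toy_record
{
  std::string name;
  toy_record *next;
};

// Walk the chain that the hash value selected, comparing names until a
// match is found.  COMPARES counts the string comparisons performed.
static toy_record *
chain_lookup (toy_record *head, const std::string& name, int& compares)
{
  compares = 0;

  for (toy_record *ptr = head; ptr; ptr = ptr->next)
    {
      compares++;

      if (ptr->name == name)
        return ptr;
    }

  return 0;
}

int
main (void)
{
  // A chain of length 3: alpha -> beta -> gamma.
  toy_record gamma = { "gamma", 0 };
  toy_record beta = { "beta", &gamma };
  toy_record alpha = { "alpha", &beta };

  int compares;
  chain_lookup (&alpha, "gamma", compares);

  // Worst case: the match (or a miss) is at the end of the chain, so the
  // number of compares equals the chain length -- 3 here, up to 10 in
  // the longest chains measured above.
  std::cout << compares << " string compares" << std::endl;

  return 0;
}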
Comments?
jwe