Re: speed of octave symbol table code
From: John W. Eaton
Subject: Re: speed of octave symbol table code
Date: Tue, 23 Oct 2007 00:39:12 -0400
On 22-Oct-2007, John W. Eaton wrote:
| On 23-Oct-2007, David Bateman wrote:
|
| | I think changing symtab.h:502 in the following manner for 3.0 will be a win.
| |
| | - symbol_table (unsigned int tab_size = 128,
| | + symbol_table (unsigned int tab_size = 32,
|
| I thought about it some more and I think we should do a simple test
| before making this change. The thing I want to check is whether the
| smaller table size results in significantly more hash collisions that
| result in longer chains in the table. If so, then although you are
| speeding up some symbol table operations, you could be slowing symbol
| lookups, which will affect the speed of evaluating the function
| itself. I think I can do this easily enough by modifying the clear
| function to also write out a symbol table summary and then do a quick
| test of a fairly large number of functions by running "make check" and
| summarizing the results. I'll try that and post the results.
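(For anyone who wants to try the idea outside of Octave, here is a minimal
standalone sketch of the effect being measured.  It is not Octave's symtab
code, and it uses a simple djb2-style hash as a stand-in for the real hash
function, so the numbers it prints will not match the measurements below;
it just shows how the same set of names spreads into chains for different
table sizes.)

#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Simple stand-in hash (djb2-style); Octave's real hash function differs.
static unsigned int
toy_hash (const std::string& s)
{
  unsigned int h = 5381;

  for (std::string::size_type i = 0; i < s.length (); i++)
    h = h * 33 + s[i];

  return h;
}

// Scatter NAMES into TABLE_SIZE chains and report the statistics that
// matter for lookup speed: the longest chain and the number of chains
// left empty.
static void
chain_stats (const std::vector<std::string>& names, unsigned int table_size)
{
  std::vector<int> chain_length (table_size, 0);

  for (std::vector<std::string>::size_type i = 0; i < names.size (); i++)
    chain_length[toy_hash (names[i]) % table_size]++;

  int empty_chains = 0;
  int max_chain_length = 0;

  for (unsigned int i = 0; i < table_size; i++)
    {
      if (chain_length[i] == 0)
        empty_chains++;

      if (chain_length[i] > max_chain_length)
        max_chain_length = chain_length[i];
    }

  std::cout << "table size " << table_size
            << ": max chain " << max_chain_length
            << ", empty chains " << empty_chains
            << ", symbols " << names.size () << std::endl;
}

int
main (void)
{
  std::vector<std::string> names;

  // Fake symbol names; a real test would use the names that appear in a
  // function.
  for (int i = 0; i < 160; i++)
    {
      std::ostringstream os;
      os << "tmp_" << i;
      names.push_back (os.str ());
    }

  chain_stats (names, 128);
  chain_stats (names, 48);
  chain_stats (names, 32);

  return 0;
}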
I made the following change:
Index: src/symtab.cc
===================================================================
RCS file: /cvs/octave/src/symtab.cc,v
retrieving revision 1.133
diff -u -r1.133 symtab.cc
--- src/symtab.cc 12 Oct 2007 21:27:34 -0000 1.133
+++ src/symtab.cc 23 Oct 2007 04:18:17 -0000
@@ -790,6 +790,46 @@
 void
 symbol_table::clear (void)
 {
+  std::ofstream symtab_summary_file ("/home/jwe/symbol.sum",
+                                     std::ios::in|std::ios::out|std::ios::app);
+
+  {
+    int count = 0;
+    int empty_chains = 0;
+    int max_chain_length = 0;
+    int min_chain_length = INT_MAX;
+
+    for (unsigned int i = 0; i < table_size; i++)
+      {
+        int num_this_chain = 0;
+
+        symbol_record *ptr = table[i].next ();
+
+        if (! ptr)
+          {
+            empty_chains++;
+            min_chain_length = 0;
+          }
+
+        while (ptr)
+          {
+            num_this_chain++;
+            ptr = ptr->next ();
+          }
+
+        count += num_this_chain;
+
+        if (num_this_chain > max_chain_length)
+          max_chain_length = num_this_chain;
+
+        if (num_this_chain < min_chain_length)
+          min_chain_length = num_this_chain;
+      }
+
+    symtab_summary_file << max_chain_length << " " << empty_chains
+                        << " " << count << std::endl;
+  }
+
   for (unsigned int i = 0; i < table_size; i++)
     {
       symbol_record *ptr = table[i].next ();
so that each time a symbol_table is cleared, it appends a line to a
file that shows the max chain length, the number of empty chains, and
the total number of symbols in the table. After running "make check"
and processing the resulting file with
sort | uniq -c | sort -n -k 2
I see the following results for table sizes of 128, 48, and 32 (only
showing the first 30 lines of output):
      128                 48                  32
---------------    ---------------    ---------------
   1  6  87  57      24 10  16 162      24 10   0 162
  26  4  48 123      15 10  16 163      15 10   0 163
  24  4  36 162      26  8  17 123      26  8   0 123
  15  4  35 163      14  8  17 120      14  8   0 120
  14  4  50 120       7  8  17 126       7  8   0 126
   7  4  49 122       7  8  17 122       7  8   0 122
   7  4  47 126       5  8  17 124       5  8   0 124
   7  4 109  23       5  8  17 115       4  8   0 129
   5  4  53 115       4  8  17 129       3  8   8  56
   5  4  47 124       3  8  17 114       1  8   0 128
   4  4  46 129       2  8  17 116       1  8   0 127
   3  4  53 114       1  8  21  81       5  7   0 115
   2  4  57 103       1  8  21  80       3  7   0 114
   2  4  53 116       1  8  17 128       2  7   0 116
   2  4  53 110       1  8  17 127       2  7   0 110
   1  4  67  81       2  7  17 110       1  7   7  57
   1  4  67  80       2  7  17 103       1  7   0 108
   1  4  54 108       1  7  19  57       1  7   0 107
   1  4  54 107       1  7  17 108    4147  6   5  61
   1  4  46 128       1  7  17 107     114  6   5  60
   1  4  46 127      18  6  17 100      18  6   0 100
   1  4 105  26       9  6  20  56       9  6   4  56
4147  3  80  61       7  6  18  92       7  6   1  92
 800  3 102  33       5  6  17  97       5  6   0  97
 192  3 103  31       4  6  18  93       4  6   1  93
 179  3  79  68       3  6  25  56       2  6   0 103
 156  3  87  51       1  6  18  96       1  6   2  87
 149  3 109  22    4147  5  23  61       1  6   1  96
 114  3  81  60     179  5  22  68     800  5  13  33
  48  3 102  31     156  5  25  51     179  5   4  68
The columns under each heading are frequency (the count prepended by
"uniq -c", i.e., how many cleared tables produced that combination),
max chain length, number of empty chains, and number of symbols.
Note that with a table size of 32, the number of empty chains is often
zero once the number of symbols reaches about 100. Increasing the table
size by 50% (to 48) gives us more free chains, but the hash function
doesn't seem to spread the symbols out to fill the additional chains,
and the maximum chain length (which affects lookup speed) ends up being
about the same. So it doesn't seem that 48 is much better than 32.

A chain length of N means that finding a symbol may require as many as
N string compares after computing the hash value, so a chain length of
10 could mean a hit on performance when evaluating a function. Overall,
chain lengths will increase if the table is smaller since there are
fewer chains, but how often do people write functions that have more
than 200 or so symbols in them? If that happens frequently, then maybe
we should stay at 128, even though there is a penalty for clearing
larger, mostly empty tables. Otherwise, I would say it is OK to cut the
table size to 32.
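To make the lookup cost concrete, here is a minimal sketch of a chained
lookup (again, not the actual symtab code): the hash value selects the
head of a chain, and finding a name means walking that chain and
comparing strings until a match.

#include <iostream>
#include <string>

struct toy_record
{
  std::string name;
  toy_record *next;
};

// Walk the chain that the hash value selected, comparing names until a
// match is found.  COMPARES counts the string comparisons performed.
static toy_record *
chain_lookup (toy_record *head, const std::string& name, int& compares)
{
  compares = 0;

  for (toy_record *ptr = head; ptr; ptr = ptr->next)
    {
      compares++;

      if (ptr->name == name)
        return ptr;
    }

  return 0;
}

int
main (void)
{
  // A chain of length 3: alpha -> beta -> gamma.
  toy_record gamma = { "gamma", 0 };
  toy_record beta = { "beta", &gamma };
  toy_record alpha = { "alpha", &beta };

  int compares;
  chain_lookup (&alpha, "gamma", compares);

  // Worst case: the match (or a miss) is at the end of the chain, so the
  // number of compares equals the chain length -- 3 here, up to 10 in
  // the longest chains measured above.
  std::cout << compares << " string compares" << std::endl;

  return 0;
}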
Comments?
jwe