gnugo-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [gnugo-devel] more on dfa


From: Dave Denholm
Subject: Re: [gnugo-devel] more on dfa
Date: 19 Oct 2002 16:54:19 +0100

Arend Bayer <address@hidden> writes:

> I wrote a while ago:
> > > There is a clean way to solve this problem: Instead of having attributes
> > > for all states, we could only have attributes for those states that have
> > > a transition into the error state. I.e., only at the very end of the DFA
> > > spiral we generate a list of all patterns that got matched.
> 
> Dave wrote:
> 
> > Well, I had a go, and have a version in which all patterns are
> > pushed down the tree to the terminal states. I kept the
> > attributes too, so I could check the two approaches against
> > each other.
> >
> > Unfortunately, the dfa merges the tree where it can, so there
> > can be multiple paths to the terminal state, each picking up
> > different patterns.
> (...)
> > So I (or someone) will have to figure out how to modify the sync product
> > to not merge the branches, at least for paths containing different
> > attributes.
> 
> Yes, that's why the DFA size will have to be increased somewhat.
> 
> I think the cleanest way to solve this is as follows: Each DFA for a
> single pattern contains exactly one state with an attribute, and from
> there all arrows point to the error state.
> 
> Just change all arrows from this state to point to itself. I.e. the
> pattern gets two "end" states, the usual error state with the pattern
> not matched, or this state where the patttern got matched.
> 
> Then the synchronized product will automatically not merge branches
> which contain different attributes matched so far.
> 
> Of course, we will then end up with loads of states that point to
> itself. Either keep track of those (which we have to do anyway, as these
> are the ones having attributes), and change them to point to the error
> state.


I don't entirely follow - will have to think about it.

What about two identical patterns ?
Or a pattern which is a subset of a larger pattern ?


> The the actual dfa would be
> 
> while (next =  pdfa->states[state].next[dfa_p[dfa_pos + spiral[l][row]]]) {
>   state = next;
>   row++;
> }
> and then lookup the attributes belonging to "state" via a hash table.
> 
> Or reserve a part at the beginning (or end) of the DFA for these "end states".
> Then the dfa would be:
> while (state > number_of_end_states) {
>   state =  pdfa->states[state].next[dfa_p[dfa_pos + spiral[l][row]]]) {
>   row++;
> }
> 

My equivalent plan had been to represent such states with -ve numbers in
the (main part of the) dfa. So that loop continues while state > 0,
and then the -ve value indicates where in some table (perhaps
backwards from the end of the dfa) the information lives.



> > While I'm here, I noticed that dfa.c contains both routines used in
> > in the main gnugo binary and routines used only in mkpat.
> >
> > Probably cleaner to split out the dfa construction routines into
> > mkdfa.c
> There are a few common routines, however.
> 


Very few I could see - just buildSpiralOrder() [and its helpers]  ?

There is some data defined in dfa.c and used by both, but it seemed to
me that the two programs should define their own copies.
eg matchpat.c should have a file-static spiral order array, and just
call the dfa.c routine to fill it in, rather than sharing the global
variable exported by dfa.c itself. Fairly minor, though.



Well, FWIW here's where I'm at so far...


I've separated out from dfa.c the parts used for building a dfa, and
called it mkdfa.c

I've simplified the interface between mkpat.c and mkdfa.c - in
current gnugo sources, the dfa instance variable lives in mkpat.c
and is passed to dfa.c, but in my version, the data structures live
in mkdfa.c and mkpat.c just passes it the pattern.

mkdfa.c uses different data structures while building the dfa,
and only writes them out in the format needed by gnugo itself.


I've separated out the attrib list - rather than storing one attrib
list in each dfa, I've made that a separate list, shared between
all dfas. This cuts down a lot of the duplication. It may introduce
a few dead entries, but they can easily be pruned when all work has been
done.




My current plan is to store with each state in a dfa the list
of patterns matched leading up to that state. The list is in
the attribute array, and so it is just a single integer, which
is unique but shared across all dfas (since they all see the same
attribute list)

Then in the sync product, as before, output states are reused if they
are the same as previously encountered, but this now requires that the
pattern lists match too.

One concern is that pattern lists will not be quite unique : what I really
want is a pattern set (unordered) rather than a pattern list (ordered).
But that's not too big a deal, since it just means the dfa will share fewer
states that it might have been able too, and will be bigger than necessary.
It won't actually be wrong.

(Do we know roughly how many patterns get accepted by the dfa at any
 given position ?
)





As an aside... the code to create the dfa uses several intermediate
dfa's, then merges them all at the end. IIRC, it was discussed on the mailing
list as a means of speeding things up.

The sync-product routine takes the product between two arbitrary dfa's.
However, I'd have thought that we could get the performance by having
the product routine know that one of the dfa's is linear. Eg it means
the hash table can be eliminated, since each state in the 'left' (general) dfa
can be coupled to no more than one state in the 'right' (linear) dfa, we
can replace it by an array of size of the left dfa.


And only the final state of the right dfa has an attribute.

I may have a go and see if such a beast is fast enough.



I have also been wondering whether to get it to collect all the strings
initially, and then when it comes to finalise the dfa, sort the
strings by length, and add them from shortest to longest.
On the grounds that adding a string longer than any already in the dfa
means that we are never forced to revisit existing states and update them
to include the newly-added pattern number, if you see what I mean.


dd
-- 
address@hidden          http://www.insignia.com




reply via email to

[Prev in Thread] Current Thread [Next in Thread]