From: gcc-bugzilla at gcc dot gnu dot org
Subject: [Bug classpath/22691] java.io.StreamTokenizer behaves differently from Sun's implementation
Date: 16 Oct 2005 01:26:34 -0000

Date: Thu, 3 Jul 2003 12:37:14 +0900
From: Ito Kazumitsu <address@hidden>
To: address@hidden

Hi,

It has been discussed on the kaffe mailing list [1] that Sun's
implementation of java.io.StreamTokenizer does not follow the
obsolete JLS 1st ed. documentation, and that its actual working
specification is unknown.  Kaffe's java.io.StreamTokenizer has been
modified so that it simulates Sun's current implementation [2].

I checked GNU Classpath's java.io.StreamTokenizer and found
that it behaves differently from Sun's implementation.

Attached is my test program, which generates various patterns
of test cases and prints their results.

If GNU Classpath is to simulate Sun's implementation, I hope
this information can be of some help.

[1] http://www.kaffe.org/pipermail/kaffe/2003-June/042843.html
[2] http://www.kaffe.org/cgi-bin/viewcvs.cgi/kaffe/libraries/javalib/java/io/StreamTokenizer.java

Attached program:
bash$ cat StreamTokenizerTest2.java
import java.io.*;
public class StreamTokenizerTest2 {

  private static String testString;
  private static String testChar;

  public static void main(String[] args) throws Exception {
      testString = args[0];
      testChar = args[1];
      String[] a = new String[] {"S", "C", "Q", "W", "N"};
      for (int i=1; i<=5; i++) {
          Permutation.generate(a, i, new MainHandler());
      }
  }

  private static class MainHandler extends Permutation.Handler {
    public void doit(Object[] array) {
        try {
            System.out.print(testString + " " + testChar + " ");
            for (int i=0; i<array.length; i++) {
                System.out.print(array[i] + " ");
            }
            System.out.println();
            test(array);
        }
        catch (Exception e) {
            System.err.println(e);
        }
    }
  }

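  // Applies the syntax calls named by args, in order, to a tokenizer
  // whose syntax table has been reset, then prints every token.
  // S = whitespaceChars, C = commentChar, Q = quoteChar,
  // N = parseNumbers, W = wordChars; all are applied to the test char.
  private static void test(Object[] args) throws Exception {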
    StreamTokenizer tok = new StreamTokenizer(new StringReader(testString));
    tok.resetSyntax();
    int c = testChar.charAt(0);
    for (int i=0; i<args.length; i++) {
       if (args[i].equals("S")) tok.whitespaceChars(c, c);
       else if (args[i].equals("C")) tok.commentChar(c);
       else if (args[i].equals("Q")) tok.quoteChar(c);
       else if (args[i].equals("N")) tok.parseNumbers();
       else if (args[i].equals("W")) tok.wordChars(c, c);
    }
    while (true) {
      int t = tok.nextToken();
      if (t == StreamTokenizer.TT_NUMBER) {
          System.out.println(tok.nval + ": " + t);
      }
      else {
          System.out.println(tok.sval + ": " + t);
      }
      if (t == StreamTokenizer.TT_EOF) break;
    }
  }
}
bash$ cat Permutation.java
public class Permutation {

    public static void generate(Object[] array, int n, Handler h) {
        int l = array.length;
        if (n == 1) {
            for (int i = 0; i < l; i++) {
                h.doit(new Object[] {array[i]});
            }
            return;
        }
        final int N = n;
        final Handler H = h;
        for (int i = 0; i < l; i++) {
            final Object OBJ = array[i];
            Object[] a1 = new Object[l - 1];
            System.arraycopy(array, 0, a1, 0, i);
            System.arraycopy(array, i+1, a1, i, l-i-1);
            generate(a1, n-1, new Handler() {
                public void doit(Object[] a2) {
                    Object[] a3 = new Object[N];
                    System.arraycopy(a2, 0, a3, 1, N-1);
                    a3[0] = OBJ;
                    H.doit(a3);
                }
            });
        }
        return;
    }

    public static class Handler {
        public void doit(Object[] array) {}
    }

}
bash$ java StreamTokenizerTest2 121 1
121 1 S 
null: 50
null: -1
121 1 C 
null: -1
(snip)
121 1 N W Q S C 
null: -1
121 1 N W Q C S 
21.0: -2
null: -1
bash$ 


------- Comment #1 from from-classpath at savannah dot gnu dot org  2004-03-12 12:31 -------
The Classpath and kaffe versions of StreamTokenizer are now in sync. Does this
mean the bug is solved? Or that the bug is now present in both?


------- Comment #2 from from-classpath at savannah dot gnu dot org  2005-02-28 16:22 -------
Bug #4742 contains a comprehensive test that verifies StreamTokenizer by
parsing randomly generated data under randomly chosen options. It reveals a
single difference between the Sun and Classpath implementations.

If a numeric character ('0'..'9') is given a special status (comment,
whitespace or some other character class), Sun's implementation treats it as
such only when it stands alone or at the beginning of a multi-character number.
Once number parsing has started, the digits in that number are treated as
digits regardless of their special status.

For example, after calling whitespaceChars('1', '1'), it reads "121" as 21,
while the Classpath implementation reads it as 2, as one would expect.
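
The case above can be reproduced with a few lines. This is only a sketch: the
class and method names are made up for illustration, and the result depends on
which class library runs it (the thread reports 21 for Sun's implementation and
2 for Classpath at the time).

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of the attached test program.
class DigitAsWhitespaceDemo {

    // Tokenizes input with the char ws flagged as whitespace and
    // returns a description of every token, in order.
    static String tokens(String input, char ws) {
        // This constructor sets up the default syntax, which
        // includes a call to parseNumbers().
        StreamTokenizer tok = new StreamTokenizer(new StringReader(input));
        tok.whitespaceChars(ws, ws);
        StringBuilder out = new StringBuilder();
        try {
            while (tok.nextToken() != StreamTokenizer.TT_EOF) {
                if (tok.ttype == StreamTokenizer.TT_NUMBER) {
                    out.append("NUMBER(").append(tok.nval).append(") ");
                } else if (tok.ttype == StreamTokenizer.TT_WORD) {
                    out.append("WORD(").append(tok.sval).append(") ");
                } else {
                    out.append("CHAR(").append((char) tok.ttype).append(") ");
                }
            }
        } catch (IOException e) {
            // A StringReader should not fail; rethrow unchecked just in case.
            throw new UncheckedIOException(e);
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        // A Sun-derived library is expected to report one number, 21.0;
        // a library that honours the whitespace flag mid-number reports 2.0.
        System.out.println(tokens("121", '1'));
    }
}
```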

It would be easy to write a patch that reproduces this behavior.
StreamTokenizer has many Mauve tests, so it is easy to check that the patch
does not introduce any obvious regressions.

The question is: to fix or not to fix? Is it a bug or not? Who would need to
mark digits as whitespace and still want to parse numbers, and why? Maybe
there are some very old historical reasons?

In any case, #4742 will be closed either as "not a bug" or as "fixed".


------- Comment #3 from from-classpath at savannah dot gnu dot org  2005-02-28 19:08 -------
After some thought I remembered that the old FORTRAN machines, which were in
their last days when I was a first-year student, under some specific
circumstances treated zero (0) and space identically when reading from
punched cards. A number starting with zeros (like 0003) is the same as a number
starting with spaces (like     3), but it is not the same as 30 or 3000
(although they also read 3 3 as 303). It may be correct to treat 0 as a space
if it is found at the beginning of a number, but not inside the number. So
perhaps this is a real bug? OK, we have a patch for it, but I need some
thoughts from the rest of the team before I fix it.


------- Comment #4 from from-classpath at savannah dot gnu dot org  2005-03-01 16:07 -------
This might be an obscure interaction with parseNumbers(), which sets all
digits, '.' and '-' to trigger nextToken() to set ttype to TT_NUMBER and nval
to the parsed number. The JCL book says that "Once parseNumbers() has been
invoked, the only way to undo its effects is to call ordinaryChar() or
wordChars() explicitly on those chars", and the default StreamTokenizer
constructor calls parseNumbers(). So presumably calling resetSyntax() and then
setting any digit to a comment char would result in a "normal" parse, with the
digit being treated either as a real digit or as a real comment char.
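
As a small illustration of the point about the default constructor, the sketch
below compares the first token of "42" with and without undoing number parsing.
The class and method names are invented for this example, and it uses
ordinaryChars(), the range form of ordinaryChar(); how other class libraries
behave here is not something this sketch can confirm.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper written for this illustration only.
class UndoParseNumbersDemo {

    // Returns the ttype of the first token read from input.
    // If undoNumbers is true, the digits, '.' and '-' are made
    // ordinary again before tokenizing.
    static int firstTokenType(String input, boolean undoNumbers) {
        // The default syntax set up here includes parseNumbers().
        StreamTokenizer tok = new StreamTokenizer(new StringReader(input));
        if (undoNumbers) {
            tok.ordinaryChars('0', '9');
            tok.ordinaryChar('.');
            tok.ordinaryChar('-');
        }
        try {
            return tok.nextToken();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // With the default syntax the whole input is one number token.
        System.out.println(firstTokenType("42", false) == StreamTokenizer.TT_NUMBER);
        // With the digits made ordinary, the first token is the char '4'.
        System.out.println(firstTokenType("42", true) == '4');
    }
}
```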

If we decide to emulate this behavior, we should document it very clearly
(more clearly than I did above!) so that people are not surprised by it.


------- Comment #5 from from-classpath at savannah dot gnu dot org  2005-03-01 19:24 -------
It is possible to get uniform behaviour if we require chars in a number
to be digits and, additionally, NOT to be whitespace and so on. With this
additional condition, 121 is treated as 2 independently of the calling order
(resetSyntax, parseNumbers, whitespaceChars('1', '1') gives the same result as
resetSyntax, whitespaceChars('1', '1'), parseNumbers). The current version
works differently for these two sequences (2 for the first case, 21 for the
second).
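
The two rules can be compared with a toy model. This is not code from Sun,
Classpath or kaffe: the class, the method and the single-char "whitespace"
parameter are all assumptions made to keep the sketch short. It only models
how the first number in the input is scanned.

```java
// Toy model of number scanning; every name here is hypothetical.
class NumberScanModel {

    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Scans the first integer in input, where ws is the one char
    // flagged as whitespace.
    //   strict == false: once a number has started, any '0'..'9'
    //                    continues it (the behaviour the thread
    //                    attributes to Sun's implementation).
    //   strict == true:  a char continues the number only if it is a
    //                    digit and is NOT flagged as whitespace (the
    //                    rule proposed above).
    // Returns -1 if no number starts after the leading whitespace.
    static long firstNumber(String input, char ws, boolean strict) {
        int i = 0;
        // Leading whitespace is skipped under both rules, even if it is a digit.
        while (i < input.length() && input.charAt(i) == ws) {
            i++;
        }
        if (i == input.length() || !isDigit(input.charAt(i))) {
            return -1;
        }
        long value = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (!isDigit(c) || (strict && c == ws)) {
                break;
            }
            value = value * 10 + (c - '0');
            i++;
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("121", '1', false)); // prints 21
        System.out.println(firstNumber("121", '1', true));  // prints 2
    }
}
```

Because the strict rule consults the whitespace flag on every char, the model
has no notion of call order at all, which is the uniformity argued for above.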


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22691




