bug-classpath

[Bug classpath/22842] problems with filename charset conversion


From: gcc-bugzilla at gcc dot gnu dot org
Subject: [Bug classpath/22842] problems with filename charset conversion
Date: 16 Oct 2005 01:27:25 -0000

Original bugreport:
http://www.kaffe.org/pipermail/kaffe/2005-January/101179.html

When doing file I/O, filenames should be encoded
according to the locale.  Currently there are problems.

This may be a VM problem, as the VMs all act differently,
but since they all have different runtime classes
it is a mess, so I am posting it here anyway.

## utf8 ##
java: list:ok open:ok
java-sablevm: list:ok open:FAILED
jamvm-cvs: list:ok open:ok
kaffe-cvs: list:FAILED open:ok
## iso-8859-1 ##
java: list:ok open:ok
java-sablevm: list:FAILED open:FAILED
jamvm-cvs: list:FAILED open:FAILED
kaffe-cvs: list:ok open:ok
-----------------------------------------------
java - Blackdown 1.4.2-01
jamvm-cvs: jamvm-1.2.3+patches + classpath cvs
java-sablevm: sablevm 1.1.8 from Debian
-----------------------------------------------
ListDir.java (also attached):

import java.io.*;
/*
 * This should be compiled with  'jikes -encoding utf8'
 * or javac in utf8 locale.
 */
public class ListDir {
    public static void main(String [] args) {
        // "list": was the on-disk name decoded from the locale charset?
        System.out.print("list:");
        File dir = new File(args[0]);
        // Note: list() may return null; this test does not check for that.
        String [] list = dir.list();
        String fn = list[0];

        if (fn.equals("test.äöüõ.txt"))
            System.out.print("ok ");
        else
            System.out.print("FAILED ");

        // "open": is the name encoded back to the locale charset correctly?
        System.out.print("open:");
        try {
            File f = new File(args[0], fn);
            if (f.exists())
                System.out.println("ok");
            else
                System.out.println("FAILED");
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}


------- Comment #1 from from-classpath at savannah dot gnu dot org  2005-01-18 11:51 -------
I noticed that the pipermail post is empty and has no attachments;
here's a better link:

http://article.gmane.org/gmane.comp.java.vm.kaffe.general/7825


------- Comment #2 from from-classpath at savannah dot gnu dot org  2005-03-02 19:12 -------
I have tried to reproduce your results and failed.
I am using Sun java, gij head and kaffe head (UTF-8 locale):

java: list:FAILED open:ok
gij: list:FAILED open:ok
kaffe: list:FAILED open:ok

Digging around internally reveals that the variable fn is set as follows:

java: fn=test.a?o?u?o?.txt
gij: fn=test.äöüõ.txt
kaffe: fn=test.äöüõ.txt
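One way to inspect such values without depending on the terminal encoding
is to print the UTF-16 code units of each name.  The following is only a
sketch; the class DumpNames and its helper codeUnits are hypothetical names
introduced here, not part of the attached test.  It would show, for
example, whether "ä" arrives as the single unit U+00E4 or as "a" followed
by a combining mark.

```java
import java.io.File;

public class DumpNames {
    // Hypothetical helper: render each UTF-16 code unit as U+XXXX.
    static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        String[] list = dir.list();
        if (list == null) {
            // list() returns null when the path is not a readable directory.
            System.out.println("cannot list " + dir);
            return;
        }
        for (String fn : list) {
            System.out.println(fn + " -> " + codeUnits(fn));
        }
    }
}
```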


------- Comment #3 from from-classpath at savannah dot gnu dot org  2005-03-02 22:22 -------
You have given me too little information.  What failed?
Which Sun JVM did you use?  Did you use my scripts?
They should report both the iso-8859-1 and the utf-8 charset
case.  Or did you run the .class by hand?  Could you try the
Blackdown VM?  It seems to handle this correctly.

The basic idea is that internally the JVM should keep filenames
as decoded Unicode (UTF-16), but when reading a directory
or using a string as a filename, the JVM should convert from/to
the locale charset.

ListDir.java checks whether the locale -> UTF-16 conversion
happened (list:?), and also the UTF-16 -> locale conversion (open:?).
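The two directions can be sketched in plain Java as follows.  This only
illustrates the conversions, not how any VM implements them; the class
name and the helpers toNative/fromNative are hypothetical, and
StandardCharsets assumes Java 7 or later (newer than the VMs tested here).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class FilenameCharsetSketch {
    // "test.äöüõ.txt", written with escapes so the source encoding does not matter.
    static final String NAME = "test.\u00e4\u00f6\u00fc\u00f5.txt";

    // "open" direction: UTF-16 string -> bytes in the locale charset.
    static byte[] toNative(String name, Charset localeCharset) {
        return name.getBytes(localeCharset);
    }

    // "list" direction: bytes read from the directory -> UTF-16 string.
    static String fromNative(byte[] raw, Charset localeCharset) {
        return new String(raw, localeCharset);
    }

    public static void main(String[] args) {
        Charset[] charsets = { StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1 };
        for (Charset cs : charsets) {
            byte[] raw = toNative(NAME, cs);
            String back = fromNative(raw, cs);
            System.out.println(cs + ": " + raw.length + " bytes, round trip "
                    + (back.equals(NAME) ? "ok" : "FAILED"));
        }
    }
}
```

The same 13-character string becomes 17 bytes in UTF-8 and 13 bytes in
ISO-8859-1, which is why the charset used on each side has to match.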

"list:FAILED open:ok" suggests that those JVMs handle
filenames as byte arrays without converting.  (Or the
Unicode chars did not reach the .class file.  What compiler
did you use?  Jikes 1.22 seems to handle it correctly.)
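The byte-array hypothesis can be modelled in a few lines.  This is a sketch
under the assumption that such a VM maps each byte to one char and back
(modelled here with ISO-8859-1, which is lossless for all byte values); the
helper names are hypothetical and no VM source was consulted for it.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PassThroughSketch {
    static final String EXPECTED = "test.\u00e4\u00f6\u00fc\u00f5.txt";

    // Hypothetical model of "no conversion": one byte becomes one char.
    static String listWithoutConversion(byte[] onDisk) {
        return new String(onDisk, StandardCharsets.ISO_8859_1);
    }

    // ...and one char becomes one byte again when the name is used.
    static byte[] openWithoutConversion(String name) {
        return name.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The name as it would be stored on disk in a UTF-8 locale.
        byte[] onDisk = EXPECTED.getBytes(StandardCharsets.UTF_8);
        String listed = listWithoutConversion(onDisk);

        System.out.print("list:" + (listed.equals(EXPECTED) ? "ok " : "FAILED "));
        boolean sameBytes = Arrays.equals(openWithoutConversion(listed), onDisk);
        System.out.println("open:" + (sameBytes ? "ok" : "FAILED"));
    }
}
```

The listed string differs from the expected one, but it maps back to
exactly the bytes that are on disk, so the file can still be opened.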


------- Comment #4 from from-classpath at savannah dot gnu dot org  2005-03-03 11:31 -------
I didn't use the scripts because gen.sh failed with

  touch: cp.iso-8859-1/test.����.txt: Invalid argument

I ran the UTF-8 test by hand. The Sun VM I used is as follows:

java version "1.4.2_05"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_05-141.4)
Java HotSpot(TM) Client VM (build 1.4.2-38, mixed mode)

I cannot use the Blackdown VM as it doesn't run on this platform
(powerpc-apple-darwin7.8.0).

I used javac with LC_ALL=en_GB.UTF-8. Attempting to compile with jikes 1.22
hangs the compiler. I have compiled with "gcj -C --encoding=UTF-8 ListDir.java"
with the same results as shown.


------- Comment #5 from from-classpath at savannah dot gnu dot org  2005-03-03 11:51 -------
> I didn't use the scripts because gen.sh failed with
> touch: cp.iso-8859-1/test.����.txt: Invalid argument

Somebody ate those chars, so I can't tell anything about
that error.  Btw, what filesystem do you use?  Does it support
arbitrary encodings (like Linux filesystems) or does it expect
something specific?  (I do not know anything about Darwin.)

Also, could you recheck whether ls DIR > tmp.txt produces
the filename in the correct encoding?

-----------
Ok, I'll try 1.4.2_05 myself too.

javac should be fine.


------- Comment #6 from from-classpath at savannah dot gnu dot org  2005-03-03 12:05 -------
The filesystem is HFS+. It supports Unicode filenames, and I can create a file
with "echo test.äöüõ.txt>test.äöüõ.txt" under bash and ls and cat it in the
UTF-8 locale, but iconv converts all these UTF-8 characters to #65533 as you
can see.


------- Comment #7 from from-classpath at savannah dot gnu dot org  2005-03-03 12:55 -------
If you say that the Unicode filename is stored ok on the filesystem,
then Sun's JVM is buggy in that respect on Darwin.  (The filename
from the filesystem does not compare equal to the same filename
stored inside the program.)

The iconv result seems suspicious too...
--------------------------------------
Here are my latest results:

## utf8 ##
java: list:ok open:ok
java-142_05: list:ok open:ok
gij-snap: list:ok open:ok
java-sablevm: list:ok open:FAILED
jamvm-cvs: list:ok open:ok
kaffe-cvs: list:FAILED open:ok
## iso-8859-1 ##
java: list:ok open:ok
java-142_05: list:ok open:ok
gij-snap: list:Exception in thread "main" java.lang.NullPointerException
   at ListDir.main(java.lang.String[]) (Unknown Source)
   at gnu.java.lang.MainThread.call_main() (/usr/lib/gcc-snapshot/lib/libgcj.so.6.0.0)
   at gnu.java.lang.MainThread.run() (/usr/lib/gcc-snapshot/lib/libgcj.so.6.0.0)
java-sablevm: list:FAILED open:FAILED
jamvm-cvs: list:FAILED open:FAILED
kaffe-cvs: list:ok open:ok
###########################################
## java -version
java version "1.4.2-01"
Java(TM) 2 Runtime Environment, Standard Edition (build Blackdown-1.4.2-01)
Java HotSpot(TM) Client VM (build Blackdown-1.4.2-01, mixed mode)
## java-142_05 -version
java version "1.4.2_05"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_05-b04)
Java HotSpot(TM) Client VM (build 1.4.2_05-b04, mixed mode)
## gij-snap -version
gij (GNU libgcj) version 4.1.0 20050227 (experimental)
## java-sablevm -version
SableVM version 1.1.9
- compile date and time: 2005-01-21 04:10:55 UTC
- gcc version: 3.3.5 (Debian 1:3.3.5-6)
- 'real life brokenness' features enabled
- signal based exception detection
- copying garbage collection
- bidirectional object layout
- inline-threaded interpreter
## jamvm-cvs -version
JamVM version 1.2.5
## kaffe-cvs -version
Kaffe Virtual Machine
Engine: Just-in-time v3   Version: 1.1.x-cvs   Java Version: 1.1
###########################################

Classpath and kaffe are the latest cvs.

Sun's JRE 1.4.2_05 seems ok.
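
The NullPointerException in the gij-snap run is consistent with dir.list()
returning null (or a null entry) in the iso-8859-1 case, since ListDir
uses list[0] unchecked; the thread does not confirm which.  A variant
of the "list" check that reports this instead of crashing could look like
the sketch below.  ListDirSafe and checkList are hypothetical names, and
unlike the original test it scans every entry rather than only list[0].

```java
import java.io.File;

public class ListDirSafe {
    // "test.äöüõ.txt", written with escapes so the source encoding does not matter.
    static final String EXPECTED = "test.\u00e4\u00f6\u00fc\u00f5.txt";

    // Returns "ok", "FAILED", or an error note when the directory cannot be listed.
    static String checkList(File dir) {
        String[] list = dir.list();
        if (list == null) {
            // File.list() returns null for a non-directory or on an I/O error.
            return "ERROR(list() returned null)";
        }
        for (String fn : list) {
            if (EXPECTED.equals(fn)) {
                return "ok";
            }
        }
        return "FAILED";
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        System.out.println("list:" + checkList(dir));
    }
}
```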


------- Comment #8 from from-classpath at savannah dot gnu dot org  2005-03-03 13:08 -------

It seems HFS+ is rather deeply involved in Unicode filename
handling, which is quite different from the usual Linux/Unix
filesystems (ufs, ext*, reiser).  So Darwin/HFS+ needs
special-casing in programs; otherwise Unicode bugs are likely.
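
One possible form of such special-casing is to compare names after Unicode
normalization, on the assumption that the filesystem hands back decomposed
names (base letter plus combining mark) while the program holds precomposed
ones.  The sketch below uses java.text.Normalizer, which only exists in
Java 6 and later and so was not available to the VMs tested in this report;
NormalizedCompare and sameName are hypothetical names.

```java
import java.text.Normalizer;

public class NormalizedCompare {
    // Compare two filenames after bringing both to the same normal form (NFC).
    static boolean sameName(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        // Precomposed form, as a compiler would store the string literal.
        String composed = "test.\u00e4\u00f6\u00fc\u00f5.txt";
        // Decomposed form, standing in for what the filesystem might return.
        String decomposed = Normalizer.normalize(composed, Normalizer.Form.NFD);

        System.out.println("plain equals:      " + composed.equals(decomposed));
        System.out.println("normalized equals: " + sameName(composed, decomposed));
    }
}
```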


http://developer.apple.com/technotes/tn/tn1150.html#UnicodeSubtleties

http://lists.samba.org/archive/samba-technical/1999-August/004965.html


------- Comment #9 from from-classpath at savannah dot gnu dot org  2005-03-03 20:04 -------
Your results appear to suggest that the problem lies in VM code rather than
classpath.


------- Comment #10 from from-classpath at savannah dot gnu dot org  2005-03-03 22:25 -------

Well, maybe.  I do not know the Classpath/VM boundaries that well.

.. AND, if only I could test the VMs against the SAME Classpath (HINT, HINT) ...

[...]

Ok now, looking into native/jni/java-io/java_io_VMFile.c,
then native/target/generic/target_generic_file.h,
then native/target/Linux/target_native_file.h,
then jamvm 1.2.5 sources,

I still blame Classpath, as the system-specific stuff should happen behind
the TARGET_NATIVE_FILE_OPEN_* macros, which are provided by Classpath.

Why the VMs act so differently is a good question...


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22842




