Crash in titdic-convert with DOS line ends
From: Jason Rumney
Subject: Crash in titdic-convert with DOS line ends
Date: Tue, 05 Feb 2008 01:31:34 +0000
User-agent: Thunderbird 2.0.0.9 (Windows/20071031)
Jason Rumney wrote:
> Some of the Big5 encoded files cannot be processed if they have DOS
> line ends. I haven't yet figured out why.
> ETZY.tit, PY-b5.tit, TONEPY.tit and ZOZY.tit have this problem, others
> do not.
Now that I am debugging this, ETZY.tit does not crash Emacs, while
4Corner.tit does. The problem appears to affect any Big5 file with DOS
line ends that is inserted into a unibyte buffer, although some other
condition must also be present to trigger the crash. In any case, the
following shows that there is definitely a problem with DOS line ends in
unibyte buffers:
;; Evaluate the following 2 forms in *scratch*. The first converts a
;; .tit file to DOS line ends, the second reads it into a unibyte
;; buffer as raw-text in the same way that titdic-convert does.
(with-temp-buffer
  (let ((coding-system-for-read 'cn-big5)
        (coding-system-for-write 'cn-big5-dos))
    (insert-file-contents
     (expand-file-name "CXTERM-DIC/4Corner.tit"
                       (file-name-directory (locate-library "leim-list"))))
    (write-file "/tmp/test.tit")))

(set-buffer-multibyte nil)
(let ((coding-system-for-read 'raw-text))
  (insert-file-contents "/tmp/test.tit"))
;; If Emacs does not crash, note the ^M on the ends of some lines.
When Emacs crashes, it always happens in decode_eol (several levels deep
from insert-file-contents), on this line:
> if (*p == '\r' && p[1] == '\n')
p appears to have overrun the buffer.
(gdb) print p
$35 = (unsigned char *) 0x2707000 <Address 0x2707000 out of bounds>
(gdb) print pbeg
$39 = (unsigned char *) 0x26f9f30 "# HANZI input table for cxterm\n# Generated from ETZY.cit by cit2tit\n# To be used by cxterm, convert me to .cit format first\n# .cit version 1\nENCODE:\tBIG5\nMULTICHOICE:\tYES\nPROMPT:\t\244\244\244\345\277\351\244J\241i\255\312\244\321\252`\255\265\241j\n"...
(gdb) print pend
$40 = (
unsigned char *) 0x27043bb
"a\264\303\254\341\305`\272\372\255\276\262\360\3
46\262\311`\370\332\r\nvx83\t\272\336\300]\262\360\265_\337F\327E\336\307\353\33
5\r\nvx84\t\272D\263e\304\351\305\370\341\350\277d\306|\253a\306[\311c\366\355\3
66\360\336\363\367\353\371u\325\341\325V\330\371\361q\371\312\r\nvx93\t\272u\263
address@hidden
250\355\275\275\276h\251K\265\301\357~\321\353\323\354\363\274\320g\337\242\332\
341\337\262\341A\342\336\346\352\357\317\340a\355\356\r\nvxa3\t\271\350\324l\r\n
vxa4\t\261\276\250\366\273o\337h\326"...
Some of this looks suspicious, but I don't know enough to say for sure
if it is corrupt...
(gdb) print *coding
$41 = {
id = 10,
common_flags = 5376,
mode = 2,
spec = {
iso_2022 = {
flags = 106,
current_invocation = {112, 51},
current_designation = {34, 32, 34, 31248},
single_shifting = 34,
bol = 41
},
ccl = 0x6a,
utf_16 = {
bom = 106,
endian = 112,
surrogate = 51
},
emacs_mule_full_support = 106
},
max_charset_id = 0,
safe_charsets = 0x170f4e4 "\303\277",
src_multibyte = 0,
dst_multibyte = 0,
head_ascii = -1,
produced = 42123,
produced_char = 42123,
consumed = 42123,
consumed_char = 42123,
errors = 0,
error_positions = 0x22,
result = CODING_RESULT_SUCCESS,
src_pos = -42123,
src_pos_byte = -42123,
src_chars = 42123,
src_bytes = 42123,
src_object = 26925060,
source = 0x26fa700
"---+----+----+----+----+----+----+----+\nCOMMENT |
(SPACE BAR)", ' ' <repeats 22 times>, "|\nCOMMENT |", '
' <repe
ats 22 times>, "\263\261\245\255", ' ' <repeats 16 times>,
"|\nCOMMENT +
", '-' <repeats 21 times>...,
dst_pos = 1,
dst_pos_byte = 1,
dst_bytes = 2000,
dst_object = 26925060,
destination = 0x26f9f30 "# HANZI input table for cxterm\n# Generated from ETZY.cit by cit2tit\n# To be used by cxterm, convert me to .cit format first\n# .cit version 1\nENCODE:\tBIG5\nMULTICHOICE:\tYES\nPROMPT:\t\244\244\244\345\277\351\244J\241i\255\312\244\321\252`\255\265\241j\n"...,
chars_at_source = 1,
charbuf = 0x80ab40,
charbuf_size = 16384,
charbuf_used = 0,
annotated = 0,
carryover =
"\352m\000\000\031]\000\000\226O\000\000\270}\000\000\204c\000\000
\aW\000\000\226x\000\000\000\223\000\000\300`\000\000o\226\000\000\325\203\000\0
00\032\216\000\000\306h\000\000&\207\000\000\"\000\000\000)\000\000",
carryover_bytes = 0,
default_char = 32,
detector = 0,
decoder = 0x116d3ba <decode_coding_raw_text>,
encoder = 0x116d3f6 <encode_coding_raw_text>
}