From: Chicken Trac
Subject: Re: [Chicken-janitors] #1182: utf8 egg silently accepts invalid byte sequences
Date: Sun, 29 Mar 2015 22:23:58 -0000
#1182: utf8 egg silently accepts invalid byte sequences
-------------------------+--------------------------------------------------
Reporter: syn | Owner: ashinn
Type: defect | Status: new
Priority: major | Milestone: someday
Component: extensions | Version: 4.9.x
Resolution: | Keywords: utf8
-------------------------+--------------------------------------------------
Comment(by syn):
Replying to [comment:4 ashinn]:
> Maybe I misunderstand, but your code does not generate
> an invalid byte sequence for me:
>
> $ csi -R utf8 -p "(list->string (map integer->char '(#b11000000 #b10100111)))"
> À§
> $ csi -R utf8 -p "(list->string (map integer->char '(#b11000000 #b10100111)))" | hexdump -C
> 00000000 c3 80 c2 a7 0a |.....|
> 00000005
That part only constructs the (invalid) byte sequence that is to be fed to
the UTF-8 decoder. As I mentioned, `list->string` here is the core
procedure, not the one from the `utf8` egg; with `-R utf8`, `csi` uses the
egg's version instead, which encodes each code point as UTF-8.
> These are the characters 00C0;LATIN CAPITAL LETTER A WITH GRAVE
> and 00A7;SECTION SIGN corresponding to #b11000000 and #b10100111.
Right, those are the Unicode code points represented by these two numbers,
which the `utf8` egg's `list->string` procedure correctly encodes as a
four-byte UTF-8 sequence (two bytes per character). However, as mentioned
above, the issue concerns the byte sequence `c0 a7` (in the form of a
CHICKEN string) being passed to one of the `utf8` egg's decoding procedures.
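To make the encoding direction concrete: code points U+0080 through U+07FF use the two-byte UTF-8 form `110xxxxx 10xxxxxx`, with the top five bits of the code point in the lead byte and the low six bits in the continuation byte. Below is a minimal sketch in plain Scheme that reproduces the `c3 80 c2 a7` from the hexdump; the name `encode-2byte` is hypothetical and this is not the `utf8` egg's implementation.

```scheme
;; Hypothetical helper, not the utf8 egg's API: encode a code point in
;; U+0080..U+07FF as the two-byte UTF-8 form 110xxxxx 10xxxxxx.
(define (encode-2byte cp)
  (if (not (<= #x80 cp #x7FF))
      (error "code point outside the two-byte range" cp))
  (list (+ #xC0 (quotient cp 64))     ; lead byte: top 5 bits of cp
        (+ #x80 (remainder cp 64))))  ; continuation byte: low 6 bits of cp

;; U+00C0 and U+00A7 encode to c3 80 and c2 a7 respectively.
(display (append (encode-2byte #xC0) (encode-2byte #xA7)))
(newline) ; prints (195 128 194 167), i.e. c3 80 c2 a7
```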
> If you find what you think is a bug, please write a full program and attach it,
> using "test" to show clearly what you expect and what is different.
Here you go! Since there is no correct value to expect (this byte sequence
is not valid UTF-8, so there is no way to decode it), I am using an
inverted `test-assert`:
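{{{
;; Import the utf8 egg with a prefix so that plain list->string stays the
;; core procedure, which stores each of these characters as one raw byte.
(use test (prefix utf8 utf8-))
;; c0 a7 is an overlong (invalid) two-byte encoding of U+0027 (apostrophe),
;; so a strict decoder must not turn it into "'".
(test-assert
 (not (string=? "'" (utf8-list->string (utf8-string->list
                     (list->string (map integer->char '(#b11000000 #b10100111))))))))
}}}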
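Why there is no correct value: `c0 a7` packs U+0027 into two bytes even though U+0027 has a one-byte form, which makes it an overlong encoding that a strict decoder must reject (lead bytes `c0` and `c1` can only ever produce such overlong forms). Below is a minimal sketch of that check for the two-byte case, in plain Scheme; the name `decode-2byte` is hypothetical and this is not how the `utf8` egg is implemented.

```scheme
;; Hypothetical helper, not the utf8 egg's API: strictly decode the two-byte
;; UTF-8 sequence B1 B2.  Returns the code point, or #f if it is invalid.
(define (decode-2byte b1 b2)
  (and (<= #xC0 b1 #xDF)   ; lead byte must match 110xxxxx
       (<= #x80 b2 #xBF)   ; continuation byte must match 10xxxxxx
       (let ((cp (+ (* (- b1 #xC0) 64) (- b2 #x80))))
         ;; Code points below U+0080 have a one-byte form, so a two-byte
         ;; encoding of them is overlong and must be rejected.
         (and (>= cp #x80) cp))))

(display (decode-2byte #xC3 #x80)) ; prints 192 (U+00C0)
(newline)
(display (decode-2byte #xC0 #xA7)) ; prints #f (overlong U+0027)
(newline)
```

Whether a real decoder then signals an error or substitutes U+FFFD is a separate design choice; the point is only that `c0 a7` must not decode to `"'"`.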
--
Ticket URL: <http://bugs.call-cc.org/ticket/1182#comment:5>
CHICKEN Scheme <http://www.call-with-current-continuation.org/>
CHICKEN Scheme is a compiler for the Scheme programming language.