Re: [Chicken-janitors] #1182: utf8 egg silently accepts invalid byte seq
From: Chicken Trac
Subject: Re: [Chicken-janitors] #1182: utf8 egg silently accepts invalid byte sequences
Date: Mon, 30 Mar 2015 09:59:12 -0000
#1182: utf8 egg silently accepts invalid byte sequences
-------------------------+--------------------------------------------------
Reporter: syn | Owner: ashinn
Type: defect | Status: closed
Priority: major | Milestone: someday
Component: extensions | Version: 4.9.x
Resolution: invalid | Keywords: utf8
-------------------------+--------------------------------------------------
Comment(by syn):
Replying to [comment:6 ashinn]:
> That's not a complete test,
What's missing?
> and you're using different code now.
I was using `string-for-each` in my initial example to illustrate the
general issue, but that doesn't lend itself well to a test, so I
switched to `string->list` instead. As both procedures rely on the same
UTF-8 decoder internally, the code is essentially equivalent AFAICT.
> (use utf8) puts the standard procedures in utf8 mode. If you
> pass valid inputs to those procedures and get an invalid output
> it's a bug, and I will fix it. If you pass invalid inputs, you get
> undefined results. Both of your examples are of invalid inputs,
> created outside of utf8.
Yep, that's exactly the point: passing strings that were created without
any of the `utf8` string constructors. Please also read my second-to-last
reply again: I agree with you about preserving the current behavior of the
decoder procedures. Instead, we should provide validation procedures for
users who need to deal with strings they received from untrusted sources
(e.g. from third-party libraries which don't use the `utf8` procedures).
I think the issue boils down to the fact that the `utf8` egg overloads /
re-uses the core string type but currently doesn't provide a predicate to
check whether a string is actually valid for use with its API.
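To make the idea concrete, here is a minimal sketch of what such a
predicate would check. It is written in Python purely for illustration and
is not the `utf8` egg's API; the name `is_valid_utf8` is hypothetical. It
assumes the well-formedness rules of RFC 3629: no overlong encodings, no
surrogates, nothing above U+10FFFF, and no truncated sequences.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` is well-formed UTF-8 (hypothetical helper)."""
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        if b0 < 0x80:                      # single-byte (ASCII)
            i += 1
            continue
        # For each lead byte: number of continuation bytes, and the
        # allowed range of the FIRST continuation byte.
        if 0xC2 <= b0 <= 0xDF:
            need, lo, hi = 1, 0x80, 0xBF
        elif b0 == 0xE0:
            need, lo, hi = 2, 0xA0, 0xBF   # excludes overlong 3-byte forms
        elif b0 == 0xED:
            need, lo, hi = 2, 0x80, 0x9F   # excludes surrogates U+D800..U+DFFF
        elif 0xE1 <= b0 <= 0xEF:
            need, lo, hi = 2, 0x80, 0xBF
        elif b0 == 0xF0:
            need, lo, hi = 3, 0x90, 0xBF   # excludes overlong 4-byte forms
        elif 0xF1 <= b0 <= 0xF3:
            need, lo, hi = 3, 0x80, 0xBF
        elif b0 == 0xF4:
            need, lo, hi = 3, 0x80, 0x8F   # excludes code points > U+10FFFF
        else:
            return False                   # stray continuation or invalid lead byte
        if i + need >= n:
            return False                   # sequence is truncated
        if not lo <= data[i + 1] <= hi:
            return False
        for k in range(2, need + 1):       # remaining continuation bytes
            if not 0x80 <= data[i + k] <= 0xBF:
                return False
        i += need + 1
    return True
```

A caller would run this check once on data from an untrusted source and
only then hand the string to the decoding procedures, which can keep their
current behavior.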
I hope that clarifies my point :-) So again: Would you be interested in
integrating such a validation predicate into the `utf8` egg? I think it
would belong there, but I can also make it a separate egg if you prefer.
--
Ticket URL: <http://bugs.call-cc.org/ticket/1182#comment:8>
CHICKEN Scheme <http://www.call-with-current-continuation.org/>
CHICKEN Scheme is a compiler for the Scheme programming language.