Re: [Chicken-janitors] #1182: utf8 egg silently accepts invalid byte seq
From: Chicken Trac
Subject: Re: [Chicken-janitors] #1182: utf8 egg silently accepts invalid byte sequences
Date: Sun, 29 Mar 2015 16:50:03 -0000
#1182: utf8 egg silently accepts invalid byte sequences
-------------------------+--------------------------------------------------
Reporter: syn            | Owner: ashinn
Type: defect             | Status: new
Priority: major          | Milestone: someday
Component: extensions    | Version: 4.9.x
Resolution:              | Keywords: utf8
-------------------------+--------------------------------------------------
Comment(by syn):
Hey Alex,

thanks for your reply!

> This is intentional - existing chicken code mixes binary strings
> and text strings as strings, so we can't in general forbid such
> invalid sequences.

The utf8 egg's procedures surely could detect them, the question is
whether that is the wisest way to go about it. But see below.

> We can try to provide sane defaults, and indeed if you use that
> definition of evil-quote with utf8 imported, you get a valid
> sequence.

No, it's an invalid sequence as per the UTF-8 spec both in the Unicode
standard and the RFC. See the Wikipedia article -- it is certainly
possible to interpret some of them but it's still outside of the spec,
thus potentially leading to exploits.

> We absolutely can't do anything about users who
> aren't even using the utf8 egg.

Sure, I'm only talking about the utf8 egg here -- the core string
procedures are defined to operate on the byte level so that's what users
get.

> What we _can_ (and should) do is provide utilities to check if
> a string is valid utf8, and/or strip invalid sequences.

Yep, I think that'd be my preferred solution, too. I implemented UTF-8
validation the other day and would be willing to contribute it to the
utf8 egg if you like. I have both a Scheme and a C implementation, the
latter of which is an order of magnitude faster than the former. Would
you care for a patch?
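
For illustration only, a validity check of the kind discussed above could
look roughly like the following C sketch. It is not the implementation
offered in this comment and not part of the utf8 egg; the name utf8_valid
is hypothetical. It assumes the rules of RFC 3629: at most four bytes per
character, no overlong encodings, no surrogates, nothing above U+10FFFF.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if the n bytes at s are well-formed
 * UTF-8 in the sense of RFC 3629, false otherwise. */
bool utf8_valid(const unsigned char *s, size_t n)
{
    size_t i = 0;

    while (i < n) {
        unsigned char b = s[i];
        size_t len;        /* number of continuation bytes expected */
        unsigned long cp;  /* code point decoded so far */
        unsigned long min; /* smallest code point allowed for this length */

        if (b < 0x80) {                 /* ASCII, single byte */
            i++;
            continue;
        } else if ((b & 0xE0) == 0xC0) { /* 110xxxxx: 2-byte sequence */
            len = 1; cp = b & 0x1F; min = 0x80;
        } else if ((b & 0xF0) == 0xE0) { /* 1110xxxx: 3-byte sequence */
            len = 2; cp = b & 0x0F; min = 0x800;
        } else if ((b & 0xF8) == 0xF0) { /* 11110xxx: 4-byte sequence */
            len = 3; cp = b & 0x07; min = 0x10000;
        } else {
            return false;               /* stray continuation or 0xF8..0xFF */
        }

        if (n - i <= len)
            return false;               /* sequence truncated at end of input */

        for (size_t k = 1; k <= len; k++) {
            unsigned char c = s[i + k];
            if ((c & 0xC0) != 0x80)
                return false;           /* not a 10xxxxxx continuation byte */
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp < min)
            return false;               /* overlong encoding */
        if (cp > 0x10FFFF)
            return false;               /* beyond the Unicode range */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return false;               /* UTF-16 surrogate */

        i += len + 1;
    }
    return true;
}
```

With this sketch, an overlong two-byte encoding of the ASCII double quote
(bytes 0xC0 0xA2) is rejected, even though a lenient decoder could still
interpret it as U+0022.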
--
Ticket URL: <http://bugs.call-cc.org/ticket/1182#comment:3>
CHICKEN Scheme <http://www.call-with-current-continuation.org/>
CHICKEN Scheme is a compiler for the Scheme programming language.