From: Chicken Trac
Subject: Re: [Chicken-janitors] #1182: utf8 egg silently accepts invalid byte sequences
Date: Mon, 30 Mar 2015 20:40:09 -0000
#1182: utf8 egg silently accepts invalid byte sequences
-------------------------+--------------------------------------------------
Reporter: syn | Owner: ashinn
Type: defect | Status: reopened
Priority: major | Milestone: someday
Component: extensions | Version: 4.9.x
Resolution: | Keywords: utf8
-------------------------+--------------------------------------------------
Changes (by syn):
* status: closed => reopened
* resolution: invalid =>
Comment:
Replying to [comment:9 ashinn]:
> Yes, a validation predicate is a long-standing todo.
> I might get around to it soon, patches are also welcome.
I'm attaching a patch which adds a `utf8-validation` module along with some
rudimentary sanity tests. It only exports the discussed predicate, named
`utf8-string?` -- that's the reason I put it in a separate module, since I
couldn't think of a better name and I didn't want to make the main `utf8`
module un-prefixable. Perhaps you have a better idea?
The validation algorithm is based on The Unicode Standard, Version 7.0 -
Core Specification, Table 3-7, p. 125. It performs reasonably well when
compiled with `-O2 -specialize` or `-O3` (around an order of magnitude
slower than an implementation of the same algorithm in C). I provide it to
you under the same license as the `utf8` egg, so feel free to include it.
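
For readers without the patch at hand, here is a minimal sketch of what a
check driven by Table 3-7 can look like. It is written in Python rather than
Scheme, it is not the attached patch, and the name `is_well_formed_utf8` is
made up for illustration. The idea is that the lead byte decides how many
trailing bytes follow and which range the second byte may fall in; every
later trailing byte must be in 80..BF.

```python
def is_well_formed_utf8(data: bytes) -> bool:
    """Return True if data consists only of well-formed UTF-8 sequences.

    Illustrative sketch following the byte ranges of Table 3-7 in the
    Unicode Standard; the function name is hypothetical.
    """
    i = 0
    n = len(data)
    while i < n:
        b0 = data[i]
        if b0 <= 0x7F:
            # Single-byte (ASCII) sequence.
            i += 1
            continue
        # (lo, hi) is the allowed range of the second byte,
        # trail is the number of bytes following the lead byte.
        if 0xC2 <= b0 <= 0xDF:
            lo, hi, trail = 0x80, 0xBF, 1
        elif b0 == 0xE0:
            lo, hi, trail = 0xA0, 0xBF, 2   # rejects overlong 3-byte forms
        elif 0xE1 <= b0 <= 0xEC or 0xEE <= b0 <= 0xEF:
            lo, hi, trail = 0x80, 0xBF, 2
        elif b0 == 0xED:
            lo, hi, trail = 0x80, 0x9F, 2   # rejects surrogates D800..DFFF
        elif b0 == 0xF0:
            lo, hi, trail = 0x90, 0xBF, 3   # rejects overlong 4-byte forms
        elif 0xF1 <= b0 <= 0xF3:
            lo, hi, trail = 0x80, 0xBF, 3
        elif b0 == 0xF4:
            lo, hi, trail = 0x80, 0x8F, 3   # rejects values above U+10FFFF
        else:
            # 80..C1 and F5..FF never start a well-formed sequence.
            return False
        if i + trail >= n:
            # Sequence is truncated at the end of the input.
            return False
        if not lo <= data[i + 1] <= hi:
            return False
        for k in range(2, trail + 1):
            if not 0x80 <= data[i + k] <= 0xBF:
                return False
        i += trail + 1
    return True
```

With this sketch, `is_well_formed_utf8(b"\xc0\x80")` and
`is_well_formed_utf8(b"\xed\xa0\x80")` are both rejected (an overlong
encoding and an encoded surrogate), while ordinary encoded text passes.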
> It would in theory be possible to validate every input
> to every utf8 operation, but I have no intention of doing
> so, for performance reasons and because people may
> currently be using invalid utf8 in "safe" ways already.
Yep, I totally agree with that!
--
Ticket URL: <http://bugs.call-cc.org/ticket/1182#comment:10>
CHICKEN Scheme <http://www.call-with-current-continuation.org/>
CHICKEN Scheme is a compiler for the Scheme programming language.