chicken-users

srfi-130 2.0.1: An improved CHICKEN string library


From: Wolfgang Corcoran-Mathe
Subject: srfi-130 2.0.1: An improved CHICKEN string library
Date: Fri, 9 Sep 2022 13:33:44 -0400

Hi all,

I’ve just released version 2.0.1 of the srfi-130 egg[0], which is my
quixotic attempt at a better string library for CHICKEN.  It’s a new,
fully Unicode-aware, opaque-cursor implementation of John Cowan’s
SRFI 130[1] built on top of the utf8[2] egg.  Some benefits:

* String cursors, which encapsulate byte offsets, provide faster
  indexing and substring operations on Unicode strings than codepoint
  indices.  For example, srfi-130’s ‘string-ref/cursor’ runs in
  (notional) constant time when given a cursor, while utf8’s
  ‘string-ref’ requires O(n) time.

* All srfi-130 procedures that take cursors can also take
  (codepoint) indices, so porting between srfi-13/srfi-152/utf8
  and srfi-130 should be relatively easy.

* Cursors are type-safe, and you can only create valid cursors
  (but see “Caveats” below).  Low-level functional programmers
  may consider this decadent, but I believe it encourages better
  programming.  Passing hand-computed offsets to CHICKEN’s
  byte-oriented string operations is asking for trouble, and cursors
  are a more disciplined way to achieve the same goals with similar
  efficiency.

* Better error reporting.  The srfi-130 egg tries to provide
  useful exceptions with correct locations that follow CHICKEN’s
  internal condition protocol (for example, type errors raise
  (exn type) conditions).  This is in contrast to the utf8 egg’s
  errors, which are often hard to trace (“where exactly did
  string-ref get that invalid index?”).

* More rigorous, randomized testing using the test-generative egg.
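
To make the first point concrete, here is a minimal sketch of why a
cursor that wraps a byte offset beats a codepoint index on UTF-8 data.
It is written in Python purely as an illustration and is not the egg’s
implementation.  The names ‘char_len’, ‘next_cursor’, ‘ref_by_cursor’
and ‘ref_by_index’ are hypothetical, and the code assumes the input is
valid UTF-8.

```python
# Illustrative model only: a "cursor" here is a plain byte offset
# into UTF-8 encoded data.  Assumes the buffer holds valid UTF-8.

def char_len(lead: int) -> int:
    """Length in bytes of the UTF-8 sequence starting with this lead byte."""
    if lead < 0x80:
        return 1
    if lead < 0xE0:
        return 2
    if lead < 0xF0:
        return 3
    return 4

def next_cursor(buf: bytes, cur: int) -> int:
    """Advance a cursor by one character: a single lead-byte lookup."""
    return cur + char_len(buf[cur])

def ref_by_cursor(buf: bytes, cur: int) -> str:
    """Fetch the character at a cursor without scanning from the start."""
    return buf[cur:cur + char_len(buf[cur])].decode("utf-8")

def ref_by_index(buf: bytes, index: int) -> str:
    """Fetch the character at a codepoint index: must walk index characters."""
    cur = 0
    for _ in range(index):
        cur = next_cursor(buf, cur)
    return ref_by_cursor(buf, cur)

def chars_via_cursor(buf: bytes) -> list:
    """Visit every character once, carrying a cursor forward."""
    out, cur = [], 0
    while cur < len(buf):
        out.append(ref_by_cursor(buf, cur))
        cur = next_cursor(buf, cur)
    return out
```

Looping over a string with ‘ref_by_index’ rescans from the start on
every call, which is quadratic overall, while carrying a cursor forward
as in ‘chars_via_cursor’ touches each character once.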

# Caveats

Cursors are very useful, but they don’t play well with string
mutation.  Mutating a string invalidates all cursors into it, and
catching these situations efficiently is a hard problem.  It’s
also possible to use a cursor on a string other than the one it
refers to.  This is an error as well, and it currently goes
uncaught.  It could be caught with an ‘eqv?’ check, if it annoys
enough people.
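
As a rough sketch of what such an identity check could look like, here
is a small Python model in which each cursor remembers the buffer it
was created for.  This is an illustration under my own assumptions, not
how the egg represents cursors.  ‘Cursor’, ‘cursor_at’ and
‘checked_ref’ are hypothetical names, and Python’s ‘is’ stands in for
‘eqv?’.

```python
# Illustrative model only: a cursor that records its owning buffer.

class Cursor:
    __slots__ = ("owner", "offset")

    def __init__(self, owner: bytearray, offset: int):
        self.owner = owner      # the buffer this cursor was created for
        self.offset = offset    # byte offset of a character boundary

def cursor_at(buf: bytearray, offset: int) -> Cursor:
    """Create a cursor for buf; the caller supplies a valid boundary."""
    return Cursor(buf, offset)

def checked_ref(buf: bytearray, cur: Cursor) -> str:
    """Fetch the character at cur, rejecting cursors made for another buffer."""
    if cur.owner is not buf:    # identity test, analogous to an eqv? check
        raise ValueError("cursor belongs to a different string")
    lead = buf[cur.offset]
    n = 1 if lead < 0x80 else 2 if lead < 0xE0 else 3 if lead < 0xF0 else 4
    return bytes(buf[cur.offset:cur.offset + n]).decode("utf-8")
```

Note that this check only compares identities.  A cursor created
before its own buffer was mutated still passes it, so the mutation
problem described above remains.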

In sum, I think that the new srfi-130 egg has some important benefits
while mostly maintaining backwards compatibility with srfi-13 and the
other CHICKEN string libraries.  I hope that some CHICKEN programmers
will consider it.  Suggestions and patches are welcome.

Best regards, Wolf

[0] https://wiki.call-cc.org/eggref/5/srfi-130
[1] https://srfi.schemers.org/srfi-130/
[2] https://wiki.call-cc.org/eggref/5/utf8

Thanks to John and to Will Clinger for creating SRFI 130.

-- 
Wolfgang Corcoran-Mathe  <wcm@sigwinch.xyz>


