Re: "split-sentences"?
From: tomas
Subject: Re: "split-sentences"?
Date: Sat, 23 Jan 2021 14:10:56 +0100
User-agent: Mutt/1.5.21 (2010-09-15)
On Sat, Jan 23, 2021 at 10:35:51AM +0100, moasenwood--- via Users list for the
GNU Emacs text editor wrote:
> tomas wrote:
>
> > Not exactly your result, but this comes close:
[...]
> > You can adjust the results by tweaking the regexp (try word
> > boundaries like '\<' and '\>'
>
> *scratches my head*
A candidate for a sentence boundary is a word boundary
(plus some other conditions). This was at least my thought
process leading to that suggestion. It might be a bad
suggestion, though.
> > if you want to keep punctuation) or the other split-string's
> > optional params (e.g. drop the empty matches, etc.).
>
> Well, that's a start, for sure. Thanks :)
You're welcome. Note that [:punct:] may be too broad a category:
does a sentence end with a comma? A semi-colon? A colon? What
about question and exclamation marks? What about the latter in
a language like Spanish, where they're parenthetical: "Ella
me preguntó ¿qué quieres?" (the parenthetical things make it
much easier to embed a question or an exclamation into something
else).
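To make the "too broad" point concrete, here is a minimal sketch that assumes only '.', '?' and '!' end a sentence. The helper name `my-split-sentences' is made up for this example; the only real function involved is `split-string' with its SEPARATORS and OMIT-NULLS arguments.

```elisp
;; Sketch only: treat runs of ".", "?" or "!" (plus any trailing
;; whitespace) as the sentence separator.  `my-split-sentences' is a
;; hypothetical name, not something that ships with Emacs.
(defun my-split-sentences (text)
  "Split TEXT into a list of sentences, dropping the terminators."
  ;; Third argument t drops empty strings, e.g. the one that would
  ;; otherwise follow a terminator at the very end of TEXT.
  (split-string text "[.?!]+[ \t\n]*" t))

(my-split-sentences "Well, that's a start. Is it enough? No!")
;; => ("Well, that's a start" "Is it enough" "No")
```

With "[[:punct:]]+" as the separator instead, the first sentence would also be cut at the comma and at the apostrophe. Neither variant knows about '¿' or '¡', so an embedded Spanish question stays glued to the sentence around it.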
As always, the really interesting questions are left as exercises to
the reader... until you end with Natural Language Processing :-)
Possibly this is the danger Tomas Hlavaty is hinting at elsethread.
> Silly me, I already used `split-string' 10 times...
C'mon. Wetware caches are like that. Mine too.
Cheers
- t