help-gnu-emacs

Re: "split-sentences"?


From: tomas
Subject: Re: "split-sentences"?
Date: Sat, 23 Jan 2021 14:10:56 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

On Sat, Jan 23, 2021 at 10:35:51AM +0100, moasenwood--- via Users list for the 
GNU Emacs text editor wrote:
> tomas wrote:
> 
> > Not exactly your result, but this comes close:

[...]

> > You can adjust the results by tweaking the regexp (try word
> > boundaries like '\<' and '\>'
> 
> *scratches my head*

A candidate for a sentence boundary is a word boundary
(plus some other conditions). That, at least, was my thought
process leading to that suggestion. It might be a bad
suggestion, though.

> > if you want to keep punctuation) or the other split-string's
> > optional params (e.g. drop the empty matches, etc.).
> 
> Well, that's a start, for sure. Thanks :)

You're welcome. Note that [:punct:] may be too broad a category:
does a sentence end with a comma? A semicolon? A colon? What
about question and exclamation marks? What about the latter in
a language like Spanish, where they come in pairs, like
parentheses: "Ella me preguntó ¿qué quieres?" (the paired marks
make it much easier to embed a question or an exclamation into
something else).
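
To make the "too broad" point concrete, here is a minimal sketch
contrasting an explicit set of sentence-ending characters with
[:punct:]. The helper name `my-split-sentences', the sample string
and both regexps are assumptions made up for this illustration; this
is not the code elided earlier in the thread.

```elisp
;; Hypothetical helper: the name and the regexp are assumptions
;; chosen for this illustration only.
(defun my-split-sentences (text)
  "Split TEXT at runs of `.', `?' or `!', dropping them and empty strings."
  (split-string text "[.?!]+[ \t\n]*" t))

(my-split-sentences "Well, yes. Is it? Sure!")
;; => ("Well, yes" "Is it" "Sure")

;; With the broader class, the comma becomes a separator too:
(split-string "Well, yes. Is it? Sure!" "[[:punct:]]+[ \t\n]*" t)
;; => ("Well" "yes" "Is it" "Sure")
```

Even the narrower version is only a rough approximation: it would
still split inside an abbreviation like "e.g.", and it knows nothing
about the Spanish opening marks.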

As always, the really interesting questions are left as exercises to
the reader... until you end up with Natural Language Processing :-)

Possibly this is the danger Tomas Hlavaty is hinting at elsethread.

> Silly me, I already used `split-string' 10 times...

C'mon. Wetware caches are like that. Mine too.

Cheers
 - t


