nano-devel

lines and regexp


From: Mike Scalora
Subject: lines and regexp
Date: Wed, 7 Sep 2022 10:57:47 -0600

I noticed something curious about lines and regexp search and replace in nano. If I open a file with two lines (nano reports the number of lines on open), then search and "replace all" on the regexp $ (or ^), I get 3 replacements. It doesn't matter whether the second line in the file ends with a newline or not; I get the same result, because nano adds the missing final newline automatically. This contradicts the number of lines nano reported reading, and it adds yet another line to the file.

The CLI programs wc, egrep, sed and perl all agree that there are two lines; regexp search & replace in Python and PHP think there are 3 lines. In other testing I've done, golang & node/JavaScript are in the 2-line camp, while C#/.NET and Java are in the 3-line camp.

Like nano, python3 is inconsistent with itself: the splitlines() method thinks there are two lines, and the re (regexp) module thinks there are 3.

I think that by the POSIX definition of a line there are 2, and GNU wc, egrep and sed agree, but the GNU C Library regex library disagrees. The Mac's CLI tools all agree with the GNU tools. I've come to understand that regexp library authors have an opinion about what constitutes a line; I'm not sure whether it is conscious or unconscious, but it doesn't always match what the application thinks a line is. Python propagates its own internal contradiction into apps that are implemented in it. PHP has the same inconsistency between the preg_ APIs and the line-by-line I/O APIs, but who really expects PHP not to have inconsistencies?
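
To show where the third "line" comes from in the 3-line camp, here is a minimal Python sketch. It is an illustration only: the string literal stands in for the two-line test file, and it lists the offsets at which $ matches in multiline mode.

```python
import re

# Stand-in for a two-line file whose last line ends with a newline.
text = "apple\npeach\n"

# Line-oriented view: the final newline terminates the second line.
lines = text.splitlines()

# Regexp view: with re.M, "$" matches just before every newline AND at
# the very end of the string, even when the string ends with a newline.
positions = [m.start() for m in re.finditer(r"$", text, flags=re.M)]

print(len(lines), "lines")
print("$ matches at offsets", positions)
```

The last offset is the end of the string, after the final newline. The regexp module treats that empty tail as one more place where a line can end, which is what produces the extra replacement.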

CLI demo transcript on Linux (I get identical results on a Mac):

  ==$ echo 'apple' >/tmp/test.txt

  ==$ echo 'peach' >>/tmp/test.txt

  ==$ wc -l /tmp/test.txt
  2 /tmp/test.txt

  ==$ egrep -n '$' /tmp/test.txt
  1:apple
  2:peach

  ==$ sed 's/$/ pie/' /tmp/test.txt
  apple pie
  peach pie

  ==$ cat /tmp/test.txt | perl -pe's/$/ pie/'
  apple pie
  peach pie

  ==$ python3 -c $'import re,sys\nwith open("/tmp/test.txt") as f: sys.stdout.write(f"{len(f.read().splitlines())} lines")'
  2 lines
  ==$ python3 -c $'import re,sys\nwith open("/tmp/test.txt") as f: sys.stdout.write(re.sub(r"$"," pie",f.read(),0,re.M))'
  apple pie
  peach pie
   pie
  ==$ php -r 'echo preg_replace("/$$/m"," pie",file_get_contents("/tmp/test.txt"));'
  apple pie
  peach pie
   pie
  ==$ nano /tmp/test.txt
  ^w^r$<return><space>pie<return>a^s^x
  ==$ cat /tmp/test.txt
  apple pie
  peach pie
   pie

  ==$

Idea: Remove the trailing newline during the regexp operation.

At least in simple cases, this makes the search & replace better meet my expectations. I didn't see a way to actually search for a newline (https://stackoverflow.com/questions/25959610/nano-insert-newline-in-search-and-replace didn't help), and even if you can, are there really expectations that would be broken? I tried \z & \Z, but they do not appear to be supported in nano (GNU regexp library?).
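
A rough Python sketch of the idea, to make the intended behaviour concrete. The helper name replace_all_lines is hypothetical, the sketch assumes plain "\n" line endings, and it says nothing about how nano itself would implement this.

```python
import re

def replace_all_lines(pattern, repl, text):
    """Multiline regexp replace that does not treat the text after the
    final newline as an extra, empty line (hypothetical helper)."""
    # Set the final newline aside so "$" and "^" cannot match after it.
    had_newline = text.endswith("\n")
    body = text[:-1] if had_newline else text
    result = re.sub(pattern, repl, body, flags=re.M)
    # Put the line terminator back exactly as it was.
    return result + ("\n" if had_newline else "")

print(replace_all_lines(r"$", " pie", "apple\npeach\n"), end="")
```

With the two-line test file this gives two replacements whether or not the last line ends with a newline, which matches what wc, egrep and sed report.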

I didn't look at the nano code; I thought I would first see whether anyone cares enough to even discuss the topic.

-Mike

