bug-coreutils

Re: ptx -i ignore_file doesn't appear to work


From: Graig McHendrie
Subject: Re: ptx -i ignore_file doesn't appear to work
Date: Mon, 10 Apr 2006 08:33:55 -0700
User-agent: Thunderbird 1.5 (Windows/20051201)

Eric -

I've looked at the if.txt file with FileSnoop, and there is a CR/LF pair (hex 0D/0A, oct 015/012) after each word in the file.

Graig


Eric Blake wrote:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

According to Paul Eggert on 4/7/2006 2:31 PM:
Using a command such as "ptx -i if.txt -r -f MS_OS_clips.txt >
out.txt", where "if.txt" is a file with one word to a line, such as:
a
and
the
with
produces permuted output that includes "a", "and", "the", and "with"
as key words.

Hmm, I don't observe this problem with coreutils 5.94:

$ cat if.txt
a
and
the
with
$ cat MS_OS_clips.txt
a and the with hooboy
$ ptx -i if.txt -r -f MS_OS_clips.txt
a                      and the with   hooboy
$ ptx --version | sed 1q
ptx (GNU coreutils) 5.94

The problem is one of line endings.  Email is not a very good conveyance
of the problem, but if if.txt has CRLF endings and is read in binary
mode, then ptx treats "a\r" as the word to ignore instead of "a".  Since
the input file does not contain any instances of "a\r", nothing is
ignored and ptx permutes every word.  Paul, would you accept a patch to
ptx that ignores \r in the ignore file, so that files created on
platforms with CRLF endings can be used without modification when read
in binary mode?
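
Until ptx handles this itself, one possible workaround is to strip the carriage returns from the ignore file before passing it to -i. The following is only a sketch: it assumes POSIX tr, od, and printf are available, and the names if_crlf.txt and if_lf.txt are made up for illustration (if_crlf.txt stands in for an if.txt saved with CRLF endings).

```shell
# Build a sample ignore file with CRLF line endings
# (hypothetical stand-in for an if.txt saved on Windows).
printf 'a\r\nand\r\nthe\r\nwith\r\n' > if_crlf.txt

# Dump the raw bytes; each word is followed by \r and \n.
od -c if_crlf.txt

# Delete every carriage return so each line holds only the bare word.
tr -d '\r' < if_crlf.txt > if_lf.txt

# Dump the converted file; only \n should follow each word now.
od -c if_lf.txt
```

Running "ptx -i if_lf.txt -r -f MS_OS_clips.txt" with the converted file should then behave like the transcript quoted above, since the ignore words no longer carry a trailing \r.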

- --
Life is short - so eat dessert first!

Eric Blake             address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.1 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFEOluP84KuGfSFAYARAovLAJ97OKbqIn2S4W49Q1eUgYhwwqZ4MwCeM2aM
5IjaE0yTFFp4c8SfKxhx25Y=
=Pjsd
-----END PGP SIGNATURE-----




