help-gnu-emacs

Re: Q on call-process and grep


From: Eric Pement
Subject: Re: Q on call-process and grep
Date: 22 Dec 2005 11:29:02 -0800
User-agent: G2/0.2

Drew Adams wrote:

> I'm using native Emacs on Windows, and I use Cygwin [ ... ]

> If I do this from the command line or using Emacs command `grep', it works
> fine:
>
>   grep -i ",someword\\($\\|,\\)" "myfile"
>
> If, however, I do this, then some words are found and others (which are also
> present in myfile) are not found:
>
>   (call-process "grep" nil buf nil "-i" ",someword\\($\\|,\\)" "myfile")
>
> The same words are systematically found or not found. I haven't been able to
> figure out why this doesn't work for some words (only).

If it were me, I'd make a copy of the file, and then chop it into
smaller pieces where I can illustrate the problem in a manageable
length (say, 10 or 20 lines, but the fewer the better). The sed command

   sed -n 17,35p bigfile >smallfile

will write lines 17 to 35, inclusive, to smallfile so you can do your
testing. But you say that some of the lines are quite long. So try this:

   awk '{print length($0)}' smallfile

to print the length of each line. If the lines are under 4000 chars, I'd
feel safe in guessing that line length isn't a problem. If you have
lines 20,000 chars or more, then I'd start suspecting line length.
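
If scanning a long column of numbers is tedious, the same check can be
summarized. This is only a sketch: the function name line_length_report
is made up for illustration, and the 4000 default is just the rule of
thumb above, not a documented limit of any tool.

```shell
# Summarize line lengths: longest line, where it is, and how many
# lines exceed a threshold (default 4000, an assumed rule of thumb).
# Usage: line_length_report FILE [LIMIT]
line_length_report() {
    awk -v limit="${2:-4000}" '
        length($0) > max   { max = length($0); maxline = NR }
        length($0) > limit { over++ }
        END {
            printf "longest: %d chars (line %d); lines over %d: %d\n",
                   max, maxline, limit, over
        }
    ' "$1"
}
```

For example, `line_length_report smallfile` uses the default limit, and
`line_length_report smallfile 20000` checks against the larger figure.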

Does each line in the problem set end in a CR/LF? I've had datafiles
that gave me bad data because somehow some lines ended with CR/LF,
others with CR/CR/LF, and others with CR only. How I got the problem
isn't relevant. But to normalize the input, try

   tr -d '\r' <smallfile | sed -n p >clean_smallfile

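which should remove any extraneous CRs which might be causing
corruption and restore the line endings to your Cygwin default (Unix or
DOS, whichever you picked). Be aware that a line terminated by a bare CR
will be joined to the line after it, since that CR was its only
terminator.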
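
Before normalizing, it may help to see which kinds of line endings the
file actually contains. The following is a rough sketch, not a tested
recipe for every setup: line_ending_report is a hypothetical name, and it
assumes awk reads the file in binary mode (on a Cygwin text-mode mount
the CRs may already be stripped before awk sees them).

```shell
# Count lines by ending type. Records are split on LF, so a bare-CR
# "ending" shows up as a CR embedded inside a longer record.
# Usage: line_ending_report FILE
line_ending_report() {
    awk '
        {
            if ($0 ~ /\r\r$/)    crcrlf++   # CR CR LF
            else if ($0 ~ /\r$/) crlf++     # CR LF
            else                 lf++       # plain LF
            sub(/\r+$/, "")                 # drop CRs belonging to the ending
            bare += gsub(/\r/, "")          # whatever CRs remain are embedded
        }
        END {
            printf "LF=%d CRLF=%d CRCRLF=%d embedded_CR=%d\n",
                   lf, crlf, crcrlf, bare
        }
    ' "$1"
}
```

If more than one of the counts is nonzero, the file has mixed line
endings and is worth normalizing before testing grep again.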

[ ... ]
> All characters are ASCII, I believe (how to check that?).

   Use tr to delete all the characters that are permissible or
expected, and whatever is left must be an unexpected character. Examine
the output with cat -A or od or your tool of choice. E.g.,

   tr -d '\n\r\t\40-\176' <infile >outfile
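
The tr command shows which unexpected characters exist but not where
they are. A possible companion, sketched here under some assumptions:
unexpected_lines is a hypothetical name, the -a option (treat the file
as text) is a GNU/BSD grep extension rather than POSIX, and the allowed
set is the same one used above (tab, CR, and space through tilde).

```shell
# Print the numbers of lines containing any byte outside the allowed
# set: tab, CR, and the printable ASCII range space..tilde.
# Usage: unexpected_lines FILE
unexpected_lines() {
    allowed=$(printf '\t\r')     # literal tab and CR for the bracket
    # LC_ALL=C makes grep compare raw bytes; -a keeps it from
    # reporting "Binary file matches" instead of the lines.
    LC_ALL=C grep -a -n "[^${allowed} -~]" "$1" | cut -d: -f1
}
```

The reported line numbers can then be fed back to sed -n to pull out
just the suspicious lines for a closer look with od.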

If it were me, I might wonder about embedded backspaces or carriage
returns in the text. Just a thought. Good luck on your hunting!

--
Eric Pement


