Re: A bug in "tr" command ???
From: Bob Proulx
Subject: Re: A bug in "tr" command ???
Date: Sat, 4 Oct 2003 19:23:23 -0600
User-agent: Mutt/1.3.28i
address@hidden wrote:
> Hello. I hope this is not a bug, that I'm just doing something
> wrong. Anyway, here's how this "journey" started. I had a file
> with carriage return characters (^M) in it.
DOS has a CR-NL end of line convention. UNIX has a NL end of line
convention. A classic conversion problem.
> The file was one LONG record, and I wanted newline characters where
> the ^M's were. I thought I could just set awk's RS variable to ^M,
> and that would do it. But, I needed a way to "create" the ^M
> character.
I would use other commands myself, such as tr -d. Try this:
tr -d "\015"
But please continue with your story.
> Somewhere on the Internet I found this:
>
> cm=`echo m | tr 'm' '\015'`
>
> but, that did not seem to work. Seemed like "cm" ended up being
> null.
Works for me!
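A bare CR is invisible on most terminals, so a variable that holds one can look empty even when it is not. Here is a minimal sketch of one way to check, assuming a POSIX-like shell with tr and od available. The names cm_len and cm_dump are made up for illustration.

```shell
# Build a carriage return the same way as the quoted one-liner.
cm=$(echo m | tr 'm' '\015')

# ${#cm} is the number of characters in the variable.
# A variable holding a single CR should report 1, not 0.
cm_len=${#cm}
echo "length: $cm_len"

# od -c prints the byte as \r. printf '%s' adds no newline of its own,
# and tr -d ' ' only strips the padding that od puts around each byte.
cm_dump=$(printf '%s' "$cm" | od -An -c | tr -d ' ')
echo "bytes: $cm_dump"
```

If the length comes back as 0 on the problem machine, the CR is being lost somewhere between tr and the variable.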
> To test if the syntax of the command correct, I did the following:
>
> Script 1 (a file called ASCII):
>
> cat /dev/null > asc
> cat /dev/null > asc.txt
The 'cat' programs in the above are not needed.
true > asc
or
: > asc
or
> asc
All do the same thing without the extra program. (Sorry, but extra
'cat' processes are a common scripting mistake and a pet peeve of
mine.) Here is a simple howto on common shell mistakes.
http://www.greenend.org.uk/rjk/2001/04/shell.html
> for i in 000 001 002 003 004 005 006 007 \
> 010 011 012 013 014 015 016 017 \
> 020 021 022 023 024 025 026 027 \
> 030 031 032
> do
> echo "x=\`echo x | tr 'x' '\\${i}'\`" >> asc
> echo "echo \"\${x}\" >>asc.txt" >>asc
> done
>
> bash asc
>
> The execution of script 1 (ASCII) created a file called asc
> (which was executed from within the ASCII file).
>
> asc file:
>
> x=`echo x | tr 'x' '\000'`
> if [[ "${x}" == "" ]]; then echo "x is null"; else echo "x is not null"; fi
> #Note: the above "if" statement was not created by the script. I edited it
> in afterwards,
You have to be careful that your editor does not change any of the
characters. In particular some editors will silently delete null (000)
characters.
> #and re-executed the asc file manually
> echo "${x}" >>asc.txt
> x=`echo x | tr 'x' '\001'`
> if [[ "${x}" == "" ]]; then echo "x is null"; else echo "x is not null"; fi
> #Same for that "if" statement, too
> echo "${x}" >>asc.txt
> x=`echo x | tr 'x' '\002'`
> echo "${x}" >>asc.txt
> .
> . (several lines left out for brevity)
> .
> x=`echo x | tr 'x' '\031'`
> echo "${x}" >>asc.txt
> x=`echo x | tr 'x' '\032'`
> echo "${x}" >>asc.txt
>
> And, the result of that execution was a file, asc.txt
> (this is how it looked when viewed with vi):
>
> (a null character, OK, i.e., expected)
> ^A
> ^B
> ^C
> ^D
> ^E
> ^F
> ^G
> ^H
> (a tab character, OK, i.e., expected)
> (a null character, not expected)
I can't recreate that. I don't see the null.
> ^K
> ^L
> (a null character, not expected, at least I had hoped it would be a ^M)
I can't recreate that. I don't see the null.
What version of tr are you using?
tr --version
> ^N
> .
> .(several lines left out for brevity)
> .
> ^Z
>
> Note 1: where ^I would be is a tab character (OK)
> where ^J would be is a null character
> where ^M would be is a null character
>
> Note 2: I went back and edited in the following line to the
> 2nd file (asc):
>
> if [[ "${x}" == "" ]]; then echo "x is null"; else echo "x is not null"; fi
>
> and inserted it after the "000", "001", "012", "013", "015", and "016" lines
> to test. The character created by the "012", and "015" lines from the asc
> file is null. :(
You can probably do this more easily with:
echo x | tr x "\\015" | od -c
or even, looping over the same octal codes (000 through 032):
for i in $(seq 0 26); do echo x | tr x "\\$(printf '%03o' "$i")" | od -c; done
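Piping straight into od also sidesteps command substitution, which strips trailing newlines from whatever it captures. That is standard shell behavior, so the 012 case comes back empty in a variable on any system. It would not explain an empty 015 result. A small sketch of the difference follows; the names captured and piped are only for illustration.

```shell
# Captured in a variable: tr turns the x into a newline, and command
# substitution then strips every trailing newline, leaving nothing.
captured=$(echo x | tr x '\012')
echo "captured length: ${#captured}"

# Piped straight to od: both newlines (the translated x and the one
# that echo adds) are still visible in the dump.
piped=$(echo x | tr x '\012' | od -An -c | tr -d ' ')
echo "piped bytes: $piped"
```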
> Note 3: GNU bash, version 2.05.0(8)-release (i686-pc-cygwin)
Cygwin? You are probably running afoul of the DOS end of line
conventions. Probably the program is doing its own conversions. Can
you recreate this on a UN*X like machine? I don't think anyone on
this list uses Cygwin. So if it is a Cygwin specific problem then you
would need to take this to the cygwin list.
> Note 4: I finally just made a copy of a file that had ^M's in it,
> edited out everything but one ^M character, and then edited the
> following around the ^M:
>
> BEGIN { RS = "^M" }
> { print }
>
> and then used that to process my file with the ^M's in it:
>
> cat ctrl-Ms_file | awk -f RS_is_ctrl-M.awk > newlines_file
Try 'tr -d "\015"' as the classic way to delete CRs from files.
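Which tr invocation fits depends on the input. Deleting CRs suits CR-NL files, while a file that uses CRs alone as separators (one long record, as described above) needs each CR translated to a NL instead. A rough sketch of both cases follows. The sample files and the work directory are hypothetical and created only for the demonstration.

```shell
# Scratch directory and sample files, for illustration only.
work=$(mktemp -d)
printf 'one\r\ntwo\r\n' > "$work/dos_file"      # DOS style: CR-NL endings
printf 'one\rtwo\r' > "$work/cr_only_file"      # CRs only, one long record

# CR-NL input: delete the CRs and keep the NLs.
tr -d '\015' < "$work/dos_file" > "$work/dos_fixed"

# CR-only input: translate each CR into a NL rather than deleting it.
tr '\015' '\012' < "$work/cr_only_file" > "$work/cr_fixed"

# Both results should now show plain \n line endings.
od -c "$work/dos_fixed"
od -c "$work/cr_fixed"
```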
> Yucky thing is that I would have to keep that "RS_is_ctrl-M.awk"
> file around (or create it as needed using vi) since I can't create a
> ^M character "on the fly". :(
Sure you can!
tr -d "\015"
printf "\r"
tr -d "$(printf "\r")"
CR=$(printf "\r")
tr -d "$CR"
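The same trick removes the need to keep a separate awk file around with a literal ^M in it. Here is a sketch of two ways to set RS on the fly, assuming a POSIX awk. The input file is a made-up sample, and the work directory and output names are hypothetical.

```shell
# Scratch directory and sample input: three records separated by CRs.
work=$(mktemp -d)
printf 'one\rtwo\rthree\r' > "$work/ctrl-Ms_file"

# Option 1: awk understands the "\r" escape in its own string literals,
# so no literal ^M has to be typed into the program.
awk 'BEGIN { RS = "\r" } { print }' "$work/ctrl-Ms_file" > "$work/newlines_file"

# Option 2: build the CR in the shell and hand it to awk with -v.
CR=$(printf '\r')
awk -v RS="$CR" '{ print }' "$work/ctrl-Ms_file" > "$work/newlines_file2"

# Each record should now end in \n instead of \r.
od -c "$work/newlines_file"
```

Either form fits on a command line, and the input redirection also avoids the extra cat process.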
I think 'perl -l' might handle end of line conventions too. Not
sure. I don't have a way to test this on DOS, but it is worth a test
on Cygwin.
perl -lne 'print'
Bob