background: if I have comma-separated data, such as
12345 , , , , , 12 , Data Street, Command Deck , Enterprise, Space, 17094
and I would like to use awk/gawk to split it into fields around the commas while also stripping the whitespace on either side of each comma, I can use the following FS and script:
BEGIN { FS="[ \t]*,[ \t]*"; }
{
    for (i=1; i <= NF; i++) {
        printf "%2d '%s'\n", i, $i;
    }
}
to receive the following output:
 1 '12345'
 2 ''
 3 ''
 4 ''
 5 ''
 6 '12'
 7 'Data Street'
 8 'Command Deck'
 9 'Enterprise'
10 'Space'
11 '17094'
the same works for any other ordinary separator character. except…
issue: if my data is instead pipe-separated, such as
12345 | | | | | 12 | Data Street| Command Deck | Enterprise| Space| 17094
using FS="|" splits fields around the pipe character as expected, but including the pipe in a regexp FS does not work: awk fails silently, while gawk prints the puzzling warning "warning: escape sequence `\|' treated as plain `|'" and then fails as well:
BEGIN { FS="[ \t]*\|[ \t]*"; }
{
    for (i=1; i <= NF; i++) {
        printf "%2d '%s'\n", i, $i;
    }
}
yields:
 1 '12345'
 2 '|'
 3 '|'
 4 '|'
 5 '|'
 6 '|'
 7 '12'
 8 '|'
 9 'Data'
10 'Street|'
11 'Command'
12 'Deck'
13 '|'
14 'Enterprise|'
15 'Space|'
16 '17094'
expected behavior would be to treat '\|' as the literal character '|', just like ',' or any other character, rather than stripping the backslash and putting a bare '|' into the FS regexp.
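a possible explanation, offered as a sketch rather than a definitive answer: FS is assigned from a string literal, and awk processes escape sequences in that string before the result is ever used as a regexp. going by the gawk warning, "\|" in the string is reduced to a plain "|", so the regexp that arrives is [ \t]*|[ \t]*, in which the bare | is alternation between two whitespace patterns. that would account for the whitespace-split output above. (how other awks treat an unknown escape such as "\|" in a string may differ.) under that assumption, doubling the backslash or putting the pipe in a bracket expression should deliver a literal pipe to the regexp. the shell wrapper below tries both on the sample line; the function names split_escaped and split_bracket are made up for illustration, and \047 is just the octal escape for a single quote:

```shell
# Sample record copied from the pipe-separated data above.
line='12345 | | | | | 12 | Data Street| Command Deck | Enterprise| Space| 17094'

# Variant 1 (hypothetical helper name): double the backslash, so the
# string "\\|" reaches the regexp engine as \| (a literal pipe).
split_escaped() {
  awk '
    BEGIN { FS = "[ \t]*\\|[ \t]*" }
    { for (i = 1; i <= NF; i++) printf "%2d \047%s\047\n", i, $i }'
}

# Variant 2 (hypothetical helper name): a bracket expression makes the
# pipe literal without needing any backslash.
split_bracket() {
  awk '
    BEGIN { FS = "[ \t]*[|][ \t]*" }
    { for (i = 1; i <= NF; i++) printf "%2d \047%s\047\n", i, $i }'
}

# Run both variants on the sample record.
printf '%s\n' "$line" | split_escaped
printf '%s\n' "$line" | split_bracket
```

both variants should print the same 11 fields as the comma-separated example, with fields 2 through 5 empty and field 7 shown as 'Data Street'.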
thx, n@