Re: puzzled by the result of this multi-dimensional array code
From: Neil R. Ormos
Subject: Re: puzzled by the result of this multi-dimensional array code
Date: Sat, 20 Apr 2024 21:31:27 -0500 (CDT)
User-agent: Alpine 2.20 (DEB 67 2015-01-07)

Peter Lindgren wrote:
> I'm trying to write a gawk program to analyze some simple csv data -
> values for 6 variables per 100+ countries. So far, I'm just reading
> in the data and loading it into a 2-dimensional array, called
> "data". The first data line consists of column titles; I split that
> into a separate array called "titles". As a check on what I'm doing
> so far, I chose to dump or show the USA data, and it comes out
> oddly. I was able to cut down the data to two lines - the column
> titles line and the data line for the USA. I changed the actual data
> values, since they're immaterial; nothing secret though - the source
> of the original data is noted in the code and it is publicly
> available on the web.
> test data:
>
> abr;full;item1;item2;item3;item4;item5;item6
> USA;U.S.A.;24;1;27;75;60;19
>
> code:
>
> #!/usr/bin/env -S gawk -f
>
> # program to analyze cultural data from Geert Hofstede's
> # see the Culture's Consequences book or:
> # http://www.geerthofstede.nl
>
> BEGIN {FS=";"}
>
> NR==1 {
>     split($0,titles)
>     for (x in titles) { print x, titles[x] }
>     next
> }
>
> # load the data
> {
>     for (i=1; i<=NF; i++) {
>         data[$1][titles[i]] = $i
>     }
> }
>
> # process the data
>
>
>
> END {
>     show("USA")
>     for (x in titles) { print x, titles[x] }
> }
>
> function show(c) {
>     for (x in titles) { print x, titles[x], data[c][titles[x]] }
>     print 8, titles[8], data[c][titles[8]]
> }
>
>
> result of ./countries.awk test.csv
> 1 abr
> 2 full
> 3 item1
> 4 item2
> 5 item3
> 6 item4
> 7 item5
> 8 item6
> 1 abr USA
> 2 full U.S.A.
> 3 item1 24
> 4 item2 1
> 5 item3 27
> 6 item4 75
> 7 item5 60
> 19tem6
> 19tem6
> 1 abr
> 2 full
> 3 item1
> 4 item2
> 5 item3
> 6 item4
> 7 item5
> 8 item6
>
> Question: why doesn't the title for the 8th
> column get printed like the others do, when I'm
> showing a real data line? The titles array
> itself shows up correctly just after being
> saved, and again at END. Hard-coding an 8 in the
> show function doesn't make any difference.
> I am totally ready to believe this is something
> obvious I'm just missing...
Testing with gawk 4.1.4 and gawk 5.2.2, I get the
results below, in which the 8th column seems to
print like the other columns. However, I had to
strip out many special characters (they looked
like non-breaking spaces) before the program
would run. The special characters may be
artifacts of the e-mail client (MUA) through
which you sent your question, but they may also
be in the program source file or in the input
data file. For debugging, you might consider:
(a) using the C locale;
(b) checking the program and the input data for
characters other than printable ASCII and
newlines; and
(c) using gawk's --profile option, which writes a
pretty-printed copy of the parsed program (to
awkprof.out by default), to confirm that the
program was parsed as you expect.
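For (a) and (b), here is a minimal sketch, assuming a POSIX shell and a byte-oriented awk such as gawk or mawk; the function name check_ascii is hypothetical. It runs awk in the C locale so that each byte is examined on its own, and it reports every byte outside printable ASCII (space through tilde) with its file name, line, column, and decimal code. Tabs fall outside that range too, so they are reported as well.

```shell
#!/bin/sh
# Hypothetical helper: report every byte that is not printable ASCII
# (space through tilde) in the named files. LC_ALL=C makes awk treat
# the input as single bytes rather than multibyte characters.
check_ascii() {
    LC_ALL=C awk '
    BEGIN {
        # Build a byte -> decimal code lookup table (byte 0 is skipped).
        for (i = 1; i < 256; i++)
            ord[sprintf("%c", i)] = i
    }
    {
        # Walk the record one byte at a time; the newline that ends
        # the record is not part of $0, so it is never reported.
        for (p = 1; p <= length($0); p++) {
            c = substr($0, p, 1)
            if (c !~ /[ -~]/)
                printf "%s:%d: column %d: byte %d\n", FILENAME, FNR, p, ord[c]
        }
    }' "$@"
}

# Usage: check_ascii countries.awk test.csv
```

With gawk in the C locale, a UTF-8 non-breaking space would show up as two adjacent bytes, 194 and 160, and a DOS/Windows line ending would show up as byte 13 at the end of each line.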
################################################################################
1 abr
2 full
3 item1
4 item2
5 item3
6 item4
7 item5
8 item6
1 abr USA
2 full U.S.A.
3 item1 24
4 item2 1
5 item3 27
6 item4 75
7 item5 60
8 item6 19
8 item6 19
1 abr
2 full
3 item1
4 item2
5 item3
6 item4
7 item5
8 item6
################################################################################
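One more possibility, offered as an assumption rather than a diagnosis: if test.csv has DOS/Windows (CRLF) line endings, every record, including the titles line, ends in a carriage return. The listings of the titles array would still look normal, but in show() the carriage return at the end of titles[8] would send the cursor back to the start of the line before the value is printed, so " 19" would overwrite "8 i" and leave "19tem6" on the screen. A minimal sketch of a fix is to strip a trailing \r from each record before any other rule runs. The wrapper name show_usa is hypothetical, and it prints fields directly instead of building gawk's arrays of arrays, so that it runs in any POSIX awk.

```shell
#!/bin/sh
# Hypothetical wrapper: print "index title value" for the USA row,
# after removing any trailing carriage return (CRLF line endings).
show_usa() {
    awk -F';' '
    # Runs first for every record. Changing $0 with sub() makes awk
    # re-split the record, so the last field loses its \r as well.
    { sub(/\r$/, "") }

    # Header line: keep the column titles.
    NR == 1 { n = split($0, titles, ";"); next }

    # Data line for the USA: pair each title with its value.
    $1 == "USA" {
        for (i = 1; i <= n; i++)
            print i, titles[i], $i
    }' "$@"
}

# Usage: show_usa test.csv
```

In countries.awk itself, the same effect should come from adding { sub(/\r$/, "") } as the first rule, ahead of the NR==1 rule.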