Re: [Help-bash] Using PE to specify an array
From: Greg Wooledge
Date: Mon, 17 Sep 2018 12:19:41 -0400
User-agent: NeoMutt/20170113 (1.7.2)
On Mon, Sep 17, 2018 at 11:36:25AM -0400, Bruce Hohl wrote:
> @Greg, it is an interesting happenstance that you replied as my question
> arose from my pass at completing your duplicate file finder "exercise" at
> mywiki.wooledge.org/BashProgramming/04: "If you want to "fix" this
> "problem", you might suppress all the printing until the end, and then
> iterate over the whole array and print only those values that contain a
> newline. (This is left as an exercise.)" So with your suggestion to use
> nameref vars the following seems to work:
>
> === Duplicate file finder exercise === (NO comments)
> #!/bin/bash
> while read -r md5_hash file; do
> var_hash=md5_$md5_hash
> declare -n ind_var_hash=$var_hash
> [[ ${#ind_var_hash[@]} -eq 1 ]] && declare -a dup_array+="($var_hash)"
> declare -a ${!ind_var_hash}+="('$file')"
> done < <(find "${1:-.}" -name $'*\n*' -prune -o -type f -exec md5sum {} +)
>
> declare -n e
> for e in ${dup_array[@]}; do
> echo ${!e}
> for f in ${e[@]}; do echo " $f"; done
> done
So your approach was to experiment with bash commands until you found
something that approximates a hash of lists (an associative array of
indexed arrays).
And what you came up with was using the entire bash variable namespace
as your hash, and storing each list as a separate indexed array within
that namespace.
That's... definitely not how I would have done it. ;-)
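You're also missing some quotes: echo ${!e}, for example, is unquoted,
and wrapping $file in single quotes inside the string passed to declare
breaks as soon as a filename contains an apostrophe.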
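For comparison, here is a minimal sketch of the same one-array-per-hash idea with the quoting tightened, assuming bash 4.3 or later for namerefs. Each filename is appended as its own quoted array element instead of being spliced into a string that declare re-parses. The hashes and filenames in the here-document are made up and stand in for md5sum output; dup_names is a hypothetical name.

```bash
#!/bin/bash
# Sketch only: same "one array per MD5" layout, with quoted appends.
# Needs bash 4.3+ for namerefs. Hashes and names below are hypothetical.
dup_names=()
while read -r md5 file; do
  declare -n list="md5_$md5"   # list now refers to the array md5_<hash>
  list+=("$file")              # one quoted element; $file is never re-parsed
  if (( ${#list[@]} == 2 )); then
    dup_names+=("md5_$md5")    # remember each duplicated hash exactly once
  fi
  unset -n list                # drop the reference, keep the array itself
done <<'EOF'
1111  ./it's here.txt
2222  ./other.txt
1111  ./copy of it's here.txt
EOF

declare -n e
for e in "${dup_names[@]}"; do  # e becomes a reference to each listed array
  printf '%s\n' "${!e}"         # ${!e} on a nameref gives the array's name
  printf '  %s\n' "${e[@]}"
done
```

This prints md5_1111 followed by its two filenames, apostrophes intact. It still uses the whole variable namespace as the "hash", which is the part I'd avoid in the first place.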
Anyway, here is the solution that I had in mind for that:
=====================================================
#!/bin/bash
declare -A seen
while read -r md5 file; do
if [[ ${seen[$md5]} ]]; then
seen[$md5]+=$'\n'$file
else
seen[$md5]=$file
fi
done < <(find "${1:-.}" -name $'*\n*' -prune -o -type f -exec md5sum {} +)
for i in "${!seen[@]}"; do
if [[ ${seen[$i]} = *$'\n'* ]]; then
printf 'Matching MD5:\n%s\n\n' "${seen[$i]}"
fi
done
=====================================================
The stuff I wrote in the text was really quite literal: "store multiple
filenames for each MD5 value (in a newline-delimited pseudo-list)" and
"iterate over the whole array and print only those values that contain
a newline". That's what I'm doing here.
This is also a hack, using newlines to store multiple elements of a list
in a string variable, and this only works because we're already excluding
filenames that have a newline in them. This frees up the newline character
to act as a list delimiter.
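Because newlines are guaranteed not to appear in the filenames, one of these pseudo-lists can be split back into a real array whenever you need to count or loop over its members. A minimal sketch using mapfile (bash 4+); the value of entry is hypothetical and stands in for a single ${seen[$md5]} value:

```bash
#!/bin/bash
# Hypothetical stand-in for one newline-delimited ${seen[$md5]} value.
# Safe only because names containing newlines were pruned by find.
entry=$'./a.txt\n./copy of a.txt\n./backup/a.txt'
mapfile -t files <<< "$entry"   # one array element per line, newlines stripped
printf '%d files share this MD5\n' "${#files[@]}"
for f in "${files[@]}"; do
  printf '  %s\n' "$f"
done
```

This reports 3 files and prints each name on its own line, spaces preserved.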
In the absence of that opening, I would simply have written the program
in a different language -- one that allows you to create a hash of lists
without needing special hacks and tricks.
For example, a relatively straight conversion to Tcl:
=====================================================
#!/usr/bin/env tclsh
if {[llength $argv]} {set start [lindex $argv 0]} else {set start .}
foreach line [split \
[exec find $start -name "*\n*" -prune -o -type f -exec md5sum "{}" +] \
\n] {
set md5 [string range $line 0 31]
set file [string range $line 34 end]
lappend seen($md5) $file
}
foreach i [array names seen] {
if {[llength $seen($i)] < 2} continue
puts [format "Matching MD5: %s" [join $seen($i) { }]]
}
=====================================================
The output format is slightly different, but of course that can
be adjusted. The elements of "seen" are simply lists of filenames,
since Tcl supports lists as values directly. I'm sure a similar solution
could be written in Python (which I don't know well enough to write in).
The only reason this solution is excluding filenames with newlines is
because of the md5sum command's output format.
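To see what that output format does with such a name, here is a small sketch, assuming GNU coreutils md5sum (other md5 tools format their output differently); the file names plain and two<newline>lines are hypothetical. GNU md5sum escapes a newline in a file name as \n and starts the whole line with a backslash, so the fixed offsets in the Tcl version (string range $line 0 31) would no longer line up with the hash. The same escaping applies to backslashes, so names containing a backslash also come through altered.

```bash
#!/bin/bash
# Assumes GNU coreutils md5sum. The file names are hypothetical examples.
dir=$(mktemp -d)
: > "$dir/plain"                  # empty file, ordinary name
: > "$dir/two"$'\n'"lines"        # empty file whose name contains a newline
out=$(cd "$dir" && md5sum plain "two"$'\n'"lines")
rm -rf -- "$dir"
printf '%s\n' "$out"
```

With GNU md5sum this prints:

    d41d8cd98f00b204e9800998ecf8427e  plain
    \d41d8cd98f00b204e9800998ecf8427e  two\nlines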