Re: [Help-bash] Using PE to specify an array
From: Bruce Hohl
Subject: Re: [Help-bash] Using PE to specify an array
Date: Mon, 17 Sep 2018 15:32:00 -0400
Thanks for those comments and a clean answer. I wasn't all that
excited about my solution, as it seemed overly complicated. I was a bit
biased toward shoehorning in the nameref feature. In hindsight,
creating a variable for every unique file hash seems more ridiculous now
than it did at the time :) I understand your comments about the limits of,
and appropriate uses for, bash. I'm just trying to kick my bash
understanding up a few steps.
On Mon, Sep 17, 2018 at 12:21 PM Greg Wooledge <address@hidden> wrote:
> On Mon, Sep 17, 2018 at 11:36:25AM -0400, Bruce Hohl wrote:
> > @Greg, it is an interesting happenstance that you replied, as my question
> > arose from my pass at completing your duplicate file finder "exercise" at
> > mywiki.wooledge.org/BashProgramming/04: "If you want to "fix" this
> > "problem", you might suppress all the printing until the end, and then
> > iterate over the whole array and print only those values that contain a
> > newline. (This is left as an exercise.)" So with your suggestion to use
> > nameref vars the following seems to work:
> >
> > === Duplicate file finder exercise === (NO comments)
> > #!/bin/bash
> > while read -r md5_hash file; do
> >   var_hash=md5_$md5_hash
> >   declare -n ind_var_hash=$var_hash
> >   [[ ${#ind_var_hash[@]} -eq 1 ]] && declare -a dup_array+="($var_hash)"
> >   declare -a ${!ind_var_hash}+="('$file')"
> > done < <(find "${1:-.}" -name $'*\n*' -prune -o -type f -exec md5sum {} +)
> >
> > declare -n e
> > for e in ${dup_array[@]}; do
> >   echo ${!e}
> >   for f in ${e[@]}; do echo " $f"; done
> > done
>
> So your approach was to experiment with bash commands until you found
> something that would approximate giving you the ability to have a hash
> of lists (associative array of indexed arrays).
>
> And what you came up with was using the entire bash variable namespace
> as your hash, and storing each list as a separate indexed array within
> that namespace.
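
To make that description concrete, here is a minimal sketch of the "one array per hash, reached by name" pattern. It is not the script quoted above: the `append_to` helper, the `md5_aaa`/`md5_bbb` variable names and the file names are all placeholders invented for illustration, and it assumes bash 4.3 or later for namerefs.

```shell
#!/bin/bash
# Sketch only: one indexed array per hash, reached by name through a nameref.
# The md5_aaa/md5_bbb names and the file names are placeholders, not real hashes.
md5_aaa=() md5_bbb=()

append_to() {            # hypothetical helper: append $2 to the array named $1
  local -n list=$1       # nameref (bash 4.3+): "list" now aliases that array
  list+=("$2")
}

append_to md5_aaa 'a.txt'
append_to md5_bbb 'b.txt'
append_to md5_aaa 'copy of a.txt'

# Report only the arrays that collected more than one file name.
for name in md5_aaa md5_bbb; do
  declare -n files=$name
  (( ${#files[@]} > 1 )) && printf '%s: %s\n' "$name" "${files[*]}"
  unset -n files         # drop the nameref itself, not the array it points to
done
```

Every distinct hash becomes its own global variable here, which is exactly the namespace cost described above.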
>
> That's... definitely not how I would have done it. ;-)
>
> You're also missing some quotes.
>
> Anyway, here is the solution that I had in mind for that:
>
> =====================================================
> #!/bin/bash
> declare -A seen
> while read -r md5 file; do
>   if [[ ${seen[$md5]} ]]; then
>     seen[$md5]+=$'\n'$file
>   else
>     seen[$md5]=$file
>   fi
> done < <(find "${1:-.}" -name $'*\n*' -prune -o -type f -exec md5sum {} +)
>
> for i in "${!seen[@]}"; do
>   if [[ ${seen[$i]} = *$'\n'* ]]; then
>     printf 'Matching MD5:\n%s\n\n' "${seen[$i]}"
>   fi
> done
> =====================================================
>
> The stuff I wrote in the text was really quite literal: "store multiple
> filenames for each MD5 value (in a newline-delimited pseudo-list)" and
> "iterate over the whole array and print only those values that contain
> a newline". That's what I'm doing here.
>
> This is also a hack, using newlines to store multiple elements of a list
> in a string variable, and this only works because we're already excluding
> filenames that have a newline in them. This frees up the newline character
> to act as a list delimiter.
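
A small sketch of the delimiter idea, under the same assumption that no stored name contains a newline: the pseudo-list can be turned back into a real indexed array with `mapfile` when a proper list is needed. The `add` helper, the keys `aaa`/`bbb` and the file names are made up for the example; it needs bash 4 or later for `declare -A` and `mapfile`.

```shell
#!/bin/bash
# Sketch: build a newline-delimited pseudo-list per key, then split it back
# into an indexed array. Keys and file names are invented for illustration.
declare -A seen

add() {                  # hypothetical helper: append name $2 under key $1
  if [[ ${seen[$1]} ]]; then
    seen[$1]+=$'\n'$2
  else
    seen[$1]=$2
  fi
}

add aaa 'one file.txt'
add bbb 'unique.txt'
add aaa 'copy of one.txt'

for key in "${!seen[@]}"; do
  # -t strips the newline from each element; the here-string adds a final one.
  mapfile -t files <<< "${seen[$key]}"
  (( ${#files[@]} > 1 )) || continue
  printf '%s has %d files:\n' "$key" "${#files[@]}"
  printf '  %s\n' "${files[@]}"
done
```

If a name could contain a newline, the split would silently produce extra elements, which is why the exclusion in the find command matters.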
>
> In the absence of that opening, I would simply have written the program
> in a different language -- one that allows you to create a hash of lists
> without needing special hacks and tricks.
>
> For example, a relatively straight conversion to Tcl:
>
> =====================================================
> #!/usr/bin/env tclsh
> if {[llength $argv]} {set start [lindex $argv 0]} else {set start .}
> foreach line [split \
>     [exec find $start -name "*\n*" -prune -o -type f -exec md5sum "{}" +] \
>     \n] {
>   set md5 [string range $line 0 31]
>   set file [string range $line 34 end]
>   lappend seen($md5) $file
> }
>
> foreach i [array names seen] {
>   if {[llength $seen($i)] < 2} continue
>   puts [format "Matching MD5: %s" [join $seen($i) { }]]
> }
> =====================================================
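
The two `string range` calls above depend on the fixed layout of an md5sum line: a 32-character hex digest, two separator characters, then the file name. A rough bash equivalent of that fixed-offset split, using a made-up sample line rather than real md5sum output, might look like this:

```shell
#!/bin/bash
# Sketch: split one md5sum-style line at fixed offsets.
# The sample line is made up for illustration (the digest is that of empty input).
line='d41d8cd98f00b204e9800998ecf8427e  some file.txt'

md5=${line:0:32}    # characters 0..31: the hex digest
file=${line:34}     # skip the two separator characters at offsets 32 and 33

printf 'md5=%s\nfile=%s\n' "$md5" "$file"
```

Unlike `read -r md5 file`, slicing by offset does not trim trailing whitespace from the name, though that only matters for unusual file names.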
>
> The output format is slightly different, but of course that can
> be adjusted. The elements of "seen" are simply lists of filenames,
> as this language supports this directly. I'm sure a similar solution
> could be written in Python (which I don't know well enough to write in).
>
> The only reason this solution is excluding filenames with newlines is
> because of the md5sum command's output format.
>
>