help-bash

Re: [Help-bash] Using PE to specify an array


From: Bruce Hohl
Subject: Re: [Help-bash] Using PE to specify an array
Date: Mon, 17 Sep 2018 15:32:00 -0400

Thanks for those comments and a clean answer.  I wasn't really all that
excited about my solution, as it seemed overly complicated.  I was a bit
biased toward shoehorning in the nameref feature.  In hindsight,
creating a variable for every unique file hash seems more ridiculous now
than it did at the time :)  I understand your comments about the limits and
appropriate use of bash.  I'm just trying to kick my bash understanding up a
few steps.

On Mon, Sep 17, 2018 at 12:21 PM Greg Wooledge <address@hidden> wrote:

> On Mon, Sep 17, 2018 at 11:36:25AM -0400, Bruce Hohl wrote:
> > @Greg, it is an interesting happenstance that you replied, as my question
> > arose from my pass at completing your duplicate file finder "exercise" at
> > mywiki.wooledge.org/BashProgramming/04:  "If you want to "fix" this
> > "problem", you might suppress all the printing until the end, and then
> > iterate over the whole array and print only those values that contain a
> > newline. (This is left as an exercise.)"  So with your suggestion to use
> > nameref vars the following seems to work:
> >
> > === Duplicate file finder exercise === (NO comments)
> > #!/bin/bash
> > while read -r md5_hash file; do
> >   var_hash=md5_$md5_hash
> >   declare -n ind_var_hash=$var_hash
> >   [[ ${#ind_var_hash[@]} -eq 1 ]] && declare -a dup_array+="($var_hash)"
> >   declare -a ${!ind_var_hash}+="('$file')"
> > done < <(find "${1:-.}" -name $'*\n*' -prune -o -type f -exec md5sum {} +)
> >
> > declare -n e
> > for e in ${dup_array[@]}; do
> >   echo ${!e}
> >   for f in ${e[@]}; do echo "  $f"; done
> > done
>
> So your approach was to experiment with bash commands until you found
> something that would approximate giving you the ability to have a hash
> of lists (associative array of indexed arrays).
>
> And what you came up with was using the entire bash variable namespace
> as your hash, and storing each list as a separate indexed array within
> that namespace.
>
> That's... definitely not how I would have done it. ;-)
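
For reference, the "variable namespace as hash" idea described above can be reduced to a small, fully quoted sketch.  This is only an illustration, not the code from either message: the md5_ prefix follows the quoted script, while the hashes and filenames are made-up sample input, and bash 4.3 or later is assumed for namerefs.

```bash
#!/bin/bash
# Requires bash 4.3 or later for namerefs (declare -n).
# The hashes and filenames below are made-up sample input.
while read -r hash file; do
  declare -a "md5_$hash"         # make sure the per-hash array exists
  declare -n list="md5_$hash"    # list is now an alias for that array
  list+=("$file")                # append through the alias
done <<'EOF'
abc123 ./one
abc123 ./two words
def456 ./three
EOF

# "${!md5_@}" expands to the names of all variables starting with md5_.
for name in "${!md5_@}"; do
  declare -n list="$name"
  if (( ${#list[@]} > 1 )); then
    printf 'Matching MD5 %s:\n' "${name#md5_}"
    printf '  %s\n' "${list[@]}"
  fi
done
unset -n list
```

Every distinct hash becomes its own global array here, which is the cost the paragraph above is pointing at.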
>
> You're also missing some quotes.
>
> Anyway, here is the solution that I had in mind for that:
>
> =====================================================
> #!/bin/bash
> declare -A seen
> while read -r md5 file; do
>   if [[ ${seen[$md5]} ]]; then
>     seen[$md5]+=$'\n'$file
>   else
>     seen[$md5]=$file
>   fi
> done < <(find "${1:-.}" -name $'*\n*' -prune -o -type f -exec md5sum {} +)
>
> for i in "${!seen[@]}"; do
>   if [[ ${seen[$i]} = *$'\n'* ]]; then
>     printf 'Matching MD5:\n%s\n\n' "${seen[$i]}"
>   fi
> done
> =====================================================
>
> The stuff I wrote in the text was really quite literal: "store multiple
> filenames for each MD5 value (in a newline-delimited pseudo-list)" and
> "iterate over the whole array and print only those values that contain
> a newline".  That's what I'm doing here.
>
> This is also a hack, using newlines to store multiple elements of a list
> in a string variable, and this only works because we're already excluding
> filenames that have a newline in them.  This frees up the newline character
> to act as a list delimiter.
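
If the individual filenames are needed again later, a newline-delimited value like this can be split back into a real indexed array.  The sketch below assumes bash 4 or later (for mapfile and associative arrays); the key and filenames are hypothetical sample data, and the array name "files" is just a name chosen for the example.

```bash
#!/bin/bash
# Hypothetical sample data: one MD5 key whose value is a newline-delimited
# pseudo-list of filenames, in the same shape the script above builds.
declare -A seen
seen[d41d8cd98f00b204e9800998ecf8427e]=$'./a.txt\n./dir/b.txt\n./c with spaces.txt'

for md5 in "${!seen[@]}"; do
  # mapfile -t reads one array element per line and strips the newlines.
  mapfile -t files <<< "${seen[$md5]}"
  printf '%s: %d files\n' "$md5" "${#files[@]}"
  printf '  %s\n' "${files[@]}"
done
```

This only round-trips safely under the same condition as the original hack: no filename may itself contain a newline.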
>
> In the absence of that opening, I would simply have written the program
> in a different language -- one that allows you to create a hash of lists
> without needing special hacks and tricks.
>
> For example, a relatively straight conversion to Tcl:
>
> =====================================================
> #!/usr/bin/env tclsh
> if {[llength $argv]} {set start [lindex $argv 0]} else {set start .}
> foreach line [split \
>       [exec find $start -name "*\n*" -prune -o -type f -exec md5sum "{}" +] \
>       \n] {
>   set md5 [string range $line 0 31]
>   set file [string range $line 34 end]
>   lappend seen($md5) $file
> }
>
> foreach i [array names seen] {
>   if {[llength $seen($i)] < 2} continue
>   puts [format "Matching MD5: %s" [join $seen($i) { }]]
> }
> =====================================================
>
> The output format is slightly different, but of course that can
> be adjusted.  The elements of "seen" are simply lists of filenames,
> as this language supports this directly.  I'm sure a similar solution
> could be written in Python (which I don't know well enough to write in).
>
> The only reason this solution is excluding filenames with newlines is
> because of the md5sum command's output format.
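
The line layout that the Tcl version relies on (a 32-character hash, two separator characters, then the filename) can be picked apart the same way in bash with substring expansion.  This is a sketch against one made-up sample line, not real md5sum output from any particular system.

```bash
#!/bin/bash
# Hypothetical md5sum output line: 32 hex digits, two separator
# characters, then the filename running to the end of the line.
line='d41d8cd98f00b204e9800998ecf8427e  ./dir/empty file.txt'

md5=${line:0:32}    # characters 0..31, like [string range $line 0 31]
file=${line:34}     # character 34 to the end, like [string range $line 34 end]

printf 'md5=%s\nfile=%s\n' "$md5" "$file"
```

Because each record is one line, a filename containing a newline could not be recovered this way, which is why both scripts prune such names in the find command.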
>
>

