help-bash

From: Michael Convey
Subject: Re: [Help-bash] Filename Expansion: Find Utility vs Bash Shell Pattern Matching
Date: Wed, 17 Jun 2015 09:18:10 -0700

Stephane, yes, I read your very thorough answer at Stack Exchange -- thank
you. You obviously understand how the various globbing/pattern matching
operators (e.g. *, ?, [], /) work in different circumstances (as
evidenced by your detailed examples). However, I'm trying to understand why
the operators work differently under different circumstances. For example,
according to the following link, 'find -name' appears to use fnmatch(),
whereas bash appears to use glob():
http://www.linuxselfhelp.com/gnu/glibc/html_chapter/libc_10.html
That is probably an oversimplification, but I'm still trying to wrap my
brain around this. Once I understand the underlying mechanism/function for
each use case, I'll be able to learn the defaults and options of those
underlying mechanisms, which will enhance my understanding of the big
picture. Thanks again!!

On Tue, Jun 16, 2015 at 11:27 PM, Stephane Chazelas <
address@hidden> wrote:

> 2015-06-16 14:41:41 -0700, Michael Convey:
> > Because my questions involve both bash and the find utility, I posted the
> > following to GNU's bug-findutils mailing list:
> > http://lists.gnu.org/archive/html/bug-findutils/2015-06/msg00005.html
> >
> > Eric Blake answered as follows:
> > http://lists.gnu.org/archive/html/bug-findutils/2015-06/msg00006.html
> >
> > To provide context, please read my question at the first link above and
> > Eric's answer at the 2nd link above and help me with the following:
> [...]
>
> You apparently also posted it at
>
> http://unix.stackexchange.com/questions/210036/filename-expansion-find-utility-pattern-matching-vs-bash-shell-pattern-matching
> where I replied and answered most of those questions:
>
>
> In the shell, you need to distinguish filename generation/expansion (aka
> globbing), where a pattern expands to a list of files, from pattern matching.
> Globbing uses pattern matching internally, but it's above all an operator
> that generates a list of files based on a pattern.
>
> */*.txt is a pattern which matches a sequence of zero or more characters
> followed by / followed by a sequence of zero or more characters followed by
> .txt. When used as a shell pattern as in:
>
> case $file in
>   */*.txt) echo match
> esac
>
> It will match on file=.foo/bar/baz.txt.
>
> However */*.txt as a glob is something related but more complex.
>
> In expanding */*.txt into a list of files, the shell will open the current
> directory, list its contents, find the non-hidden files of type directory (or
> symlink to directory) that match *, open each of those, list their contents
> and find the non-hidden ones that match *.txt.
>
> It will never expand to .foo/bar/baz.txt even though that matches the
> pattern, because that's not how it works. On the other hand, the file paths
> generated by a glob will all match that pattern.
>
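
A minimal sketch of that difference, assuming a POSIX-like shell with default
globbing options. The scratch directory and the file names in it are made up
for the demonstration:

```shell
# Scratch tree; all file and directory names here are hypothetical.
tmp=$(mktemp -d)
mkdir -p "$tmp/.foo/bar" "$tmp/dir"
touch "$tmp/.foo/bar/baz.txt" "$tmp/dir/a.txt" "$tmp/dir/.hidden.txt"

# As a glob: each /-separated part is matched against directory entries,
# and names starting with a dot are skipped unless the dot is explicit.
glob_result=$(cd "$tmp" && printf '%s\n' */*.txt)
echo "glob: $glob_result"

# As a plain pattern: * may span / and a leading dot is not special.
case .foo/bar/baz.txt in
  */*.txt) case_result=match ;;
  *)       case_result='no match' ;;
esac
echo "case: $case_result"

rm -rf "$tmp"
```

The glob should produce only dir/a.txt, while the case pattern accepts
.foo/bar/baz.txt as a string.
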
> Similarly, a glob like foo[a/b]baz* will find all the files whose name
> starts with b]baz in the foo[a directory.
>
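
The splitting on / can be sketched like this, assuming bash or a similar
POSIX shell. The directory foo[a and the files in it are hypothetical names
created only for the test:

```shell
tmp=$(mktemp -d)
mkdir "$tmp/foo[a"
touch "$tmp/foo[a/b]baz1" "$tmp/fooabaz"

# Glob: the pattern is split on / first, so [a/b] is never a bracket
# expression; the parts are the literal directory foo[a and b]baz*.
bracket_glob=$(cd "$tmp" && printf '%s\n' foo[a/b]baz*)
echo "glob: $bracket_glob"

# Pattern matching: no splitting, so [a/b] is one set matching a, / or b.
case fooabaz in
  foo[a/b]baz*) bracket_case=match ;;
  *)            bracket_case='no match' ;;
esac
echo "case: $bracket_case"

rm -rf "$tmp"
```

Note that the file fooabaz exists in the scratch directory but is not picked
up by the glob, even though the same string matches in the case statement.
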
> So, we've seen already that for globbing, but not for pattern matching, / is
> special (globs are somehow split on / and each part treated separately) and
> dot-files are treated specially.
>
> Shell globbing and pattern matching are part of the shell syntax. They're
> intertwined with quoting and other forms of expansion.
>
> $ bash -c 'case "]" in [x"]"]) echo true; esac'
> true
>
> Quoting that ] removes its special meaning (of closing the previous [).
>
> It can even get quite confusing when you mix everything:
>
> $ ls
> *  \*  \a  x
>
> $ p='\*' ksh -xc 'ls $p'
> + ls '\*' '\a'
> \*  \a
>
> OK, \* is all the files starting with \.
>
> $ p='\*' bash -xc 'ls $p'
> + ls '\*'
> \*
>
> It's not all the files starting with \. So, somehow, \ must have escaped
> the *, but then again it's not matching * either...
>
> For find, it's a lot simpler. find descends the directory tree at each of
> the file arguments it receives and then does the tests as instructed for
> each encountered file.
>
> For -type f, that's true if the file is a regular file, false otherwise. For
> -name <some-pattern>, that's true if the name of the currently considered
> file matches the pattern, false otherwise. There's no concept of hidden
> files, / handling or shell quoting here; that's just matching a string (the
> name of the file) against a pattern.
>
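
As a rough illustration of the "no hidden files" point, assuming find and a
POSIX shell with default globbing options (the two file names are made up):

```shell
tmp=$(mktemp -d)
touch "$tmp/visible.txt" "$tmp/.hidden.txt"

# find tests every name it encounters; a leading dot is just a character.
find_names=$(cd "$tmp" && find . -type f -name '*.txt' | LC_ALL=C sort)
printf 'find:\n%s\n' "$find_names"

# The shell glob skips names that start with a dot.
glob_names=$(cd "$tmp" && printf '%s\n' *.txt)
printf 'glob:\n%s\n' "$glob_names"

rm -rf "$tmp"
```

find should report both files, the glob only visible.txt.
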
> So for instance, -name '*foo[a/b]ar' (which passes -name and *foo[a/b]ar
> arguments to find) will match foobar and .fooaar. It will never match
> foo/bar, but that's because -name matches on the file name only; -path would
> match it.
>
> Now, there is one form of quoting/escaping -- for find -- recognised here,
> and that's only with backslash. That allows operators to be escaped. For the
> shell, it's done as part of the usual shell quoting (\ is one of the shell's
> quoting mechanisms). For find (fnmatch()), that's part of the pattern syntax.
>
> For instance, -name '\**' would match on files whose name starts with *.
> -name '*[\^x]*' would match on files whose name contains ^ or x...
>
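
Those two patterns can be tried on a scratch directory. This is only a sketch,
assuming a find that uses fnmatch()-style backslash escaping; the four file
names are invented for the test:

```shell
tmp=$(mktemp -d)
touch "$tmp/*star" "$tmp/plain" "$tmp/a^b" "$tmp/axb"

# \* is a literal star, the second * is the wildcard: names starting with *.
starts_with_star=$(cd "$tmp" && find . -type f -name '\**')
printf 'starts with *:\n%s\n' "$starts_with_star"

# Inside the brackets, \^ is a literal ^, so the set is { ^, x }.
has_caret_or_x=$(cd "$tmp" && find . -type f -name '*[\^x]*' | LC_ALL=C sort)
printf 'contains ^ or x:\n%s\n' "$has_caret_or_x"

rm -rf "$tmp"
```

The single quotes keep the shell from touching the backslashes, so find
receives them as part of the pattern.
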
> Now, as for the different operators recognised by find, fnmatch(), bash and
> various other shells, they should all agree at least on a common subset:
> *, ? and [...].
>
> Whether a particular shell or find implementation uses the system's
> fnmatch() function or their own is up to the implementation. GNU find does,
> at least on GNU systems. Shells are very unlikely to use it, as that would
> make things complicated for them and not be worth the effort.
>
> bash certainly doesn't. Modern shells like ksh, bash, zsh also have
> extensions over *, ?, [...] and a number of options and special parameters
> (GLOBIGNORE/FIGNORE) to affect their globbing behaviour.
>
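
Two of bash's extensions, as a hedged sketch. The extglob option and its
!(pattern) operator are my choice of illustration and are not named in the
answer above; GLOBIGNORE is. The file names are made up, and bash is assumed
to be installed:

```shell
tmp=$(mktemp -d)
touch "$tmp/a.txt" "$tmp/b.log" "$tmp/.hidden"

# extglob adds operators such as !(pattern): everything except *.txt.
ext_result=$(cd "$tmp" && LC_ALL=C bash -O extglob -c 'printf "%s\n" !(*.txt)')
printf 'extglob:\n%s\n' "$ext_result"

# GLOBIGNORE removes matching names from glob results. Per the bash manual,
# setting it also makes names starting with a dot eligible to match.
ignore_result=$(cd "$tmp" && LC_ALL=C bash -c 'GLOBIGNORE="*.log"; printf "%s\n" *')
printf 'GLOBIGNORE:\n%s\n' "$ignore_result"

rm -rf "$tmp"
```

Neither of these has a counterpart in find -name, which only sees the pattern
string it is handed.
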
> Now, there can be subtle differences between the pattern matching
> operators.
>
> For instance, for GNU fnmatch(), ?, * or [!x] would not match a byte or
> sequence of bytes that doesn't form a valid character, while bash (and most
> other shells) would. For instance, on a GNU system find . -name '*' may fail
> to match files whose name contains invalid characters, while bash -c 'echo *'
> will list them (as long as they don't start with .).
>
> We've mentioned already the confusion that can be caused by quoting.
>
> --
> Stephane
>
>
>

