coreutils
Re: 'Cat' feature request


From: Dragan Simic
Subject: Re: 'Cat' feature request
Date: Tue, 26 Dec 2023 09:03:50 +0100

On 2023-12-26 04:53, Kaz Kylheku wrote:
On 2023-12-25 16:58, Pádraig Brady wrote:
On 25/12/2023 21:25, Kaz Kylheku wrote:
On 2023-12-22 10:09, Evan Tremblay wrote:
so, if you run:
cat *

it won't run (unless you use --show-all).

Are you scrambling to win a stupidest post of 2023 contest or something?

That is not an appropriate response.

I agree, nobody knows everything and we should aim toward helping each other.

Sorry, Christmas is a little touch-and-go so details fall by
the wayside. I now have a moment to explain why it deserves
being called stupid.

It is not because of the evaluation strategy of * being
expanded by the shell.

The feature is actually implementable. The cat program has a way of
determining that it has been passed all the names that may arise
from the expansion of *. (Modulo a minor sampling-related race
condition.) Namely, it can just call glob("*", ...) and compare
the results to its argument vector.

Well, files could actually be added or deleted between the expansion of the asterisk wildcard performed by the shell, and the globbing performed by cat(1), which would make such an approach rather unreliable. Relying on the "ah, it won't happen" approach isn't the way to go, if you agree.

It's a complete nonstarter because:

1. Unknown numbers of scripts out there depend on "cat *" just
   working. If an option is suddenly required to make that
   work, those scripts break. A legitimate use might look
   like, oh:

     # get the contents of all files in target_dir into file
     (cd "$target_dir" && cat * > "$target_contents")

2. In relation to 1, such an option is incompatible with other
   implementations of cat (including prior versions of the
   same Coreutils one) and, importantly, with the POSIX standard.
   POSIX prohibits an implementation of cat from requiring
   an opt-in option in order for "cat *" to do what it is told.

It's possible to have it as an opt-in behavior, and that would
also eliminate the inefficiency (from regular use):
cat --not-all-files could do the expansion of *, and fail if
all the files in that expansion are present in the command line.

It's technically objectionable to include such hacks in the
core utilities.

Presumably, this protects the interactive user from accidentally
catting all the files in a large directory.

A protective feature in the interactive environment can be
obtained by writing a shell function in Bash.

Shell functions in people's personal, private environments *can*
be ugly hacks. All that matters is whether they are acceptable
to that individual.

Here is a basic crack at it:

cat()
{
  local -a orig_args=("$@")
  local -a star_files=(*)
  local all_present=y
  local occurs_not
  local i
  local j

  # Crudely skip arguments that look like options
  while true; do
    case "$1" in
      -* | --* ) shift ;;
      * ) break ;;
    esac
  done

  # get remaining args into args array
  local -a args=("$@")

  # determine whether all files in star_files are in args
  # (the command line with leading options stripped)
  for i in "${star_files[@]}"; do
    occurs_not=y
    for j in "${args[@]}"; do
      if [ "$i" = "$j" ]; then
        occurs_not=
        break
      fi
    done
    if [ "$occurs_not" ]; then
      all_present=
      break
    fi
  done

  if [ "$all_present" ]; then
    # diagnostics belong on stderr, so they don't end up in redirected output
    echo "cat: all files that match * present in command line!" >&2
    return 1
  fi

  command cat "${orig_args[@]}"
}

This has O(M * N) behavior, in the number of file arguments M and
number of files that match the * pattern, N.

This is kind of a "feature". When you do "cat file.txt", you hardly notice
a slowdown; it doesn't take much time to compare that one file
to thousands of files.

But when you type "cat *" in a large directory, it will just sit there
for a while, and that alone alerts the user that the very thing they
are trying to avoid has happened. Rather than waiting for the inevitable
diagnostic, they can just hit Ctrl-C.

There are ways to speed it up by taking advantage of the contents of
the * expansions being sorted. We change the requirements to this:
we look for situations when the command line contains, as a contiguous
subsequence, the sequence produced by *.
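
A rough sketch of that contiguous-subsequence check, in Bash. The
helper name star_run_present is made up for illustration; it is not
part of the function above or of Coreutils. It assumes the shell's *
expansion yields each name once, in the same order both times, so a
run can only start where an argument equals the first expanded name.
That keeps the scan roughly linear in the number of arguments, and
options need no special handling because they just sit outside the run.

```shell
# Hypothetical helper: return 0 if the current directory's "*" expansion
# appears as a contiguous run inside the given arguments, 1 otherwise.
star_run_present()
{
  local -a args=("$@")
  local -a star_files=(*)
  local n=${#star_files[@]}
  local m=${#args[@]}
  local start k

  # Empty directory: * stays literal (or expands to nothing with nullglob)
  if [ "$n" -eq 0 ]; then
    return 1
  fi
  if [ ! -e "${star_files[0]}" ] && [ ! -L "${star_files[0]}" ]; then
    return 1
  fi

  for (( start = 0; start + n <= m; start++ )); do
    # Only a position holding the first expanded name can begin a run
    [ "${args[start]}" = "${star_files[0]}" ] || continue
    for (( k = 1; k < n; k++ )); do
      # Mismatch inside the run: give up on this start position
      [ "${args[start + k]}" = "${star_files[k]}" ] || continue 2
    done
    return 0
  done

  return 1
}
```

A wrapper like the cat function above could call star_run_present "$@"
first and print its diagnostic when that succeeds. Note the changed
requirement: "cat b a c" in a directory holding a, b and c is no longer
caught, since the names are not in the order * produces.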

If Bash had macros, we could do things like this in better ways.
Imagine we could do this:

  macro cat()
  {
  }

where "cat * *.txt $X" receives three arguments that are literally '*',
'*.txt' and '$X', with no expansion having taken place, only a division
into fields. It could then check that '*' does not occur, and
then use eval to execute command cat "$@".


