help-bash
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: bash -x doesn't always show "exec"


From: John Hanks
Subject: Re: bash -x doesn't always show "exec"
Date: Wed, 24 Mar 2021 05:46:32 -0500

I tried dumping additional info and attaching with gdb to see differences
at a lower level. Still no difference in the env and set output that might
explain this, but attaching with gdb changed the behavior somehow. The exec
now appears in the "bash -x" output in both, but in the Slurm session the
downstream tests still fail to run so something is still differing between
these.

As an aside, these machines are as identical as possible. Servers have
identical hardware, both PXE boot from the same golden image and run the OS
from in memory. Both mount the same filesystems via NFS.

To the script I added:

set > ~/set.$(hostname -s)
env | sort > ~/env.$(hostname -s)
echo "gdb -p $$ /bin/bash"
read
exec bats-exec-suite "${flags[@]}" "${filenames[@]}" |
bats_test_count_validator | "bats-format-${formatter}"
"${formatter_flags[@]}"

>From within slurm job:

+ 10:21:45 /home/prd_rdip_mfish/local/libexec/bats-core/bats.debug:L213:  -
[3,0,0][[ -d
/hpc/projects/stv/mFish/UAT/dva_mfish/mfish/etl/tests/imports.bats ]]
+ 10:21:45 /home/prd_rdip_mfish/local/libexec/bats-core/bats.debug:L226:  -
[3,0,1]filenames+=("$filename")
+ 10:21:45 /home/prd_rdip_mfish/local/libexec/bats-core/bats.debug:L231:  -
[3,0,0]source /home/prd_rdip_mfish/local/lib/bats-core/validator.bash
+ 10:21:45 /home/prd_rdip_mfish/local/libexec/bats-core/bats.debug:L233:  -
[3,0,0]set -o pipefail execfail
+ 10:21:45 /home/prd_rdip_mfish/local/libexec/bats-core/bats.debug:L234:  -
[3,0,0]set
++ 10:21:45 /home/prd_rdip_mfish/local/libexec/bats-core/bats.debug:L234:
 - [3,1,0]hostname -s
+ 10:21:45 /home/prd_rdip_mfish/local/libexec/bats-core/bats.debug:L235:  -
[3,0,0]sort
+ 10:21:45 /home/prd_rdip_mfish/local/libexec/bats-core/bats.debug:L235:  -
[3,0,0]env
++ 10:21:45 /home/prd_rdip_mfish/local/libexec/bats-core/bats.debug:L235:
 - [3,1,0]hostname -s
+ 10:21:45 /home/prd_rdip_mfish/local/libexec/bats-core/bats.debug:L236:  -
[3,0,0]echo 'gdb -p 129624 /bin/bash'
gdb -p 129624 /bin/bash
+ 10:21:45 /home/prd_rdip_mfish/local/libexec/bats-core/bats.debug:L237:  -
[3,0,0]read

+ 10:24:01 /home/prd_rdip_mfish/local/libexec/bats-core/bats.debug:L238:  -
[3,0,0]exec bats-exec-suite -x
/hpc/projects/stv/mFish/UAT/dva_mfish/mfish/etl/tests/imports.bats
+ 10:24:01 /home/prd_rdip_mfish/local/libexec/bats-core/bats.debug:L238:  -
[3,0,0]bats_test_count_validator

And similar flow through the normal login session:

+ 10:24:09 /home/prd_rdip_mfish/local/libexec/bats-core/bats.debug:L213:  -
[3,0,0][[ -d
/hpc/projects/stv/mFish/UAT/dva_mfish/mfish/etl/tests/imports.bats ]]
+ 10:24:09 /home/prd_rdip_mfish/local/libexec/bats-core/bats.debug:L226:  -
[3,0,1]filenames+=("$filename")
+ 10:24:09 /home/prd_rdip_mfish/local/libexec/bats-core/bats.debug:L231:  -
[3,0,0]source /home/prd_rdip_mfish/local/lib/bats-core/validator.bash
+ 10:24:09 /home/prd_rdip_mfish/local/libexec/bats-core/bats.debug:L233:  -
[3,0,0]set -o pipefail execfail
+ 10:24:09 /home/prd_rdip_mfish/local/libexec/bats-core/bats.debug:L234:  -
[3,0,0]set
++ 10:24:09 /home/prd_rdip_mfish/local/libexec/bats-core/bats.debug:L234:
 - [3,1,0]hostname -s
+ 10:24:09 /home/prd_rdip_mfish/local/libexec/bats-core/bats.debug:L235:  -
[3,0,0]env
+ 10:24:09 /home/prd_rdip_mfish/local/libexec/bats-core/bats.debug:L235:  -
[3,0,0]sort
++ 10:24:09 /home/prd_rdip_mfish/local/libexec/bats-core/bats.debug:L235:
 - [3,1,0]hostname -s
+ 10:24:09 /home/prd_rdip_mfish/local/libexec/bats-core/bats.debug:L236:  -
[3,0,0]echo 'gdb -p 6035 /bin/bash'
gdb -p 6035 /bin/bash
+ 10:24:09 /home/prd_rdip_mfish/local/libexec/bats-core/bats.debug:L237:  -
[3,0,0]read

+ 10:24:47 /home/prd_rdip_mfish/local/libexec/bats-core/bats.debug:L238:  -
[3,0,0]exec bats-exec-suite -x
/hpc/projects/stv/mFish/UAT/dva_mfish/mfish/etl/tests/imports.bats
+ 10:24:47 /home/prd_rdip_mfish/local/libexec/bats-core/bats.debug:L238:  -
[3,0,0]bats_test_count_validator


Stepping through with gdb in the Slurm session:

(gdb) n
Single stepping until exit from function __read_nocancel,
which has no line number information.
0x0000000000483f30 in zread ()
(gdb) n
Single stepping until exit from function zread,
which has no line number information.
0x0000000000484001 in zreadc ()
(gdb) n
Single stepping until exit from function zreadc,
which has no line number information.
0x0000000000475611 in read_builtin ()
(gdb)
Single stepping until exit from function read_builtin,
which has no line number information.
0x000000000042f35f in execute_builtin.isra.2 ()
(gdb)
Single stepping until exit from function execute_builtin.isra.2,
which has no line number information.
0x0000000000431449 in execute_simple_command ()
(gdb)
Single stepping until exit from function execute_simple_command,
which has no line number information.
0x0000000000432353 in execute_command_internal ()
(gdb)
Single stepping until exit from function execute_command_internal,
which has no line number information.
0x0000000000433d3e in execute_command ()
(gdb)
Single stepping until exit from function execute_command,
which has no line number information.
0x000000000041e375 in reader_loop ()
(gdb)
Single stepping until exit from function reader_loop,
which has no line number information.
[Detaching after fork from child process 132894]
[Detaching after fork from child process 132895]
[Detaching after fork from child process 132896]
[Detaching after fork from child process 132956]
0x000000000041e2bf in reader_loop ()


Compared to stepping through in the normal login session:

Single stepping until exit from function __read_nocancel,
which has no line number information.
0x0000000000483f30 in zread ()
(gdb)
Single stepping until exit from function zread,
which has no line number information.
0x0000000000484001 in zreadc ()
(gdb)
Single stepping until exit from function zreadc,
which has no line number information.
0x0000000000475611 in read_builtin ()
(gdb)
Single stepping until exit from function read_builtin,
which has no line number information.
0x000000000042f35f in execute_builtin.isra.2 ()
(gdb)
Single stepping until exit from function execute_builtin.isra.2,
which has no line number information.
0x0000000000431449 in execute_simple_command ()
(gdb)
Single stepping until exit from function execute_simple_command,
which has no line number information.
0x0000000000432353 in execute_command_internal ()
(gdb)
Single stepping until exit from function execute_command_internal,
which has no line number information.
0x0000000000433d3e in execute_command ()
(gdb)
Single stepping until exit from function execute_command,
which has no line number information.
0x000000000041e375 in reader_loop ()
(gdb)
Single stepping until exit from function reader_loop,
which has no line number information.
[Detaching after fork from child process 6967]
[Detaching after fork from child process 6968]
[Detaching after fork from child process 6969]
0x000000000041c9de in main ()

On Tue, Mar 23, 2021 at 6:00 PM John Hanks <griznog@gmail.com> wrote:

> Hi,
>
> I'm troubleshooting some bash scripts and have an issue where when ran
> from a normal ssh session the scripts work fine, but when ran from with a
> Slurm managed job they fail. I traced this down to a point in the code
> where they fork directions at runtime. The script has at the end:
>
> set -o pipefail execfail
> exec bats-exec-suite "${flags[@]}" "${filenames[@]}" |
> bats_test_count_validator | "bats-format-${formatter}"
> "${formatter_flags[@]}"
>
> When executed on a vanilla ssh session connection it works and does this:
>
> + 22:38:48 /home/prd_rdip_mfish/local/libexec/bats-core/bats.debug:L213:
>  - [3,0,0][[ -d
> /hpc/projects/stv/mFish/UAT/dva_mfish/mfish/etl/tests/imports.bats ]]
> + 22:38:48 /home/prd_rdip_mfish/local/libexec/bats-core/bats.debug:L226:
>  - [3,0,1]filenames+=("$filename")
> + 22:38:48 /home/prd_rdip_mfish/local/libexec/bats-core/bats.debug:L231:
>  - [3,0,0]source /home/prd_rdip_mfish/local/lib/bats-core/validator.bash
> + 22:38:48 /home/prd_rdip_mfish/local/libexec/bats-core/bats.debug:L233:
>  - [3,0,0]set -o pipefail execfail
> + 22:38:48 /home/prd_rdip_mfish/local/libexec/bats-core/bats.debug:L234:
>  - [3,0,0]exec bats-exec-suite -x
> /hpc/projects/stv/mFish/UAT/dva_mfish/mfish/etl/tests/imports.bats
> + 22:38:48 /home/prd_rdip_mfish/local/libexec/bats-core/bats.debug:L234:
>  - [3,0,0]bats-format-pretty
> + 22:38:48 /home/prd_rdip_mfish/local/libexec/bats-core/bats.debug:L234:
>  - [3,0,0]bats_test_count_validator
>
> But within a slurm job (whether interactive or a batch job, think with a
> pty or without one, either way same failure):
>
> + 22:37:17 /home/prd_rdip_mfish/local/libexec/bats-core/bats.debug:L213:
>  - [3,0,0][[ -d
> /hpc/projects/stv/mFish/UAT/dva_mfish/mfish/etl/tests/imports.bats ]]
> + 22:37:17 /home/prd_rdip_mfish/local/libexec/bats-core/bats.debug:L226:
>  - [3,0,1]filenames+=("$filename")
> + 22:37:17 /home/prd_rdip_mfish/local/libexec/bats-core/bats.debug:L231:
>  - [3,0,0]source /home/prd_rdip_mfish/local/lib/bats-core/validator.bash
> + 22:37:17 /home/prd_rdip_mfish/local/libexec/bats-core/bats.debug:L233:
>  - [3,0,0]set -o pipefail execfail
> + 22:37:17 /home/prd_rdip_mfish/local/libexec/bats-core/bats.debug:L234:
>  - [3,0,0]bats-format-pretty
> + 22:37:17 /home/prd_rdip_mfish/local/libexec/bats-core/bats.debug:L234:
>  - [3,0,0]bats_test_count_validator
>
> I've looked at diffs of "env" and "set" between these sessions, can't find
> any significant difference between them. All files referenced are
> available. shopt settings are identical on both. I can't find any reason
> they behave differently at this point, but they do.
>
> Any answers, advice or pointers to where I should be looking are
> appreciated. I'm really confused as to how "bash -x" output can differ for
> these lines of code. The Google queries I've come up with return a lot of
> hits, but the relevant terms are just too generic to narrow things down to
> a useful set.
>
> Best,
>
> griznog
>


reply via email to

[Prev in Thread] Current Thread [Next in Thread]