Re: GNU Make on Linux Feeding All Commands Through ksh


From: Steve Waltner
Subject: Re: GNU Make on Linux Feeding All Commands Through ksh
Date: Thu, 4 Dec 2008 10:05:55 -0600

After two months, I'm finally looking into this issue again. Gotta get it working by the end of the year since migrating builds to Linux (more specifically the faster x86 hardware) is one of my business objectives for the year... :-)

Thanks to both Philip and David for their original responses. I've done some more digging and found the following information:

- Philip was correct that both platforms run all commands through /bin/ksh, even though it didn't look that way on Solaris. On Solaris, ksh does in fact use the exec(2) system call when called with the -c option, so the command takes the place of the ksh process in the process listing. This means the apparent "difference" between Solaris and Linux is not a significant one.

- I am running a stock GNU make 3.81 on both platforms, which I compiled myself from the same source code download. There shouldn't be any differences between the two copies of GNU make; in particular, this rules out RedHat having changed the source code before compiling it and including it in their Linux distribution.

- Per David's comment about whether the "SHELL = /bin/ksh" definition was actually necessary, I did try removing it from the makefiles, but still got the same result: not "enough" simultaneous jobs running for the build on Linux. This did make the process tree on Linux look like "gmake -> gmake -> gmake ->..." instead of "gmake -> ksh -> gmake -> ksh -> gmake ->...", but since it didn't change the root issue on Linux, I reverted the change. I do remember the developer who did most of the work on the makefiles commenting that /bin/sh on Solaris was junk and switching to /bin/ksh. I didn't try a build on Solaris without the SHELL setting, and I never let the Linux build run to completion to see whether it produced a valid build. Since this didn't fix the root issue, it isn't related to the main question at hand, so I'll set it aside for now and maybe look at it later.

The main question that remains would be: Is there a way to debug and follow the token check-in/check-out process that is used internally in GNU make to try and see what's going on here? I can work on trying to track down what's going wrong, but without a way to get visibility into the process, I'd just be making random changes to the makefiles, which isn't going to be very productive.
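
One possible starting point, sketched under assumptions rather than tested against this build: GNU make 3.81 hands the jobserver pipe to sub-makes as "--jobserver-fds=R,W" in MAKEFLAGS, and each token is a single byte read from or written to that pipe. The helper name jobserver_fds below is hypothetical (it is not part of GNU make); it only pulls the two descriptor numbers out of a MAKEFLAGS string, so that a system-call tracer can be pointed at reads and writes on them.

```shell
#!/bin/sh
# jobserver_fds: hypothetical helper, not part of GNU make.
# Prints "R W" (read fd, write fd) if the given MAKEFLAGS string contains
# --jobserver-fds=R,W (the GNU make 3.81 spelling); prints nothing otherwise.
jobserver_fds() {
    printf '%s\n' "$1" |
        sed -n 's/.*--jobserver-fds=\([0-9][0-9]*\),\([0-9][0-9]*\).*/\1 \2/p'
}

# Example with a made-up MAKEFLAGS value; in a recipe the real value
# could be passed in as "$(MAKEFLAGS)".
jobserver_fds "w --jobserver-fds=3,4 -j"
```

On Linux, something like "strace -f -e trace=read,write -o make.trace gmake -j 4" should then log each read and write on those descriptors together with the PID that made it, which is roughly the check-out/check-in trail being asked about. On Solaris, truss would be the analogous tool.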

Steve

On Oct 8, 2008, at 11:31 AM, Steve Waltner wrote:

I'm working on completing the migration of our build process for a
rather large software project from Solaris SPARC to Linux x86 and have
run into an issue. This process is using GNU Make 3.81 on a Solaris 9
box and a RedHat AS 4.7 x86_64 system. The major symptom that I've
noticed is that the Linux system doesn't really honor the "-j 4"
option we typically build with. It quickly degrades into a single
threaded build. Items of note include:

- The builds are being passed to a cluster of systems running Sun Grid
Engine, so the "-j 4" option isn't passed at the command line. The
first build command looks for a $NSLOTS environment variable and
changes the MAKEFLAGS as appropriate.

- I am running both builds with a copy of GNU Make that I compiled
from the same source code. I am not using the copy of make that was
included with the RedHat or Solaris systems.

- The makefiles include settings like "override SHELL = /bin/ksh" to
force all shell interpretations to go through the ksh.
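
For illustration only, the NSLOTS handling described in the first item could look roughly like the wrapper below. This is a sketch and not the actual logic from these makefiles; the function name jobs_flag is made up, and it assumes NSLOTS is either unset or a plain integer as SGE provides it.

```shell
#!/bin/sh
# jobs_flag: hypothetical helper. Prints "-j N" when its argument is a
# positive integer (as SGE sets NSLOTS), and nothing otherwise.
jobs_flag() {
    case "$1" in
        ''|*[!0-9]*) ;;   # unset, empty, or non-numeric: fall back to a serial build
        *) [ "$1" -gt 0 ] && printf '%s\n' "-j $1" ;;
    esac
    return 0
}

# Example: with NSLOTS=4 this echoes "gmake -j 4"; with NSLOTS unset the
# flag is simply left out.
echo "gmake $(jobs_flag "${NSLOTS:-}")"
```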

It appears as though the Linux system feeds every command through a
ksh process, while the same function on the Solaris system calls the
command (whether it is a ccpentium, ccarm, or make command) directly.
I determined this by looking at the process hierarchy using the pstree
command. The examples below were both taken from a build with NSLOTS=4
(i.e., -j 4). You can see the Solaris build running three ccpentium
processes at the time this snapshot was taken, while the Linux build
has only spawned a single ccpentium command.
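
A rough way to put a number on such snapshots, assuming the tree output is piped in on stdin with one process per line: count the lines whose command is a compiler invocation. The helper name count_compiles is hypothetical, and the pattern assumes the compilers appear as "ccpentium -c" or "ccarm -c" directly after the PID and user columns, so that a ksh wrapper line mentioning ccpentium is not counted twice.

```shell
#!/bin/sh
# count_compiles: hypothetical helper. Reads process-tree output on stdin
# and prints how many lines show "PID USER ccpentium -c" or "PID USER ccarm -c".
count_compiles() {
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    grep -c -E '[0-9]+ +[^ ]+ +cc(pentium|arm) -c' || true
}

# Example with two made-up tree lines: one compile, one gmake.
printf '%s\n' \
    ' \-+- 11323 swaltner ccpentium -c -o dq.o' \
    ' \-+- 10560 swaltner gmake DQ' | count_compiles   # prints 1
```

Sampling this every few seconds during a build would show how often the number of concurrent compiles actually reaches the requested -j value.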

Solaris:
====================
ictgrid004:~> sgetree
-+- 00278 sgeadmin 9:36 /soft/gridware-wic/sge/6.0u6/bin/sol-sparc64/sge_execd
 |-+- 18341 sgeadmin sge_shepherd-467543 -bg
 | \-+- 18342 root /soft/gridware-wic/sge/6.0u6/utilbin/sol-sparc64/rshd -l
 |   \-+- 18343 swaltner /soft/gridware-wic/sge/6.0u6/utilbin/sol-sparc64/qrsh_
 |     \-+- 18347 swaltner tcsh -c hostname ; gmake
 |       \-+- 18353 swaltner /soft/gnu/make/3.81/bin/gmake
 |         \-+- 26740 swaltner /soft/gnu/make/3.81/bin/gmake Platform/.make App
 |           |-+- 26757 swaltner /soft/gnu/make/3.81/bin/gmake -C Platform MKLe
 |           | \-+- 26819 swaltner /soft/gnu/make/3.81/bin/gmake Boot/.make Sys
 |           |   \-+- 26906 swaltner /soft/gnu/make/3.81/bin/gmake -C System MK
 |           |     \-+- 27024 swaltner /soft/gnu/make/3.81/bin/gmake BSP/.make
 |           |       \-+- 10502 swaltner /soft/gnu/make/3.81/bin/gmake -C DQ MK
 |           |         \-+- 10560 swaltner /soft/gnu/make/3.81/bin/gmake DQ MKL
 |           |           \-+- 11323 swaltner ccpentium -c -o dq.o -fmessage-len
 |           |             \--- 11331 swaltner /soft/windriver/gpp/3.4/gnu/3.4.
 |           \-+- 26788 swaltner /soft/gnu/make/3.81/bin/gmake -C Application M
 |             \-+- 26853 swaltner /soft/gnu/make/3.81/bin/gmake RAID/.make Deb
 |               |-+- 06868 swaltner /soft/gnu/make/3.81/bin/gmake -C Debug MKL
 |               | \-+- 06928 swaltner /soft/gnu/make/3.81/bin/gmake ccvm_dbg/.
 |               |   \-+- 11524 swaltner /soft/gnu/make/3.81/bin/gmake -C safe_
 |               |     \-+- 11585 swaltner /soft/gnu/make/3.81/bin/gmake safe_d
 |               |       \-+- 11639 swaltner ccpentium -c -o safeSymbolDebug.o
 |               |         \--- 11642 swaltner /soft/windriver/gpp/3.4/gnu/3.4.
 |               |-+- 26909 swaltner /soft/gnu/make/3.81/bin/gmake -C RAID MKLe
 |               | \-+- 27055 swaltner /soft/gnu/make/3.81/bin/gmake cache/.mak
 |               |   \-+- 08612 swaltner /soft/gnu/make/3.81/bin/gmake -C hid M
 |               |     \-+- 08728 swaltner /soft/gnu/make/3.81/bin/gmake hid MK
 |               |       \-+- 11452 swaltner ccpentium -c -o hidLUDispatch.o -f
 |               |         \--- 11457 swaltner /soft/windriver/gpp/3.4/gnu/3.4.
 |               \--- 11635 swaltner /soft/gnu/make/3.81/bin/gmake -C MAPI MKLe
====================

Linux:
====================
ictgrid005:~/ccm_wa/symbios/RAIDCore-swaltner_1636/dev_09q4_fc_7091-68.10.00.03> ~/pstree-2.32/pstree 3543
-+= 03543 root /soft/gridware-wic/sge/6.0u6/bin/lx24-amd64/sge_execd
 \-+= 21589 root sge_shepherd-467474 -bg
   \-+= 21590 root /soft/gridware-wic/sge/6.0u6/utilbin/lx24-amd64/rshd -l
     \-+= 21591 swaltner /soft/gridware-wic/sge/6.0u6/utilbin/lx24-amd64/qrsh_starter /var/spool/sgeexecd/ictgrid005/active_jobs/467474.
       \-+= 21603 swaltner tcsh -c hostname ; gmake
         \-+- 21612 swaltner gmake
           \-+- 04707 swaltner /bin/ksh -c gmake Platform/.make Application/.make  MKLevel=$(( 0 + 1 )) MKopts='';
             \-+- 04708 swaltner gmake Platform/.make Application/.make MKLevel=1 MKopts=
               \-+- 04787 swaltner /bin/ksh -c gmake  -C Application    MKLevel=$(( 1 + 1 ))
                 \-+- 04788 swaltner gmake -C Application MKLevel=2
                   \-+- 04868 swaltner /bin/ksh -c gmake RAID/.make Debug/.make MAPI/.make TAPI/.make Spy/.make Stpsim/.make FBDT/.make
                     \-+- 04870 swaltner gmake RAID/.make Debug/.make MAPI/.make TAPI/.make Spy/.make Stpsim/.make FBDT/.make IT/.make D
                       \-+- 04947 swaltner /bin/ksh -c gmake  -C RAID    MKLevel=$(( 3 + 1 ))
                         \-+- 04948 swaltner gmake -C RAID MKLevel=4
                           \-+- 05074 swaltner /bin/ksh -c gmake cache/.make iop/.make htd/.make hid/.make icn/.make rtr/.make rpa/.make
                             \-+- 05075 swaltner gmake cache/.make iop/.make htd/.make hid/.make icn/.make rtr/.make rpa/.make Fibre/.ma
                               \-+- 11193 swaltner /bin/ksh -c gmake -C vdm    MKLevel=$(( 5 + 1 ))
                                 \-+- 11194 swaltner gmake -C vdm MKLevel=6
                                   \-+- 18797 swaltner /bin/ksh -c gmake vdm  MKLevel=$(( 6 + 1 )) MKopts='';
                                     \-+- 18798 swaltner gmake vdm MKLevel=7 MKopts=
                                       \-+- 22893 swaltner /bin/ksh -c HOME="" LM_LICENSE_FILE="" ccpentium -c -o vdmRVState.o -fmessage
                                         \-+- 22894 swaltner ccpentium -c -o vdmRVState.o -fmessage-length=0 -O2 -nostdlib -fno-builtin
                                           |--- 22896 swaltner /soft/windriver/gpp/3.4/gnu/3.4.4-vxworks-6.4/x86-linux2/bin/../libexec/g
                                           \--- 22895 root (get_feature)
ictgrid005:~/ccm_wa/symbios/RAIDCore-swaltner_1636/dev_09q4_fc_7091-68.10.00.03>
====================

I believe this behavior is causing the make process to consume tokens
for the parallel builds when it shouldn't. The ksh process that
launches the gmake command in the subdirectory is consuming the token.
Once you get deep enough in the source tree, all the tokens are held
by these idle ksh processes, causing the build to fall back to a
single thread. This is confirmed by starting a build using "-j 8" or
"-j 16" or higher: with more tokens, the make process is able to keep
the CPUs busy on this quad-CPU Linux server. That works fine when
there is a single developer on the build system, but it won't work
well for the way we launch builds on these systems through SGE. Once
this issue is resolved, we can deploy the x86 hardware, which will
give us the same build speeds in a box that is 20% the physical size
and costs about 10% of the price of the SPARC systems we have been
using.

Thanks for any guidance you can provide. I've been fooling with this
for several days without any luck.

Steve

