gluster-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Gluster-devel] Problem with TLA ver > 887


From: Mickey Mazarick
Subject: Re: [Gluster-devel] Problem with TLA ver > 887
Date: Sun, 08 Feb 2009 12:49:10 -0500
User-agent: Thunderbird 2.0.0.19 (Windows/20081209)

Heh our tests are kind of an unholy mess... but here's the part I think is useful:
We use a startup script that will iterate through vol files and mount the first available file on the list. We have a bunch of vol files that test a few different server configurations. After mountpoints are prepared we have other scripts that start virtual machine on the various mounts.

 In other words I have a directory called "/glustermounts/" and in that directory I have the files:
main.vol  main.vol.ib  main.vol.tcp stripe.vol.ha stripe.vol.tcp

after running "/etc/init.d/glustersystem start"  I will have the following mount points:
/system     (our default mount, we actually store the vol files here)
/mnt/main
/mnt/stripe

The output shows me if any vol file failed to mount and it automatically attempts the next one (ex" "mounting main.vol failed, trying main.vol.ib"). We simply arrange vol files from most features to least. We have a separate script which starts up a virtual machine on each test mount. This is the actual "test" we use as it creates symbolic links, uses mmaps etc but it's pretty specific to us. This closely mirrors how we use it in production.

I've included out startup script and I would suggest you simply run something similar to your production on a few mounts in the same way we have. I may share this with the entire group although there are probably better init scripts out there. This one does kill all processes attached to a mount point which is useful. Let me know if you have any questions!

Thanks!

-Mickey Mazarick



Geoff Kassel wrote:
Hi,
   As a fellow GlusterFS user, I was just wondering if you could point me to 
the regression tests you're using for GlusterFS?

   I've looked high and low for the unit tests that the GlusterFS devs are 
meants to be using (ala http://www.gluster.org/docs/index.php/GlusterFS_QA) 
so that I can do my own testing, but I've not been able to find them.

   If it's tests you've developed in-house, would you be interested in 
releasing them to the wider community?

Kind regards,

Geoff Kassel.

On Thu, 5 Feb 2009, Mickey Mazarick wrote:
  
I haven't done any full regression testing to see where the problem is
but the later TLA versions are causeing out storage servers to spike to
100% cpu usage and the clients never see any files. Our initial tests
are with ibverbs/HA but no performance translators.

Thanks!
-Mickey Mazarick
    


--
#!/bin/sh
# Startup script for gluster Mount system
volFiles="/glustermounts/"
defaultcheckFile="customers"
speclist="/etc/glusterfs-system.vol.ibverbs /etc/glusterfs-system.vol.ha 
/etc/glusterfs-system.vol.ibverbs /etc/glusterfs-system.vol.tcp"
start() {
        specfile=${1}
        if [ "$#" -gt 1 ]; then
                mountpt=${2}
        else
                mountpt=`echo ${specfile} |sed "s#\.vol.*\\\$##" |sed 
"s#/.*/##"`
                mountpt="/mnt/${mountpt}"
        fi
        logfile=`echo ${specfile} |sed "s#\.vol.*\\\$##" |sed "s#/.*/##"`
        logfile="/var/${logfile}.log"
        pidfile=`echo ${specfile} |sed "s#\.vol.*\\\$##" |sed "s#/.*/##"`
        pidfile="/var/run/${pidfile}.pid"
        echo "mounting specfile:${specfile} at:${mountpt} with pid 
at:${pidfile}"
        currentpids=`pidof glusterfs`
        currentpids="0 ${currentpids}"
        mountct=`mount |grep ${mountpt} |grep -c glusterfs`
        if [ -f $pidfile ]; then
                currentpid=`cat ${pidfile}`
                pidct=`echo "${currentpids}" |grep -c ${currentpid}`
                if [ "${pidct}" -eq 0 ]; then
                        rm -rf ${pidfile}
                        echo "removing pid file: ${pidfile}"
                fi
                if [ "${mountct}" -lt 1 ]; then
                        echo "Gluster System mount:${mountpt} died. Remounting."
                        stop ${mountpt} ${pidfile}
                fi
        else
                rm -rf ${pidfile}
                if [ "${mountct}" -gt 0 ]; then
                        myupid=`ps -ef |grep /system |grep gluster |sed 
"s#root\s*##" |sed "s#\s.*##"`
                        if [ "${myupid}" -gt 0 ]; then
                           echo "${myupid}" > ${pidfile}
                        else
                           echo "Gluster System mounted at:${mountpt} but with 
no pid. Remounting."
                           stop ${mountpt} ${pidfile}
                        fi
                fi
        fi

        if [ -e $pidfile ]; then
                echo "Gluster System Mount:${mountpt} is running with spec: 
${specfile}"
                #echo "Gluster System Mount:${mountpt} is running."
                return 0
        else
        #rm -rf /var/glustersystemclient.log
        modprobe fuse
        sleep 1.5
        #rm -rf /var/glustersystemclient.log
        mkdir ${mountpt}
        rm -rf $pidfile
        cmd="/usr/local/sbin/glusterfs -p $pidfile -l ${logfile} -L ERROR -f 
${specfile} --disable-direct-io-mode ${mountpt}"
echo "${cmd}"
        ${cmd}
#/usr/local/sbin/glusterfs -p $pidfile -l ${logfile} 
--volume-specfile=${specfile} --disable-direct-io-mode ${mountpt}
        #/usr/local/sbin/glusterfs -p $pidfile -l /var/glustersystemclient.log 
-f $specfile --direct-io-mode=DISABLE /system
        fi
        return 1
}

checkStart() {
        mountdir=$1
        checkfile="total"
        if [ "$#" -gt 1 ]; then
           checkfile=$2
        fi
        lspid=0
        sleep 1
        counter=0
        countermax=15

        ls -l ${mountdir} &
        while [ "${lspid}" != "" ]
        do
          echo "waiting for gluster to come up... ${counter}"
          sleep 1
          lspid=`/sbin/pidof ls`
          let counter++
          if [ "${counter}" -eq "${countermax}" ]
          then
           lspid=""
          fi
        done
        if [ "${counter}" -lt "${countermax}" ]; then
          errorct=`ls ${mountdir} 2>&1 |grep -c "not connected"`
          if [ "${errorct}" -eq 1  ]; then
                counter=`echo ${countermax}`
          else
            glcount=`ls -l ${mountdir} |grep ${checkfile} -c`
            if [ "${glcount}" -lt 1 ]; then
                counter=`echo ${countermax}`
            fi
          fi
        fi

        if [ "${counter}" -eq "${countermax}" ]
        then
          echo "gluster FAILED to mount:${mountdir} with spec: ${specfile}"
          lspid=`/sbin/pidof ls`
          kill $lspid
          lspid=10
          return 0
        else
          echo "Gluster System Mount:${mountdir} is running with spec: 
${specfile}"
          #echo "gluster sucessfully mounted:${mountdir} with spec: ${specfile}"
          return 1
        fi
}

StartSpeclist() {
        specfilelist="${1}"
        echo "Attempting to mount first of: (${specfilelist})"
        for file in $specfilelist
        do
                specfile="${file}"
                if [ "$#" -gt 1 ]; then
                        checkfile=${2}
                else
                        checkfile="total"
                fi
                if [ "$#" -gt 2 ]; then
                        mountpt=${3}
                else
                        mountpt=`echo ${specfile} |sed "s#\.vol.*\\\$##" |sed 
"s#/.*/##"`
                        mountpt="/mnt/${mountpt}"
                fi

                start ${specfile} ${mountpt}
                if [ "$?" -eq "1" ]; then
                  checkStart ${mountpt} ${checkfile}
                  if [ "$?" -eq "0" ]; then
                        stop ${specfile} ${mountpt}
                  else
                        return 1
                  fi
                else
                        return 1
                fi
          done
        return 0
}

stop() {
        specfile1=${1}
        if [ "$#" -gt 1 ]; then
                mountpt=${2}
        else
                mountpt=`echo ${specfile1} |sed "s#\.vol.*\\\$##" |sed 
"s#/.*/##"`
                mountpt="/mnt/${mountpt}"
        fi
        pidfile=`echo ${specfile1} |sed "s#\.vol.*\\\$##" |sed "s#/.*/##"`
        pidfile="/var/run/${pidfile}.pid"
        #runningpids=`lsof |grep ${mountpt} |sed "s#..........##" |sed "s# 
.*##"`
#       for pid in `lsof |grep ${mountpt} |sed "s#\w*\s*##" |sed "s# .*##"`
        for pid in `lsof |grep ${mountpt} |sed "s#..........\(......\).*#\1#"`
        do
                kill -9 $pid
        done
        #fuser -km /system
        echo "Stopping mount:${mountpt} spec:${specfile1}"
        umount -f ${mountpt}
        currentpid=`cat ${pidfile}`
        kill $currentpid
        rm -rf $pidfile
}

stopmp(){
        mountpt=${1}
        spec=`ps -ef |grep gluster |grep ${mountpt} |grep specfile |sed 
"s#.*specfile=\(.*\)/s*.*#\1#"| sed "s# .*##"`
        stop "${spec}" "${mountpt}"
}

startAll() {
          StartSpeclist "${speclist}" "glustermounts" /system
          if [ "$?" -eq "0" ]; then
                echo "ERROR STARTING"
          else
                #for i in `ls -b ${volFiles}*.vol |sed s/.glustermounts.//`;
                for i in `ls -b ${volFiles}*.vol |sed s#${volFiles}##`;
                do
                list=`ls -C ${volFiles}${i}*`
                mountpt=`echo $i |sed s/\.vol//`
                StartSpeclist "${list}" "${defaultcheckFile}" /mnt/${mountpt}
                done

          fi

}

stopAll() {
        mountlist=`mount |grep glusterfs |sed "s#glusterfs on \(.*\) 
type.*#\1#"`
        for mountpt in ${mountlist};
        do
          stopmp ${mountpt}
        done
        kill `pidof glusterfs`
}

case "$1" in
        start)
         if [ "$#" -gt 1 ]; then

            StartSpeclist "${2}" ${3} ${4}

         else
           startAll
         fi
            ;;

        stop)
         if [ "$#" -gt 1 ]; then
           stopmp ${2}
         else
           stopAll
           stopAll
         fi

            ;;

        status)
            status
            ;;
        restart)
            stop
            start
            ;;
        condrestart)
                stop
                start
            ;;

        *)
            echo $"Usage: $0 {start|stop|restart|condrestart|status}"
            exit 1

esac

exit $RETVAL

reply via email to

[Prev in Thread] Current Thread [Next in Thread]