WMD Zone: February 2011

Quite a few times on our clusters we've needed to make a cron job or a shell script HA compatible. We'd like the cluster to be able to start and stop it, so it can fail over with other resources if required.

It's actually a lot easier than it seems. The easiest way is to make a while true loop with a sleep in the middle, and in each iteration check the current time against the time the script is due to run. It's kind of replacing cron, but needs must.

This is how I did it.

Stage 1 - Make a standard LSB compatible init script carcass

#!/bin/bash
# description: Start or stop your res name
#
### BEGIN INIT INFO
# Provides: your_res_name
# Required-Start: $network $syslog
# Required-Stop: $network
# Default-Start: 3
# Default-Stop: 0
# Description: Start or stop your res name
### END INIT INFO

RUNFILE="/var/run/your_res_name"

NAME="YourResName"
case "$1" in
'start')
   CHECKSTATUS
   [ "$RUNNING" ] && echo "$0 is already running" && exit 0
   echo "Starting $0"
   touch "$RUNFILE"
   MAINLOOP &
   ;;
'stop')
   # Removing the run file tells the background loop to exit on its next pass.
   rm -f "$RUNFILE"
   pkill -f "$NAME "
   echo "$NAME stopped"
   ;;
'restart')
   "$0" stop
   sleep 5
   "$0" start
   ;;
'status')
   CHECKSTATUS
   if [ "$RUNNING" ]
   then
      echo "$NAME is running"
      exit 0
   else
      echo "$NAME is stopped"
      exit 3
   fi
   ;;
*)
   echo
   echo "Usage: $0 {start|stop|restart|status}"
   echo
   exit 1
   ;;
esac

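There are no CHECKSTATUS or MAINLOOP functions yet; we need to add those next. Both have to be defined above the case statement, because the shell must have read a function definition before it can call it.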

MAINLOOP

You need a while true; do ...; done loop that sits there, checks whatever needs checking on each pass, and kicks off the work at the appropriate time.

RUNTIME="000300" # HHMMSS, so this is 00:03:00

# OUTPUTFILE and RUN_OUTPUT are not defined anywhere in this skeleton; you
# supply them (the file whose presence means today's run is done, and the
# function that does the actual work).
MAINLOOP() {

   LOG="/var/log/$NAME.log"

   while true
   do
      # Check for permission to run.
      [ ! -f "$RUNFILE" ] && exit 0

      # Check if we've already run today
      if [ ! -f "$OUTPUTFILE" ]
      then
         # Or if we're still running
         NUMPROCS=$(pgrep -f "$NAME " | wc -l)
         if [ "$NUMPROCS" -lt 1 ]
         then
            THETIME=$(date +%H%M%S) # Get a numerically comparable time.
            if [ "$THETIME" -gt "$RUNTIME" ]
            then
               echo -e "\nApparently $THETIME is greater than $RUNTIME so it's time to do our thang" >> "$LOG"
               echo -e "------------------------" >> "$LOG"
               echo -e "\n*** Starting process.***\nThe time : $THETIME" >> "$LOG"
               echo -e "\nNumber of existing processes : $NUMPROCS" >> "$LOG"
               echo -e "\nLet's GO!\n" >> "$LOG"
               RUN_OUTPUT
            fi
         fi
      fi
      sleep 1m
   done
}
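
The loop above relies on an OUTPUTFILE variable and a RUN_OUTPUT function that the skeleton never defines. As a rough sketch only, here is one way they could look. OUTPUT_DIR, the dated file name, the output_file_for_today helper and the job body are all made up for illustration; swap in whatever your job really produces.

```shell
#!/bin/bash
# Hypothetical helpers: OUTPUT_DIR, the file name and the job body are
# illustrative assumptions, not taken from the original script.
OUTPUT_DIR="${OUTPUT_DIR:-/var/tmp}"

# Print today's output file name. Calling this on every pass, instead of
# setting OUTPUTFILE once at startup, lets the name roll over at midnight.
output_file_for_today() {
    echo "$OUTPUT_DIR/your_res_name-$(date +%Y%m%d).out"
}

# Stand-in for the real job. It writes to a temporary file and renames it
# only on success, so a half-written file never looks like a finished run.
RUN_OUTPUT() {
    target="$(output_file_for_today)"
    tmp="$target.tmp.$$"
    if echo "replace this line with the real job" > "$tmp"
    then
        mv "$tmp" "$target"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

With something like this in place, the loop would set OUTPUTFILE="$(output_file_for_today)" at the top of each pass, so the "already run today" check looks at the current day's file.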

CHECKSTATUS

You can use something simple here: check that the run file exists and that the background loop process is still running.

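CHECKSTATUS () {

   # Start from a clean slate so a stale value can't leak through.
   unset RUNNING

   if [ -f "$RUNFILE" ]
   then
      # The backgrounded loop keeps the command line of the "start" call,
      # so match on that. Use your init script's file name here.
      if [ "$(pgrep -f "your_res_name start" | wc -l)" -gt 0 ]
      then
         RUNNING="yes"
      fi
   fi

}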
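
Matching on the command line with pgrep can pick up unrelated processes that happen to contain the same text. A PID file is a common alternative. It is not what the script above does, so treat the following as a sketch under assumptions: PIDFILE, record_pid and CHECKSTATUS_PID are hypothetical names.

```shell
#!/bin/bash
# Hypothetical PID-file variant; the path and function names are assumptions.
PIDFILE="${PIDFILE:-/var/run/your_res_name.pid}"

# Call this right after `MAINLOOP &`, passing $!, to remember the loop's PID.
record_pid() {
    echo "$1" > "$PIDFILE"
}

# Sets RUNNING=yes if the recorded PID is still alive, otherwise leaves it unset.
CHECKSTATUS_PID() {
    unset RUNNING
    [ -f "$PIDFILE" ] || return 0
    pid="$(cat "$PIDFILE")"
    # kill -0 sends no signal; it only tests that the process exists
    # and that we are allowed to signal it.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
    then
        RUNNING="yes"
    fi
    return 0
}
```

In the start branch that would be MAINLOOP & followed by record_pid $!. The stop branch could then kill the recorded PID directly instead of waiting up to a minute for the loop to notice the missing run file.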

That's pretty much it. Test each action of the init script from each state and echo the return code every time, e.g.

/etc/init.d/your_res_name start ; echo $? # from stopped, should be 0
/etc/init.d/your_res_name start ; echo $? # from started, should be 0
/etc/init.d/your_res_name stop ; echo $? # from started, should be 0
/etc/init.d/your_res_name stop ; echo $? # from stopped, should be 0
/etc/init.d/your_res_name status ; echo $? # from started, should be 0
/etc/init.d/your_res_name status ; echo $? # from stopped, should be 3
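
If you'd rather not eyeball six return codes, the checks can be wrapped in a small helper. This is only a sketch; expect_rc is a made-up name and not part of any LSB or cluster tooling.

```shell
#!/bin/bash
# Hypothetical test helper: run a command and compare its exit code with
# the expected one given as the first argument.
expect_rc() {
    expected="$1"
    shift
    actual=0
    "$@" >/dev/null 2>&1 || actual=$?
    if [ "$actual" -eq "$expected" ]
    then
        echo "PASS: $* returned $actual"
    else
        echo "FAIL: $* returned $actual, expected $expected"
        return 1
    fi
}
```

For example, expect_rc 3 /etc/init.d/your_res_name status run against a stopped resource should print a PASS line, and expect_rc 0 /etc/init.d/your_res_name stop should pass whether or not the resource was running.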

Once this is done, you can just add it to the cluster as a primitive LSB resource, add a monitor on it and let the cluster take care of your script.

Wednesday, 9 February 2011

Making BASH Scripts HA Compatible - Daemonising BASH