Trapping for kill command on linux

LarryD

Not sure if this should go here or somewhere else...

Quick background and description of the issue:

OE10.0B (not Webspeed)

One of our customers has a background Progress process (run as root) on a Linux web server which makes a remote connection to a Progress DB on another server. The web client uploads a purchase order file through a separate process, a flag in the web DB is set with the file name, etc., then the background process looks for these files to process. It works well and has been in place for several years.

3 weeks ago the web server went fubar (Friday afternoon at 4:30). Fortunately, we were in the process of testing a new server to replace the one that went bad... but we weren't complete yet. Over the weekend, we rushed to complete the install, ran some tests, fixed stuff, etc. By Monday morning it was back up and running.

But we are being plagued by intermittent killing of the background process. The only thing in the background process log is:
"KILL signal received. (298)"

We have a cron job checking to ensure that the process is running, and it does an email/restart when it's not, but we're trying to figure out what is killing the Progress background job.

I've tried adding the Linux "trap" command to the script that starts up the background job, to get a list of processes running on the web server at the time, but it doesn't do anything. There is nothing in the Progress DB log either. I've searched the Progress kbase but couldn't find anything there either. I suspect that Progress does its own signal handling and overrides the 'trap' command, but I don't know that for sure.

Does anyone have any ideas/suggestions on how to either trap for the kill (so I can see what else is running at the time) or if there is some other method to determine what might be killing it?

I'll provide more details if that would be of any help.
 
I don't approve of this for most purposes, but if you are careful and only do it for a special, limited-time purpose, then it might be OK to use the "trap" command.

The message in the .lg file can be somewhat deceptive -- the "kill signal" being received refers to SIGTERM (-15) not SIGKILL (-9). SIGKILL is not trappable and thus a message about it could never actually be written to the .lg file by the client that was killed ;)
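To see that difference concretely, here is a small sketch (temp file and names are made up for the demo): a shell trap fires on a plain "kill" (SIGTERM), but no handler of any kind can ever run for SIGKILL, since the kernel terminates the process without telling it.

```shell
#!/bin/sh
# Demonstrate that the default "kill" (SIGTERM, -15) is trappable.
# SIGKILL (-9), by contrast, can never be caught, so no handler --
# and therefore no log message from the victim -- can ever run for it.

logfile=$(mktemp)

# Child shell that installs a TERM trap and then idles.
sh -c "trap 'echo caught SIGTERM > $logfile; exit 0' TERM; while :; do sleep 1; done" &
child=$!

sleep 2            # give the child time to install its trap
kill "$child"      # default signal is SIGTERM (-15)
wait "$child" 2>/dev/null

result=$(cat "$logfile")
echo "$result"
rm -f "$logfile"
```

Had that `kill` been `kill -9`, the logfile would stay empty no matter what the trap says.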

Also -- I would be on the lookout for "orphan" or "idle process" killers. Sometimes these are being run from cron without your knowledge.
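If one of those is running, it usually shows up in a crontab somewhere. A quick way to sweep the usual hiding places (run as root so other users' crontabs are readable; standard Linux cron locations assumed):

```shell
#!/bin/sh
# Sweep the standard cron locations for a scheduled process killer.
# Needs root to read other users' crontabs.

users=$(cut -d: -f1 /etc/passwd)

# Per-user crontabs
for u in $users; do
    entries=$(crontab -l -u "$u" 2>/dev/null)
    [ -n "$entries" ] && printf '== crontab for %s ==\n%s\n' "$u" "$entries"
done

# System-wide crontab and drop-in directories
cat /etc/crontab 2>/dev/null
ls /etc/cron.d /etc/cron.hourly /etc/cron.daily 2>/dev/null || true
```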
 
Thanks, Tom.

I was already aware that kill -9 (aka SIGKILL) is not trappable, and that the default kill is -15 (SIGTERM), but that might be useful for someone else.

The reason for trying the "trap" command was to get a list of running processes, to see if one of those orphan/idle process killers is lurking about. As far as I know, only 4 individuals have root access on this box, and none of them would be killing processes without a reason (and I don't suspect any of them of nefarious purposes).

Just an FYI, the trap command didn't work: with it in the script, I can kill the process and all I see is the (298) entry in the log file.

Here is the script (edited for non-essential stuff):

Code:
trap "echo \'$0 process $$ killed on $(date).\' >> /logdir/logs/kill.log; echo 'Active processes at time of kill' >> /logdir/logs/kill.log; ps afx >> /logdir/kill.log;  exit " HUP INT QUIT ABRT TERM STOP FPE
#
exec $DLC/bin/_progres -pf /pfdir/my.pf -bp web/myprog.p >> /logdir/logs/weberrors.log 2>&1
 
Yes, I was commenting about KILL for the record ;)

There is an apparent error in your script -- you are using two different names for the logfile (way out at the end of the command...) But I doubt that it matters.

I don't know why trap isn't working -- I just tried it myself and it doesn't work for me either so you aren't alone :(

Another thing that you might try (temporarily) is to remove the exec and add your logging after the launch (and eventual exit) of _progres. Maybe log $? as well as the ps data.
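That is likely the key, incidentally: `exec` replaces the wrapper shell's process image with `_progres`, so by the time the signal arrives, the shell that set the trap no longer exists; the signal goes straight to the client, which just logs the (298) and dies. Without `exec`, the wrapper survives and can report how the client ended. A rough sketch of that idea (using a placeholder command so it's self-contained; in the real script the command would be the `_progres` line from the earlier post):

```shell
#!/bin/sh
# Wrapper WITHOUT exec: the shell stays alive, waits for the client,
# then logs its exit status and a process snapshot.
# CMD and LOG are placeholders for this sketch; in the real script CMD
# would be: $DLC/bin/_progres -pf /pfdir/my.pf -bp web/myprog.p
CMD="sleep 30"
LOG=$(mktemp)

$CMD &                 # backgrounded here only so the demo can kill it
pid=$!
sleep 1
kill "$pid"            # stand-in for whatever is killing the client
wait "$pid"
status=$?              # >128 means killed by signal (status - 128), so 143 = SIGTERM

{
    echo "pid $pid exited with status $status on $(date)"
    echo "Active processes at the time:"
    ps afx 2>/dev/null || ps -ef
} >> "$LOG"

echo "exit status: $status"
```

The $? logging is the useful part: a value above 128 tells you the client died from a signal, and which one.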
 
You should check the Linux logs for the OOM killer ("Out of memory: Killed process"). It may be that some process is using too much memory and the OOM killer is terminating processes to reclaim memory.
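The kernel logs it whenever the OOM killer fires, so this is easy to check. Something along these lines (log paths vary by distro: /var/log/messages is RHEL-style, /var/log/syslog is Debian-style):

```shell
#!/bin/sh
# Count OOM-killer traces in the kernel ring buffer and the syslogs.
# Paths are the common Linux locations; missing files are just skipped.

oom_hits=$( { dmesg 2>/dev/null;
              cat /var/log/messages /var/log/syslog 2>/dev/null; } \
            | grep -ciE 'out of memory|oom-killer|killed process' )

echo "OOM-related log lines found: ${oom_hits:-0}"
```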
 