Progress working in a Cluster

rrojo7229

Member
Dear Friends,

We have 4 databases and 2 AppServers to start up in a cluster. The cluster configuration is:
- 2 servers;
- 1 storage unit;
- OpenEdge 10.1B;
- Red Hat Enterprise Linux 3.

When one node (server) loses its network connection, the Cluster Administrator has the job of moving the service from that node to another node. OK. Very good.

But the problem is that when the cluster moves the service, it reboots the server that was running the database service before handing the service to the other server. As a result, the databases and AppServers do not get a proper shutdown.

I talked to Red Hat Support and they said this is normal cluster administration behavior: before moving the service from one node to another, the cluster reboots the failed node to try to get it back on the network, and it reboots as soon as possible to preserve data integrity.

Consequently, when the cluster runs the "Start" function to start the databases on the other server, the startup fails because stale .lk files are still present.


Have you run into this situation?


Thanks.
Kind Regards.
Ricardo Olguin.
 
We use a function in a script. See below.
Code:
function f_RemoveLkFile {
  # $1 = directory holding the database
  # $2 = .lk file name

  # Command substitution instead of "fuser ... | read", which runs
  # read in a subshell under bash and loses the variable.
  FUse_test=$(fuser $1/* 2>/dev/null)
  if [[ -n $FUse_test ]]
  then
    echo ""                                          | tee -a  $LogFile
    echo "Error: one or more files in $1 are in use" | tee -a  $LogFile
    return 1
  fi

  # Remove the .lk file (force)
  if [[ -f $1/$2 ]]
  then
    echo ""                                             | tee -a  $LogFile
    echo "Warning: Delete file $1/$2 for db restarting" | tee -a  $LogFile
    rm -f $1/$2   2>&1                                  | tee -a  $LogFile
  fi

  return 0

} # end of f_RemoveLkFile
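To try the same logic safely, here is a self-contained sketch run against a throwaway directory. The directory layout and the db name "sports" are invented for the example; the real Start script would call f_RemoveLkFile with the production database paths.

```shell
#!/bin/sh
# Demo of the stale-.lk cleanup in a temporary directory.
LogFile=$(mktemp)

remove_lk() {
  dir=$1; lk=$2
  # fuser prints the PIDs of processes holding the files open;
  # an empty result means nothing is using the database.
  in_use=$(fuser "$dir"/* 2>/dev/null)
  if [ -n "$in_use" ]; then
    echo "Error: one or more files in $dir are in use" | tee -a "$LogFile"
    return 1
  fi
  if [ -f "$dir/$lk" ]; then
    echo "Warning: deleting $dir/$lk for db restart" | tee -a "$LogFile"
    rm -f "$dir/$lk"
  fi
  return 0
}

# Simulate the directory a hard reboot leaves behind: db files plus a
# stale .lk that blocks the next startup.
dbdir=$(mktemp -d)
touch "$dbdir/sports.db" "$dbdir/sports.lk"

remove_lk "$dbdir" sports.lk && echo "ok: $dbdir is clean"
```

After the run, only the stale .lk file is gone; the database files themselves are untouched.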
 
Dear Klaus,

Ok. Thanks for your help.

But this is the first time I have handled a Progress database in a cluster. Is it normal that the cluster reboots the node without a proper shutdown?

Or does your cluster manage to shut down before the node is rebooted? If yes, what did you do in the cluster configuration to make this happen?

Our cluster reboots the node within 12 to 15 seconds, and we do not have time to shut down 4 databases and 2 AppServers in that window...
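Since the Stop window is so short, one thing that can help is shutting everything down in parallel instead of one after another. A minimal sketch of the pattern follows; shut_db is a stand-in function here, and in a real Stop script it would run "proshut <db> -by" for each database (and "asbman -name <broker> -stop" for the AppServers).

```shell
#!/bin/sh
# Parallel shutdown pattern: every shutdown runs in the background and
# "wait" blocks until all of them have finished, so the total stop time
# is the slowest single shutdown instead of the sum of all of them.

OutFile=$(mktemp)

shut_db() {
  # Stand-in for: proshut "$1" -by   (unconditional batch shutdown)
  sleep 1
  echo "$1 down" >> "$OutFile"
}

start=$(date +%s)
for db in db1 db2 db3 db4; do
  shut_db "$db" &
done
wait
end=$(date +%s)

echo "all 4 databases stopped in $((end - start))s"
```

With four 1-second stand-in shutdowns running in parallel, the whole loop finishes in roughly one second rather than four.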


Thanks.
Kind Regards.
Ricardo Olguin.
 
Hello Ricardo,
you should have two different scenarios in your cluster.

1) Normal takeover
This is started by a user or whatever, but it is a planned takeover. In this case node 1 gets enough time to shut everything down. After the shutdown, node 2 comes up, takes over shared resources like the disks, and starts everything. Depending on db size, this takes some minutes.

You can start node 2 without waiting for node 1 to come down; then the db will go through crash recovery. But in my opinion that is not a useful way. There must be an option that makes the cluster software wait for the shutdown scripts. Maybe there is a timeout which is very short?

2) Emergency takeover
If node 1 is running and the cluster software on node 2 detects that node 1 is dead - and there should be two independent ways of checking this (two NICs, one NIC and serial, one NIC and a SAN interface, or whatever) - then node 2 declares an emergency.
Node 2 will allocate the resources (like disks) and start the applications. In this case the dbs will go through crash recovery.
The applications should be clever enough to do a warm restart, let's say for import/export jobs.

My experience is based on HACMP for AIX and Heartbeat for Linux together with DRBD.

Regards
Klaus
 
Hi Klaus,

OK. Thanks for your quick answer.
Yes, we have option 1).

1) Normal takeover
This is started by a user or whatever, but it is a planned takeover. In this case node 1 gets enough time to shut everything down. After the shutdown, node 2 comes up, takes over shared resources like the disks, and starts everything. Depending on db size, this takes some minutes.

But the question is:
In the Stop function of the cluster configuration (Red Hat 3.4.6), where we shut down the 4 databases and 2 AppServers, we had to increase the parameter /proc/cluster/config/cman/deadnode_timeout from 21 seconds to 51 seconds to get a proper shutdown. After that change, the Stop function had time to execute when, as a test, we unplugged the network cables of the node that owned the service and watched the cluster move the service to the other node.
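For reference, this is how we check and change that value at runtime on this RHEL 3 cman setup. The 51-second value is just what worked in our tests; tune it to your own shutdown time. Note that a write to /proc does not survive a reboot, so the persistent setting belongs in the cluster configuration files.

```shell
# Show the current dead-node timeout (seconds)
cat /proc/cluster/config/cman/deadnode_timeout

# Raise it so the cluster Stop script has time to shut the
# databases and AppServers down cleanly (runtime change only)
echo 51 > /proc/cluster/config/cman/deadnode_timeout
```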

Did you have to do the same? I mean, did you have to change any values to delay the death of the node that owns the service, so the cluster could make a proper shutdown before it died?

Thanks.
Kind Regards.
Ricardo Olguin
 
Hello Ricardo,
in our case we had HACMP on AIX. I believe there it is normal that the scripts wait until the shutdown has finished.
Therefore I cannot tell you about the Red Hat cluster solution.

But I wouldn't worry about having to increase some timeouts, if the values look appropriate to me.

Klaus
 