Forum Post: Error (10716) after unexpected server shutdown - RPLS: sequence number [NUM] is...

Status
Not open for further replies.
Dmitry Lishafaev

Guest
I have two AIX servers, Server-Tgt and Server-Src. I have a 600 GB archive database with almost no transaction activity (except for data archiving, one day per month). This database is replicated from Server-Src to Server-Tgt in async mode. Today Server-Src exhausted its virtual memory, hung, and was rebooted manually.

db log:

[2015/03/25@09:24:23.832+0300] P-9371908 T-1 I AIMGT 5: (3778) This is after-image file number 850 since the last AIMAGE BEGIN
[2015/03/25@09:24:24.002+0300] P-5833054 T-1 I RPLS 14: (11805) Unlocking after-image file 848 and locking ALL FULL after-image files beginning with file 849.
[2015/03/25@09:24:28.850+0300] P-9371908 T-1 I AIMGT 5: (13199) After-image extent /renmsg/ai/renmsg.a4 has been copied to /backup/tape/ai/oss~dbase~renmsg.20150218.182948.00000849.renmsg.a4.
[2015/03/25@09:24:28.851+0300] P-9371908 T-1 I AIMGT 5: (13154) Marked after-image extent /renmsg/ai/renmsg.a4 ARCHIVED.
[2015/03/25@09:24:28.868+0300] P-9371908 T-1 I AIMGT 5: (3789) Marked after-image extent /renmsg/ai/renmsg.a3 EMPTY.
[2015/03/25@10:24:28.910+0300] P-9371908 T-1 I AIMGT 5: (3777) Switched to ai extent /renmsg/ai/renmsg.a1.
[2015/03/25@10:24:28.910+0300] P-9371908 T-1 I AIMGT 5: (3778) This is after-image file number 851 since the last AIMAGE BEGIN
[2015/03/25@10:24:29.056+0300] P-5833054 T-1 I RPLS 14: (11805) Unlocking after-image file 849 and locking ALL FULL after-image files beginning with file 850.
[2015/03/25@10:24:33.932+0300] P-9371908 T-1 I AIMGT 5: (13199) After-image extent /renmsg/ai/renmsg.a5 has been copied to /backup/tape/ai/oss~dbase~renmsg.20150218.182948.00000850.renmsg.a5.
[2015/03/25@10:24:33.933+0300] P-9371908 T-1 I AIMGT 5: (13154) Marked after-image extent /renmsg/ai/renmsg.a5 ARCHIVED.
[2015/03/25@10:24:33.953+0300] P-9371908 T-1 I AIMGT 5: (3789) Marked after-image extent /renmsg/ai/renmsg.a4 EMPTY.
[2015/03/25@11:01:21.395+0300] P-7930094 T-1 I SRV 2: (739) Logout usernum 21, userid appsrv1, on Server-Src batch.
Note: AI sequence 850 was locked before the reboot (at 10:24) and archived at the same time by AIMGT (AIMGT policy: one AI per hour). Server-Src ran out of swap space at 11:18 and was rebooted after 11:20.

During the reboot I got this on Server-Tgt:

[2015/03/25@11:36:56.621+0300] P-10682534 T-1 I RPLA 5: (11699) A TCP/IP failure has occurred. The Agent's will enter PRE-TRANSITION, waiting for connection from the Replication Server.

After the DB started on Server-Src I got:

[2015/03/25@11:51:04.393+0300] P-7405852 T-1 I BROKER 0: (13875) This database is enabled for OpenEdge Replication as a Source database.
...
[2015/03/25@11:51:07.509+0300] P-6684840 T-1 I RPLS 5: (10507) The Fathom Replication Server has successfully connected to the Fathom Replication Agent agent1 on host Server-Tgt.
[2015/03/25@11:51:07.509+0300] P-6684840 T-1 I RPLS 5: (11251) The Replication Server successfully connected to all of it's configured Agents.
[2015/03/25@11:51:07.649+0300] P-6684840 T-1 I RPLS 5: (10716) Fathom Replication Agent agent1 cannot be configured because the required AI area (area-number 13, sequence number 850 is not available.
[2015/03/25@11:51:07.649+0300] P-6684840 T-1 I RPLS 5: (11696) The Agent agent1 cannot be properly configured and is being terminated.
[2015/03/25@11:51:07.650+0300] P-6684840 T-1 I RPLS 5: (10700) The Fathom Replication Agent agent1 is being terminated.
[2015/03/25@11:51:07.650+0300] P-6684840 T-1 I RPLS 5: (10504) Unexpected error -158 returned to function rpSRV_ServerLoop.
But the current sequence number on Server-Tgt is 851 (I ran this command only an hour ago, not immediately):

-bash-4.2# dsrutil msgtgt -C recovery agent
Online Replication Recovery Information for /oss/dbase/msgtgt:
Replication version: 5.0
Date created: Wed Feb 18 23:04:00 2015
Date last written: Wed Mar 25 21:44:18 2015
Replication local agent information:
Last Block: Complete
Last block received location: area: 7, seq: 0, loc: 0, offset: 0
Last block processed location: area: 0, seq: 0, loc: 0, offset: 0
Last block ACKed location: area: 7, seq: 851, loc: 0, offset: 128
Last block received: no date
Last block ACKed: no date
ID of the last TX begin: 139660314
ID of the last TX end: 139660314
Time of last TX end: Wed Mar 11 18:37:44 2015
Last AI Extent processed
AIMAGE BEGIN date: Wed Feb 18 18:29:48 2015
AIMAGE NEW date: Wed Mar 25 10:24:28 2015
After-Image File Number: 851
File Last Opened: Wed Mar 25 10:24:28 2015
Completely Applied to Target: No

I can run dsrutil applyextent only with sequence numbers 851 and above. I decided to re-enable replication (via backup/restore), but maybe there are other solutions?

10.2B08/AIX
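For anyone cross-checking logs like these, here is a minimal Python sketch that pulls the sequence number out of the (10716) message and looks for a matching (13199) archive line. It only shows where the archiver wrote the copy of that extent; it does not tell you whether replication can be resynchronized from it. The function name, the regular expressions, and the assumption that the archived file name carries the zero-padded sequence number just before the extent name (as in the 00000849 and 00000850 file names quoted in the log) are mine, not documented OpenEdge behavior.

```python
import re

# Sample lines copied from the db log quoted in this post.
LOG_LINES = [
    "[2015/03/25@09:24:28.850+0300] P-9371908 T-1 I AIMGT 5: (13199) After-image extent /renmsg/ai/renmsg.a4 has been copied to /backup/tape/ai/oss~dbase~renmsg.20150218.182948.00000849.renmsg.a4.",
    "[2015/03/25@10:24:33.932+0300] P-9371908 T-1 I AIMGT 5: (13199) After-image extent /renmsg/ai/renmsg.a5 has been copied to /backup/tape/ai/oss~dbase~renmsg.20150218.182948.00000850.renmsg.a5.",
    "[2015/03/25@11:51:07.649+0300] P-6684840 T-1 I RPLS 5: (10716) Fathom Replication Agent agent1 cannot be configured because the required AI area (area-number 13, sequence number 850 is not available.",
]

# (13199): the archiver copied an AI extent to an archive path.
ARCHIVED = re.compile(r"\(13199\) After-image extent \S+ has been copied to (\S+)")
# (10716): the Replication Server names the sequence number it needs.
REQUIRED = re.compile(r"\(10716\) .*sequence number (\d+)")
# Assumption: the archive name ends in <sequence>.<dbname>.a<N>.
SEQ_IN_NAME = re.compile(r"\.(\d+)\.[^.]+\.a\d+$")


def find_archived_copy(lines):
    """Return (required_sequence, archive_path or None) for the given log lines."""
    archived = {}
    required = None
    for line in lines:
        match = ARCHIVED.search(line)
        if match:
            # Drop the sentence-ending period that follows the path in the log.
            target = match.group(1).rstrip(".")
            seq = SEQ_IN_NAME.search(target)
            if seq:
                archived[int(seq.group(1))] = target
            continue
        match = REQUIRED.search(line)
        if match:
            required = int(match.group(1))
    return required, archived.get(required)


if __name__ == "__main__":
    required, path = find_archived_copy(LOG_LINES)
    print("required sequence:", required)
    print("archived copy:", path)
```

To run it against a full log instead of the sample lines, pass the lines of the .lg file to find_archived_copy().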
