Resolved: "transition failover" fails with "Segmentation fault"

Vito

New Member
Dear friends,
I would really appreciate your help! :)

I am trying to implement the failover transition (a planned switch of target and source) described here: OpenEdge 11.7 Documentation, and I get "Segmentation fault" at the "Synchronization in process" step. :(

All the steps I perform are listed here: OpenEdge 11.7 Documentation

The environment is SUSE Linux 11 SP4; I also tried it on CentOS 7 and Red Hat 7.
The Progress version is 11.5.1, and I tried the same thing with 11.7. The result is the same. :mad:

The goal is to be able to fail over from source to target and fail back without copying a full DB backup over the internet (we don't have a wide pipe between the PROD and DR sites) and without losing replication.
Exactly as the documentation puts it: a planned switch of target and source.


So in DBNAME.repl.properties I used the suggested databass-role=REVERSE.

To reproduce this error/bug easily, I used standard Progress scripts to create two copies of the Sports database and establish replication between them on the same server.

Here are the files.
The working folder is /dbdata/progress/.

addai.st
Code:
#addai.st
a  .  f 2048
a  .  f 2048
a  .

sp_t.st
Code:
#sp_t.st
b /dbdata/progress/sp_t.b1
#
d "Schema Area":6,32;1 /dbdata/progress/sp_t.d1
#
d "Info Area":7,32;1 /dbdata/progress/sp_t_7.d1
#
d "Customer/Order Area":8,32;8 /dbdata/progress/sp_t_8.d1
#
d "Primary Index Area":9,1;8 /dbdata/progress/sp_t_9.d1
#
d "Customer Index Area":10,1;64 /dbdata/progress/sp_t_10.d1
#
d "Order Index Area":11,32;64 /dbdata/progress/sp_t_11.d1
#
a /dbdata/progress/sp_t.a1 f 2048
#
a /dbdata/progress/sp_t.a2 f 2048
#
a /dbdata/progress/sp_t.a3

sp_s.repl.properties
Code:
[server]
control-agents=agent1
database=sp_s
transition=manual
transition-timeout=1200
repl-Keep-Alive=120
defer-agent-startup=1440
agent-shutdown-action=recovery

[agent]
name=agent2
database=sp_s
listener-minport=2756
listener-maxport=2760

[control-agent.agent1]
name=agent1
database=sp_t
host=localhost
port=2755
connect-timeout=120
replication-method=async
critical=0

[transition]
replication-set=1
databass-role=reverse
restart-after-transition=1
auto-begin-ai=1
transition-to-agents=agent1
source-startup-arguments=-S 4501 -H localhost -DBService replserv
target-startup-arguments=-S 2755 -H localhost -DBService replagent
normal-startup-arguments=-S 4501 -H localhost

sp_t.repl.properties
Code:
[server]
    agent-shutdown-action=recovery
    control-agents=agent2
    database=sp_s
    defer-agent-startup=600
    repl-keep-alive=120
    transition=manual
    transition-timeout=60

[agent]
    name=agent1
    database=sp_t
    listener-minport=2756
    listener-maxport=2760

[control-agent.agent2]
    connect-timeout=60
    critical=0
    database=sp_s
    host=localhost
    name=agent2
    port=2755
    replication-method=async

[transition]
    replication-set=1
    auto-begin-ai=1
    database-role=reverse
    restart-after-transition=1
    transition-to-agents=agent2
    source-startup-arguments=-S 4501 -H localhost -DBService replserv
    target-startup-arguments=-S 2755 -H localhost -DBService replagent
    normal-startup-arguments=-S 4501 -H localhost

create.sh: this script creates and starts everything. Any time you want to redo the test, just run create.sh again; it first cleans up and drops everything except the config files.
Code:
proshut sp_s -by
proshut sp_t -by
echo y | prodel sp_s
echo y | prodel sp_t
rm *.lg
rm *.recovery
rm sp_res*
rm *.bak


echo y | prodel sp_s
rm sp_s.repl.recovery
rm sp_s.st

echo y | prodb sp_s sports
prostrct add sp_s addai.st
rfutil sp_s -C mark backedup -G 0
rfutil sp_s -C aimage begin -G 0
proutil sp_s -C enablesitereplication source
probkup sp_s sp_res -REPLTargetCreation


echo y | prodel sp_t
rm sp_t.repl.recovery
prostrct create sp_t
prorest sp_t sp_res
proutil sp_t -C aimage begin
proutil sp_t -C enablesitereplication target

proserve sp_t -S 2755 -H localhost -aiarcdir /dbdata/progress/ai -aiarcinterval 120 -DBService replagent
probiw sp_t;proaiw sp_t;prowdog sp_t;proapw sp_t

proserve sp_s -S 4501 -H localhost -aiarcdir /dbdata/progress/ai -aiarcinterval 120 -DBService replserv
probiw sp_s;proaiw sp_s;prowdog sp_s;proapw sp_s
sleep 10
echo a | rprepl sp_t -C monitor 2>/dev/null

Now, after you have run create.sh, everything is created and started, and you can see that replication is up and running,
please try to fail over from sp_s (source) to sp_t (target) with the command

dsrutil sp_s -C transition failover

Unfortunately, regardless of the Linux distribution (SUSE, CentOS or Red Hat) and the Progress version (first 11.5.1, then 11.7), I get the same output:
Code:
proenv>dsrutil sp_s -C transition failover   
Transitioning database /dbdata/progress/sp_s
---------------------------------------------------------------
19:44:41 Opening database                         : Succeeded
19:44:41 Setting up transition                    : Succeeded
19:44:41 Shutting down database                   : Succeeded
19:44:57 Starting database in Cur Role            : Succeeded
19:44:58 Synchronization in process               : Segmentation fault
I look forward to your thoughts and input.

TomBascom

Curmudgeon
OE Replication is prone to mysterious failures -- it is a lot like a mobile phone that only gets reception in a certain room when held at just the right angle and with your fingers positioned just so...

I don't actually have OE Replication installed on my test system at the moment, but this is what I have tucked away from a config that does what I think you are trying to accomplish:

[transition]
database-role=reverse
responsibility=primary
transition-to-agents=agent2
restart-after-transition=1
auto-begin-ai=1
backup-method=mark
source-startup-arguments=-DBService replserv -S 9500 -aiarcdir e:\aiarc
target-startup-arguments=-DBService replagent -S 9500 -aiarcdir e:\aiarc

Also - IMHO you should have at *least* 4 ai extents (preferably 8) and they should all be variable length. 2 fixed plus a variable is not a very useful setup. It would be far too easy to run out of available extents.
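Following that advice, the structure file for the extra AI extents can be generated rather than typed by hand. In a .st file, an "a" line with a path and no "f <size>" suffix defines a variable-length extent, so eight variable-length extents are just eight "a ." lines. A minimal sketch; write_ai_st is a hypothetical helper and the default of 8 simply mirrors the "preferably 8" above:

```shell
#!/bin/sh
# write_ai_st (hypothetical helper): print a .st fragment with COUNT
# variable-length after-image extents in the database directory (".").
# No "f <size>" suffix on a line means the extent is variable-length.
write_ai_st() {
  count=${1:-8}
  i=1
  while [ "$i" -le "$count" ]; do
    echo "a ."
    i=$((i + 1))
  done
}

# Example, run before the database is started (as in create.sh):
# write_ai_st 8 > addai.st
# prostrct add sp_s addai.st
```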

Vito

New Member
Hi Cringer,
here it is (I ran it with the "-logging 2" option for more detail):
Code:
[2017/04/24@09:50:55.768-0400] P-27425      T-140357276374848 I BROKER  0: (18156) SQL Autonomous Schema Update (-SQLWidthUpdate): OFF
[2017/04/24@09:50:55.768-0400] P-27425      T-140357276374848 I BROKER  0: (14019) Record block consistency check (-TableCheck): Not Enabled
[2017/04/24@09:50:55.768-0400] P-27425      T-140357276374848 I BROKER  0: (17717) TXE Lock retry limit (-TXERetryLimit): 0
[2017/04/24@09:50:55.768-0400] P-27425      T-140357276374848 I BROKER  0: (13896) TXE Commit lock skip limit (-TXESkipLimit): 10000
[2017/04/24@09:50:55.768-0400] P-27425      T-140357276374848 I BROKER  0: (10836) Database connections are not allowed at this time.
[2017/04/24@09:50:55.768-0400] P-27425      T-140357276374848 I BROKER  0: (10471) Database connections have been enabled.
[2017/04/24@09:50:55.768-0400] P-27420      T-140095531206464 I RPLU    6: (452)   Login by root on /dev/pts/0.
[2017/04/24@09:50:55.768-0400] P-27420      T-140095531206464 I RPLU    6: (7129)  Usr 6 set name to root.
[2017/04/24@09:50:55.769-0400] P-27420      T-140095531206464 I RPLU    6: (13958) Beginning Replication Transition operation: Synchronization in process.
[2017/04/24@09:50:55.779-0400] P-27429      T-139850624042816 I AIMGT   7: (-----) Login by root.
[2017/04/24@09:50:55.780-0400] P-27429      T-139850624042816 I AIMGT   7: (13194) The after-image manager is beginning.
[2017/04/24@09:50:55.780-0400] P-27429      T-139850624042816 I AIMGT   7: (2518)  Started.
[2017/04/24@09:50:58.586-0400] P-27425      T-140357276374848 I BROKER  0: (2527)  Disconnecting dead user 6.
[2017/04/24@09:50:58.772-0400] P-27427      T-139827148998464 I RPLS    5: (10507) The Fathom Replication Server has successfully connected to the Fathom Replication Agent agent1 on host 127.0.0.1.
[2017/04/24@09:50:58.772-0400] P-27427      T-139827148998464 I RPLS    5: (11251) The Replication Server successfully connected to all of its configured Agents.
[2017/04/24@09:50:58.773-0400] P-27427      T-139827148998464 I RPLS    5: (10508) Beginning Fathom Replication synchronization for the Fathom Replication Agent agent1.
[2017/04/24@09:50:58.784-0400] P-27427      T-139827148998464 I RPLS    5: (10436) The source database sp_s and the target database /dbdata/progress/sp_t on host localhost are synchronized.

I also ran it under a debugger; here is the output:
Code:
proenv>gdb -ex=r --args rprepl sp_s -C transition failover -Passphrase -logging 2       
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-80.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/dlc/bin/rprepl...(no debugging symbols found)...done.
Starting program: /usr/dlc/bin/rprepl sp_s -C transition failover -Passphrase -logging 2
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Transitioning database /dbdata/progress/sp_s
---------------------------------------------------------------
09:57:31 Opening database                         : Succeeded
09:57:31 Setting up transition                    : Succeeded
09:57:31 Shutting down database                   : Detaching after fork from child process 27469.
Succeeded
09:57:47 Starting database in Cur Role            : Detaching after fork from child process 27470.
Succeeded
09:57:48 Synchronization in process               :
Program received signal SIGSEGV, Segmentation fault.
0x00000000004ba09d in rpTRN_PerformTransition ()
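When a scripted step like this crashes, the shell reports it through the exit status: in bash and dash, a process killed by signal N exits with status 128 + N, so a SIGSEGV (signal 11) shows up as 139. A small wrapper can tell such a crash apart from an ordinary failure in a test script like create.sh. A minimal sketch; run_and_report is a hypothetical helper, and it assumes dsrutil passes the crashed process's status through to its caller:

```shell
#!/bin/sh
# run_and_report (hypothetical helper): run a command, pass its exit status
# through, and say on stderr whether it failed normally or was killed by a signal.
# bash and dash report "terminated by signal N" as status 128 + N,
# so a SIGSEGV (signal 11) crash shows up as 139.
run_and_report() {
  "$@"
  ret=$?
  if [ "$ret" -gt 128 ]; then
    sig=$((ret - 128))
    echo "$1: killed by signal $sig ($(kill -l "$sig"))" >&2
  elif [ "$ret" -ne 0 ]; then
    echo "$1: exited with status $ret" >&2
  fi
  return "$ret"
}
```

In create.sh this would be used as `run_and_report dsrutil sp_s -C transition failover || exit 1`, so the script stops instead of going on to monitor a half-transitioned pair.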

Vito

New Member
Good morning Tom,
As far as I know, Progress doesn't allow creating more than one variable-size extent, does it?

Vito

New Member
I've added a few more BI and AI extents. I also enabled AI archiving (I created the folder /dbdata/progress/ai for the AI archives).

Now the files look like this:
addai.st
Code:
a  . f 2048
a  . f 2048
a  . f 2048
a  . f 2048
a  . f 2048
a  . f 2048
a  . f 2048
a  .
#
b  . f 2048
b  . f 2048
b  .

sp_t.st
Code:
#
b /dbdata/progress/sp_t.b1 f 2048
b /dbdata/progress/sp_t.b2 f 2048
b /dbdata/progress/sp_t.b3 f 2048
b /dbdata/progress/sp_t.b4
#
a /dbdata/progress/sp_t.a1 f 2048
a /dbdata/progress/sp_t.a2 f 2048
a /dbdata/progress/sp_t.a3 f 2048
a /dbdata/progress/sp_t.a4 f 2048
a /dbdata/progress/sp_t.a5 f 2048
a /dbdata/progress/sp_t.a6 f 2048
a /dbdata/progress/sp_t.a7 f 2048
a /dbdata/progress/sp_t.a8
#
d "Schema Area":6,32;1 /dbdata/progress/sp_t.d1
#
d "Info Area":7,32;1 /dbdata/progress/sp_t_7.d1
#
d "Customer/Order Area":8,32;8 /dbdata/progress/sp_t_8.d1
#
d "Primary Index Area":9,1;8 /dbdata/progress/sp_t_9.d1
#
d "Customer Index Area":10,1;64 /dbdata/progress/sp_t_10.d1
#
d "Order Index Area":11,32;64 /dbdata/progress/sp_t_11.d1
#

sp_s.repl.properties (only the [transition] section was changed)
Code:
[transition]
   responsibility=primary
   replication-set=0
   databass-role=reverse
   restart-after-transition=1
   auto-begin-ai=1
   transition-to-agents=agent1
   source-startup-arguments=-S 4501 -H localhost -aiarcdir /dbdata/progress/ai -aiarcinterval 120 -bibufs 4 -aibufs 8 -DBService replserv
   target-startup-arguments=-S 2755 -H localhost -aiarcdir /dbdata/progress/ai -aiarcinterval 120 -bibufs 4 -aibufs 8 -DBService replagent
   normal-startup-arguments=-S 4501 -H localhost -aiarcdir /dbdata/progress/ai -aiarcinterval 120 -bibufs 4 -aibufs 8

sp_t.repl.properties (only the [transition] section was changed)
Code:
[transition]
    responsibility=secondary
    replication-set=0
    database-role=reverse
    transition-to-agents=agent1
    auto-begin-ai=1
    incremental-backup-arguments=sp_res.inc
    recovery-backup-arguments=!secondary.recovery.bak
    restart-after-transition=1
    source-startup-arguments=-S 4501 -H localhost -aiarcdir /dbdata/progress/ai -aiarcinterval 120 -DBService replserv
    target-startup-arguments=-S 2755 -H localhost -aiarcdir /dbdata/progress/ai -aiarcinterval 120 -DBService replagent
    normal-startup-arguments=-S 4501 -H localhost -aiarcdir /dbdata/progress/ai -aiarcinterval 120

And finally, create.sh:
Code:
proshut sp_s -by
proshut sp_t -by
echo y | prodel sp_s
echo y | prodel sp_t
rm *.lg
rm *.recovery
rm sp_res*
rm *.bak
rm ai/*


echo y | prodel sp_s
rm sp_s.repl.recovery
rm sp_s.st

echo y | prodb sp_s sports

proutil sp_s -C truncate bi
prostrct remove sp_s bi
prostrct add sp_s addai.st
rfutil sp_s -C mark backedup

rfutil sp_s -C aimage begin
proutil sp_s -C aiarchiver enable
proutil sp_s -C enablesitereplication source
probkup sp_s sp_res -REPLTargetCreation


echo y | prodel sp_t
rm sp_t.repl.recovery
prostrct create sp_t
prorest sp_t sp_res
proutil sp_t -C aimage begin
proutil sp_t -C aiarchiver enable
proutil sp_t -C enablesitereplication target

proserve sp_t -S 2755 -H localhost -aiarcdir /dbdata/progress/ai -aiarcinterval 120 -bibufs 4 -aibufs 8 -DBService replagent
probiw sp_t;proaiw sp_t;prowdog sp_t;proapw sp_t

proserve sp_s -S 4501 -H localhost -aiarcdir /dbdata/progress/ai -aiarcinterval 120 -bibufs 4 -aibufs 8 -DBService replserv
probiw sp_s;proaiw sp_s;prowdog sp_s;proapw sp_s
sleep 5
echo a | rprepl sp_t -C monitor 2>/dev/null

Unfortunately, no change: the same segmentation fault.

TomBascom

Curmudgeon
I think you ought to reach out to Progress tech support. You seem to have an easy to reproduce case -- they love those.

Vito

New Member
Hi guys,
I finally found a perfectly working configuration, exactly what's needed.
Here it is:
addai.st
Code:
a  .
a  .
a  .

clean.sh
Code:
proshut db1 -by
proshut db2 -by
echo y | prodel db1
echo y | prodel db2

rm *.recovery;rm *.lg;rm *.log
rm *.bak;rm *.sav;rm *.sav.inc
rm *.stdout;rm db?.st

db1.repl.properties
Code:
#db1.repl.properties
[server]
   database=db1
   control-agents=agent2
   transition=manual
   agent-shutdown-action=recovery

[agent]
   name=agent1
   database=db1
   listener-minport=4505
   listener-maxport=4510

[control-agent.agent2]
   name=agent2
   database=db2
   host=localhost
   port=4502
   connect-timeout=60
   replication-method=async
   critical=0

[transition]
   replication-set=1
   database-role=reverse
   transition-to-agents=agent2

   auto-begin-ai=1
   backup-method=mark
   restart-after-transition=1

   source-startup-arguments=-S 4501 -DBService replserv
   target-startup-arguments=-S 4501 -DBService replagent

db2.repl.properties
Code:
#db2.repl.properties
[server]
   database=db2
   control-agents=agent1
   transition=manual
   agent-shutdown-action=recovery

[agent]
   name=agent2
   database=db2
   listener-minport=4511
   listener-maxport=4515

[control-agent.agent1]
   name=agent1
   database=db1
   host=localhost
   port=4501
   connect-timeout=60
   replication-method=async
   critical=0

[transition]
   replication-set=1
   database-role=reverse
   transition-to-agents=agent1
   restart-after-transition=1

   auto-begin-ai=1
   backup-method=mark

   source-startup-arguments=-S 4502 -DBService replserv
   target-startup-arguments=-S 4502 -DBService replagent

And finally, create.sh, which creates both databases, fails over from DB1 to DB2, verifies that it succeeded, then, without any extra backups or rebasing, fails back from DB2 to DB1 and verifies it again.
Code:
./clean.sh

#create DB1 - source
prodb db1 sports
prostrct add db1 addai.st
rfutil db1 -C mark
rfutil db1 -C aimage begin
proutil db1 -C enablesitereplication source
probkup db1 db1.bak


#restore DB2 - a target
prorest db2 db1.bak
rm db1.bak

prostrct add db2 addai.st
proutil db2 -C aimage begin
proutil db2 -C enablesitereplication target

#start target and source
_mprosrv db2 -S 4502 -DBService replagent
_mprosrv db1 -S 4501 -DBService replserv

echo "*****************************************"
echo "****  DBs created, Monitoring DB2   *****"
echo "*****************************************"
sleep 5

echo a | rprepl db2 -C monitor 2>/dev/null

echo "*****************************************"
echo "****  FAILING OVER from DB1 to DB2  *****"
echo "*****************************************"
sleep 5
dsrutil db1 -C transition failover

echo "*****************************************"
echo "****  Failed OVER, Monitoring DB1   *****"
echo "*****************************************"
sleep 5
echo a | rprepl db1 -C monitor 2>/dev/null


echo "*****************************************"
echo "****  FAILING BACK from DB2 to DB1  *****"
echo "*****************************************"
sleep 5
dsrutil db2 -C transition failover

echo "*****************************************"
echo "****  Failed BACK, Monitoring DB2   *****"
echo "*****************************************"
sleep 5
echo a | rprepl db2 -C monitor 2>/dev/null
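Comparing the working db1/db2 files with the earlier sp_s/sp_t ones, two patterns stand out: each database's file names that database in both [server] and [agent], and each [control-agent.*] port matches the -S port the peer database is started with as a target. A small consistency check along those lines can catch copy-paste slips before a transition is attempted; it would flag, for example, database=sp_s in the [server] section of the first sp_t.repl.properties. This is a sketch that assumes both rules, which are inferred from the configurations in this thread rather than taken from the OpenEdge documentation; all function names are hypothetical:

```shell
#!/bin/sh
# prop_get FILE SECTION KEY (hypothetical helper): print the value of KEY
# inside [SECTION] of a .repl.properties-style file. Leading/trailing
# whitespace is ignored, as are blank lines and "#" comment lines.
prop_get() {
  awk -v want_sec="[$2]" -v want_key="$3" '
    { sub(/^[ \t]+/, ""); sub(/[ \t\r]+$/, "") }
    /^#/ || $0 == "" { next }
    /^\[/ { sec = $0; next }
    sec == want_sec {
      eq = index($0, "=")
      if (eq > 0 && substr($0, 1, eq - 1) == want_key) {
        print substr($0, eq + 1)
        exit
      }
    }
  ' "$1"
}

# startup_port ARGS: print the number that follows "-S" in a startup-argument string.
startup_port() {
  printf '%s\n' "$1" | sed -n 's/.*-S[ ]*\([0-9][0-9]*\).*/\1/p'
}

# check_repl_props FILE: [server] and [agent] should both name the database
# the file belongs to (db1.repl.properties -> db1). Assumed rule.
check_repl_props() {
  file=$1
  db=$(basename "$file" .repl.properties)
  bad=0
  for sec in server agent; do
    val=$(prop_get "$file" "$sec" database)
    if [ "$val" != "$db" ]; then
      echo "$file: [$sec] database=$val, expected $db" >&2
      bad=1
    fi
  done
  return "$bad"
}

# check_agent_port LOCAL_PROPS AGENT PEER_PROPS: the port LOCAL_PROPS uses to
# reach AGENT should equal the -S port in PEER_PROPS target-startup-arguments.
check_agent_port() {
  want=$(prop_get "$1" "control-agent.$2" port)
  have=$(startup_port "$(prop_get "$3" transition target-startup-arguments)")
  if [ -n "$want" ] && [ "$want" = "$have" ]; then
    return 0
  fi
  echo "$1: [control-agent.$2] port=$want, but $3 target -S is ${have:-unset}" >&2
  return 1
}
```

With the working files above, `check_repl_props db1.repl.properties && check_agent_port db1.repl.properties agent2 db2.repl.properties` should succeed, and likewise with the roles swapped.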

Vito

New Member
In short, you were right: with Progress you have to be very careful, right place and right angle.

Initially, it was a slightly wrong combination of parameters, and maybe of the way the target DB was created.
Later, after I changed them to the proper ones, I made a typo (databass-role instead of database-role).
Unfortunately, Progress doesn't complain about unrecognized lines (I double-checked all the log files).
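Since the root cause was a misspelled key that was silently ignored, a quick lint of the .repl.properties files before running a transition would have caught it. A minimal sketch: lint_repl_props is a hypothetical helper, keys are compared case-insensitively (an assumption, since the thread does not say whether case matters), and KNOWN_KEYS holds only the keys used in this thread, not the full set Progress documents, so it should be extended from the OpenEdge Replication documentation before being relied on:

```shell
#!/bin/sh
# ASSUMPTION: only the keys that appear in this thread; not the full OpenEdge list.
KNOWN_KEYS="control-agents database transition transition-timeout \
repl-keep-alive defer-agent-startup agent-shutdown-action name \
listener-minport listener-maxport host port connect-timeout \
replication-method critical replication-set database-role responsibility \
restart-after-transition auto-begin-ai backup-method transition-to-agents \
source-startup-arguments target-startup-arguments normal-startup-arguments \
incremental-backup-arguments recovery-backup-arguments"

# lint_repl_props FILE...: print "file:line: unknown key" for every key that is
# not in KNOWN_KEYS; exit status 1 if anything was reported.
lint_repl_props() {
  awk -v known="$KNOWN_KEYS" '
    BEGIN {
      n = split(known, list, " ")
      for (i = 1; i <= n; i++) ok[list[i]] = 1
    }
    { sub(/^[ \t]+/, ""); sub(/[ \t\r]+$/, "") }
    /^#/ || /^\[/ || $0 == "" { next }
    {
      eq = index($0, "=")
      key = (eq > 0) ? substr($0, 1, eq - 1) : $0
      sub(/[ \t]+$/, "", key)
      if (!(tolower(key) in ok)) {
        printf "%s:%d: unknown key \"%s\"\n", FILENAME, FNR, key
        bad = 1
      }
    }
    END { exit bad }
  ' "$@"
}
```

Running `lint_repl_props sp_s.repl.properties sp_t.repl.properties` against the first versions of those files would report databass-role in sp_s.repl.properties, which is exactly the line OpenEdge skipped without a word.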