Shared Memory - in use

zygolux

New Member
Hello all,
When I run the command proutil -C dbipcs to get the shared memory segments used by my DBs, I get some lines where the "in use" field is set to "No" ...
I would like to understand why I have this situation, as it seems to me it is one of the reasons why I cannot connect to a DB. Can anyone shed some light?
Plus, if I stop my DB, remove this segment and start my DB again, there is another segment created (different ID) which is still not "in use".
So far, the only option I found (but I am sure that's not the way to proceed) is to rebuild my DB completely ...
Thanks for your help :)
Zyg.
 

TomBascom

Curmudgeon
You might want to supply a bit more information. What OS, what version of Progress and maybe even the output of proutil as well as of the OS "ipcs -m" command. You might also want to provide more information about what errors you're getting that prevent you from connecting. DB startup options would also be helpful.

In general deleting shared memory segments is not needed. If it is then something is wrong.

Also, what do you mean by "rebuild my database"? That sounds rather extreme.
 

zygolux

New Member
Hi Tom,

Sorry for the lack of information ...
I am running on Linux (SUSE 10) and Progress 10.0B SP5.

The issue is gone now after the rebuild, which is extreme, I agree, but the db was not a production one and at least the rebuild solved the issue.

For more details, dbipcs gave this result:
mfgadmin@lulinux03:/home/mfgadmin> proutil -C dbipcs | egrep lutpdmfg
402522114 10042 0 No /dbdata/tpd/lu/lutpdmfg.db
402849795 10042 0 Yes /dbdata/tpd/lu/lutpdmfg.db
402882564 10042 1 - /dbdata/tpd/lu/lutpdmfg.db

The segment with "No" looked very strange to me ... and this segment did not disappear when I shut down the db. This is the first thing I don't really understand.
Then, as I remember, I shut down the db, removed this remaining segment and started the db again, and a new segment with "in use" set to "No" was there again.

I am sorry but I don't remember the exact error message when I tried to connect to this DB. There was something like "no active server" ...

Rest assured that next time it happens (hopefully not on a live db), I will post the exact error message.

Thanks
 

TomBascom

Curmudgeon
Well since we're speculating without any real data ;) I'll speculate that the trouble began with a kill -9 of a running db...
 

zygolux

New Member
Hmm ... so I think the best thing is to wait until it happens again :cool:
Anyway, is it normal to have a shared memory segment which has "No" in the "in use" field?
 

TomBascom

Curmudgeon
"No" means that no process has it attached. The easiest way for that to happen is "kill -9". That prevents processes from cleaning up resources, including shared memory, that they have allocated.

Shutting down the db won't remove it as it has nothing to do with any active db server. Any current server knows nothing about it. It was created by some other server.

Rebuilding the db (whatever that means) shouldn't have anything to do with it going away either. The segment would go away after an ipcrm or an OS reboot.

I suppose that a bug in either the OS or Progress might also cause it. But the simplest explanation is kill -9 or a very severe db crash (but a db crash would leave evidence in the .lg file).
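
To make the "No" rows easy to spot, here is a minimal Python sketch that parses output shaped like the dbipcs lines posted earlier in this thread and lists the segments that no process has attached. The column meanings (segment ID, shared memory version, segment number, in use, database path) are my reading of the five fields in that output, and the names Segment, parse_dbipcs and orphaned are made up for this example. It only reports; it does not remove anything.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    """One row of dbipcs-style output (field meanings are assumed)."""
    shm_id: int
    version: int
    seg_no: int
    in_use: str      # "Yes", "No" or "-" in the sample output
    database: str


def parse_dbipcs(text: str) -> List[Segment]:
    """Parse rows with five whitespace-separated fields; skip anything else."""
    segments = []
    for line in text.splitlines():
        parts = line.split(None, 4)
        # Header lines, blank lines and banners do not start with three numbers.
        if len(parts) != 5 or not all(p.isdigit() for p in parts[:3]):
            continue
        shm_id, version, seg_no, in_use, database = parts
        segments.append(
            Segment(int(shm_id), int(version), int(seg_no), in_use, database.strip())
        )
    return segments


def orphaned(segments: List[Segment]) -> List[Segment]:
    """Return the segments whose in-use field is "No"."""
    return [s for s in segments if s.in_use == "No"]


# Sample rows copied from the dbipcs output posted in this thread.
SAMPLE = """\
402522114 10042 0 No /dbdata/tpd/lu/lutpdmfg.db
402849795 10042 0 Yes /dbdata/tpd/lu/lutpdmfg.db
402882564 10042 1 - /dbdata/tpd/lu/lutpdmfg.db
"""

if __name__ == "__main__":
    for seg in orphaned(parse_dbipcs(SAMPLE)):
        print(seg.shm_id, seg.database)
```

On the sample rows this prints only the first segment (402522114). On a live system you could feed it the captured output of proutil -C dbipcs instead of SAMPLE.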
 

zygolux

New Member
Hello,

I am back with the same issue ... and more details.

The error message I get when I try to connect to my DB is:

mfgadmin@lulinux03:/dbdata/tpd/lu> promon /dbdata/tpd/lu/lutpdmfg
semGetId: Second call to dbMSG008
There is no server for database /dbdata/tpd/lu/lutpdmfg. (1423)

And these are the shared memory segments for this DB:

mfgadmin@lulinux03:/dbdata/tpd/lu> proutil -C dbipcs | egrep lutpdmfg
439681125 10042 0 No /dbdata/tpd/lu/lutpdmfg.db
449610017 10042 0 No /dbdata/tpd/lu/lutpdmfg.db

Thanks for your help.
 

TomBascom

Curmudgeon
PROMON says that there is no server.

dbipcs confirms that and shows 2 dead shared memory segments -- which looks, to me, like two killed servers.

Are these databases crashing?

If they aren't crashing what process do you use to shut them down?

What does the tail of the db .lg file look like when you get the error above with PROMON?

What does proserve /dbdata/tpd/lu/lutpdmfg result in?
 

zygolux

New Member
These are the last lines of my log ...

I need to mention that this issue occurred this morning and, as usual, the users wanted it fixed as soon as possible. So I contacted Progress support, and they said the DB was corrupted and the solution was to restore it. I asked for a less extreme solution, so they asked me to run proutil database-name -C truncate bi -F.
So I did this and the DB seemed to be OK afterwards.
But as I understand it, there could still be some corrupted records here and there, and sooner or later, if someone tries to access these corrupted records, I will end up with the same issue.
Anyway, they are working on the log and I am waiting for their advice.
I also have to mention that this DB is rebuilt every night, which means in about 6 hours (my time). I expect tomorrow will be another day ...

This issue is very strange, as I have about 20 DBs with the same structure and the issue always occurs with this one. According to the log, it is clear the issue happens after the restore during the night, but not every night ...


Tue Aug 5 03:52:36 2008
03:52:36 BROKER 0: Multi-user session begin. (333)
03:52:36 BROKER 0: Begin Physical Redo Phase at 448 . (5326)
03:52:37 BROKER 0: Physical Redo Phase Completed at blk 332 off 2069 upd 5020. (7161)
03:52:37 BROKER 0: Begin Physical Undo 1 transactions at block 332 offset 2097 (7163)
03:52:37 BROKER 0: Physical Undo Phase Completed at 332 . (5331)
03:52:37 BROKER 0: Begin Logical Undo Phase, 1 incomplete transactions are being backed out. (7162)
03:52:37 BROKER 0: Logical Undo Phase begin at Block 332 Offset 2069. (11231)
03:52:37 BROKER 0: SYSTEM ERROR: rlrdprv: There are no more notes to be read. (865)
03:52:37 BROKER 0: SYSTEM ERROR: The broker is exiting unexpectedly, beginning Abnormal Shutdown. (5292)
03:52:37 BROKER 0: drexit: Initiating Abnormal Shutdown
03:52:37 BROKER 0: ** Save file named core for analysis by Progress Software Corporation. (439)
03:52:37 BROKER 0: SYSTEM ERROR: Releasing regular latch. latchId: 4 (5028)
03:52:37 BROKER 0: User 0 died holding 1 shared memory locks. (2522)
Tue Aug 5 09:51:09 2008
09:51:09 BROKER 0: Multi-user session begin. (333)
09:51:08 BROKER 0: Begin Physical Redo Phase at 384 . (5326)
09:51:15 BROKER 0: Physical Redo Phase Completed at blk 332 off 2069 upd 5205. (7161)
09:51:15 BROKER 0: Begin Physical Undo 1 transactions at block 332 offset 2097 (7163)
09:51:15 BROKER 0: Physical Undo Phase Completed at 332 . (5331)
09:51:15 BROKER 0: Begin Logical Undo Phase, 1 incomplete transactions are being backed out. (7162)
09:51:15 BROKER 0: Logical Undo Phase begin at Block 332 Offset 2069. (11231)
09:51:16 BROKER 0: SYSTEM ERROR: rlrdprv: There are no more notes to be read. (865)
09:51:16 BROKER 0: SYSTEM ERROR: The broker is exiting unexpectedly, beginning Abnormal Shutdown. (5292)
09:51:16 BROKER 0: drexit: Initiating Abnormal Shutdown
09:51:15 BROKER 0: ** Save file named core for analysis by Progress Software Corporation. (439)
09:51:15 BROKER 0: SYSTEM ERROR: Releasing regular latch. latchId: 4 (5028)
09:51:15 BROKER 0: User 0 died holding 1 shared memory locks. (2522)
Tue Aug 5 10:51:47 2008
10:51:47 proutil -C truncate bi session begin for mfgadmin on /dev/pts/0. (451)
10:51:47 ** The FORCE option was given, database recovery will be skipped. (33)
10:51:47 ** Your database was damaged. Dump its data and reload it. (37)
10:51:48 .bi file truncated. (123)
10:51:48 proutil -C truncate bi session end. (334)
 

TomBascom

Curmudgeon
Once again the term "rebuild" is mysterious. What, exactly, do you mean by that?

Your actual problem is:
03:52:37 BROKER 0: SYSTEM ERROR: rlrdprv: There are no more notes to be read. (865)
03:52:37 BROKER 0: SYSTEM ERROR: The broker is exiting unexpectedly, beginning Abnormal Shutdown. (5292)
03:52:37 BROKER 0: drexit: Initiating Abnormal Shutdown
03:52:37 BROKER 0: ** Save file named core for analysis by Progress Software Corporation. (439)
03:52:37 BROKER 0: SYSTEM ERROR: Releasing regular latch. latchId: 4 (5028)
03:52:37 BROKER 0: User 0 died holding 1 shared memory locks. (2522)

This means that some bad thing happened to your BI file. That might have something to do with whatever the mysterious "rebuild" process is.

The dbipcs stuff is a side effect of this -- your db crashed and was unable to clean up shared memory. The log file tells us why it crashed. What we don't know is why it "ran out of notes". I suspect that the magic "rebuild" is involved. If this process somehow involves a copy of a source database then perhaps the source db is being incompletely copied, incorrectly copied, corrupted during the copy, or is itself corrupt at the source. But without knowing what you are doing in any detail it is difficult to do anything other than wildly speculate.
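
Since the .lg file is where the real cause shows up, here is a rough Python sketch that pulls the SYSTEM ERROR lines and their message numbers out of log text shaped like the excerpt above. The regular expression is only fitted to the lines quoted in this thread, and the names ERR_RE and system_errors are invented for the example, so treat it as a starting point rather than a complete .lg parser.

```python
import re
from typing import List, Optional, Tuple

# Fitted to lines such as:
#   03:52:37 BROKER 0: SYSTEM ERROR: <message>. (865)
# The trailing "(number)" is optional because not every line carries one.
ERR_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}) .*?SYSTEM ERROR: (.*?)(?: \((\d+)\))?$"
)


def system_errors(log_text: str) -> List[Tuple[str, str, Optional[int]]]:
    """Return (time, message, message_number) for each SYSTEM ERROR line."""
    found = []
    for line in log_text.splitlines():
        match = ERR_RE.match(line.strip())
        if match:
            time, message, number = match.groups()
            found.append((time, message, int(number) if number else None))
    return found


# Lines copied from the log excerpt posted in this thread.
SAMPLE_LOG = """\
03:52:37 BROKER 0: Logical Undo Phase begin at Block 332 Offset 2069. (11231)
03:52:37 BROKER 0: SYSTEM ERROR: rlrdprv: There are no more notes to be read. (865)
03:52:37 BROKER 0: SYSTEM ERROR: The broker is exiting unexpectedly, beginning Abnormal Shutdown. (5292)
03:52:37 BROKER 0: drexit: Initiating Abnormal Shutdown
03:52:37 BROKER 0: SYSTEM ERROR: Releasing regular latch. latchId: 4 (5028)
"""

if __name__ == "__main__":
    for time, message, number in system_errors(SAMPLE_LOG):
        print(time, number, message)
```

Run against the excerpt, it reports messages 865, 5292 and 5028 in that order; the first one (865) is the one that matters here.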

Progress is right -- your database is corrupt and the right thing to do would be to restore and roll forward. But I'll go out on a limb and guess that you don't have after-imaging enabled :rolleyes: The -F option basically throws away the notes in the BI file which means that you are almost certainly throwing away data. There is no way to know what data you are missing and your db is not reliable.
 

zygolux

New Member
The rebuild is: recreate the DB structure (from the production DB structure) + restore data (from the production DB). Actually this DB is a copy (day-1) of the production DB. This is why it is rebuilt (re-created, if you prefer) every night ...
 

TomBascom

Curmudgeon
What commands do you use to execute that process?
 

ortega

New Member
Hi, I have a problem that looks like this one, but I'm using Windows, and it happens when the proshut command has completed. Is there a different way to release the shared memory segment, one that does not require shutting down the server? Or is there a way to avoid this situation without changing the OS?
 