Shared Memory overflow

Hi All,
Our Progress database hangs with the error below.

Out of free shared memory. Use -Mxs to increase. (6495)

I would appreciate it if you could suggest some solutions.

The database is 10.2A, running on RHEL 5.8.


Thanks
 
Hello Cringer,

Thanks a lot...
1) Progress version is 10.2A... How do I find the service pack? I believe there is no service pack.
2) RHEL 5.8.
3) Tried increasing -B from 35000 to 100000 and -Mxs to 1024 in Test.
4) Below is the error:

[2014/09/10@00:56:29.206158431205+0800] P-21290 T-1087756608 F SRV 3: (6495) Out of free shared memory. Use -Mxs to increase.
[2014/09/10@00:56:30.-4294967128+0800] P-21290 T-1087756608 I SRV 3: (7132) dsmUserDisconnect called for invalid user. rtc: -20031.
[2014/09/10@00:56:30.168+0800] P-21290 T-1087756608 I SQLSRV2 : (-----) Error in rss_cleanup: error code 4294947265 was returned by dsmUserDisconnect

Thanks and Regards,

Surya
 

TomBascom

Curmudgeon
The message provides the solution -- increase -Mxs... the previous value can be found in the .lg file:

(4240) Excess Shared Memory Size (-Mxs): 96.

Generally speaking this error occurs because, over time, you have users that have been unable to get a lock from the lock table (-L -- upper case) and have had their session crash with a "-L exceeded" message. When that happens the db uses spare memory in -Mxs to expand the -L table slightly. If it happens often enough you run out and then the db crashes.
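
The startup values get echoed into the .lg file every time the broker starts, so something along these lines (the path and db name are just placeholders) will show what you are currently running with:

# most recent startup values for -Mxs and -L from the database log
grep "(-Mxs)" /path/to/mydb.lg | tail -1
grep "(-L)" /path/to/mydb.lg | tail -1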

The sessions that get the "-L exceeded" messages are not necessarily doing anything wrong -- they might be, or they might just be unlucky victims of someone else hogging unreasonable numbers of locks. If -L is unreasonably small:

Current Size of Lock Table (-L): 8192.

then you can try making it larger. That might be all that you need to do. But if it is already large the problem is more likely that some bit of code is requesting more locks than is reasonable. Oftentimes that is due to the coder being confused about when it is appropriate to use a database transaction to enforce business rules. A very common bad practice is to use database transactions to enforce "all or nothing" for purge processes or end of period updates. That works very well on a development or a demo system but it is a scalability disaster when a customer has millions of records to be purged or updated.

For reference -- IMHO as a rule of thumb "reasonable" is somewhere around 100 locks per user. Anything more than that and you're being a resource hog with poor coding practices.
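
As a very rough sketch, with a couple of hundred concurrent users that rule of thumb lands -L somewhere in the 20,000 to 30,000 range. The parameters go on the broker startup, either on the command line or in the .pf file -- something like this (the db name and the numbers are only illustrative, tune and test them for your own system):

# illustrative broker startup -- not values to copy blindly
proserve mydb -B 100000 -L 20480 -Mxs 1024

(or put the same parameters in the .pf file that your startup scripts use).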
 

TheMadDBA

Active Member
You can get the SP information (if any) from looking at $DLC/version.

How much memory does the linux box have?

What is your database block size?

What are the values of shmmax and shmmni?
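
All of that can be pulled straight from the command line -- something like this (the db name is a placeholder, and the exact wording of the prostrct output varies a bit by release):

cat $DLC/version                      # Progress version and service pack, if any
free -m                               # total memory on the box
sysctl kernel.shmmax kernel.shmmni    # kernel shared memory limits
prostrct statistics mydb              # reports the database block size and extent sizes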
 
Hi Tom/TheMadDBA,

SP is as below:
OpenEdge Release 10.2A as of Fri Oct 31 20:06:43 EDT 2008

The Linux box has 24 GB of memory.
The database block size is 8192 (8 KB).
kernel.shmmax = 68719476736
kernel.shmmni = 4096

One of the databases shut down abnormally because the -L parameter was exceeded, which exhausted -Mxs.

But in most cases the database just hangs. We can see it as UP in proutil -C dbipcs, but it hangs when we try promon or mpro, or when we connect using MFG/PRO...

Thanks a lot...

Looking forward to your suggestions...

Surya
 

TheMadDBA

Active Member
Ahh... MFG/PRO.. with no service pack. I would suggest trying to get to the latest version of Progress that your version of MFG/PRO supports. Lots and lots of bug fixes between 10.2A and 10.2B07.

Like Tom said you are going to need to crank up the value of -L and stop letting it hit overflows like that. Unless you are modifying the source... MFG/PRO in most versions is going to have a lot of lock table issues.

How many databases are you running on this box?

I see SQL92 access, what is your isolation level set to? The wrong levels can cause a ton of extra locks. Also make sure your statistics are current.
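
For reference, setting the isolation level and refreshing statistics can be done from any SQL client -- a rough sketch with sqlexp (the port, user, and table name are placeholders; most ODBC/JDBC clients also let you set the default isolation level on the connection):

sqlexp -db mydb -S 5000 -user someuser -password xxxxx
-- then, inside the SQL session:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;   -- appropriate for read-only reporting sessions
UPDATE TABLE STATISTICS FOR PUB.some_table;         -- repeat per table to keep the optimizer current
UPDATE INDEX STATISTICS FOR PUB.some_table;
COMMIT;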
 

TomBascom

Curmudgeon
I think you have two different things going on.

A db crash due to -L being exceeded and -Mxs being exhausted is a reasonably well known thing. You fix it as has already been mentioned.

A database being frequently "hung" is a different cup of tea and, unless you are saying that there is always a -Mxs-related message in the .lg file when you have a hung database, it should be treated as a distinct problem.
 

TomBascom

Curmudgeon
The fact that you are using dbipcs disturbs me. Many of the people that I run across who do that also have some bad habits related to "kill -9". They have learned to use dbipcs because they frequently kill databases and have to clean up the mess. Is that what you are doing?
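
If it is, bear in mind that proshut gives you a clean way to do the same job -- roughly (the db name and user number are placeholders):

proshut mydb -C list             # list connected users and their user numbers
proshut mydb -C disconnect 42    # disconnect one stuck session by user number
proshut mydb -by                 # bring the whole database down cleanly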
 

Hi TheMadDBA,

We are running 19 databases, the biggest ones being 124 GB each.

How do I check the isolation level for SQL92 access?

Service Pack is one of the suggestions I am looking into.

Thanks a lot for the prompt responses.
 

Hi Tom,

Once upon a time, yes, we used kill -9 and had to do a dump and load from a corrupted database. So we have avoided kill -9 since then.

BTW, the history is that we hit the shared memory overflow issue in one 124 GB and one 13 GB database. In both cases, the database would show up in proutil -C dbipcs but would not allow connections. We also could not use proshut -by or proshut -F; the database did not come down.

Ultimately, we had to reboot the server running the 19 databases, and we have been doing the same whenever the log shows -Mxs issues. We are planning to start the databases with -Mxs 1024.
Regarding what TheMadDBA suggested, what would your suggestion be? BTW, we are moving to QAD EE and away from MFG/PRO sooner or later.

Thanks a lot.
Surya
 
Are we talking about a single event where you had an issue?

Or many events?

Hi Tom,

Many events, but not frequent -- maybe once every 40 days. BTW, we reboot the server once a month to try to prevent such issues, but in vain. The worry is that the database crashes, cannot come back up, and we lose data as well as availability. Usually a hung database is not reflected in the monitoring tool, so it goes unnoticed for quite some time until users start complaining...

BTW, we are not running the watchdog... Is it necessary, and does it help in such situations?

BTW, any idea whether MFG/PRO programs need to be recompiled when we upgrade from 10.2A to 10.2B?

Thanks a lot,

Surya
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
Given the size of your databases I assume you are using Enterprise RDBMS licenses, not Workgroup.

Yes, you should always run a watchdog but its function is to clean up after dead shared-memory clients, not to prevent a database hang. Also, you should always run a BIW, AIW, and at least one APW.

Since you mention potential data loss: are you using after imaging?
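
For what it's worth, the helper processes are just started against the running database, and after-imaging is enabled with rfutil once AI extents and a backup are in place -- something like this (the db name is a placeholder; check the docs for your release on online vs. offline enablement):

prowdog mydb                   # watchdog: cleans up after dead shared-memory clients
probiw mydb                    # before-image writer
proapw mydb                    # asynchronous page writer (start at least one)
rfutil mydb -C aimage begin    # enable after-imaging (needs AI extents and a fresh backup first)
proaiw mydb                    # after-image writer, once after-imaging is enabled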
 

TomBascom

Curmudgeon
R-code is upwardly compatible. So, strictly speaking, you do not need to recompile going from 10.2A to 10.2B.

But if you have the ability to compile code then there is no reason to compile anything less than everything. What possible reason would you have for picking and choosing?
 

TomBascom

Curmudgeon
As Rob says -- if you are worried about database integrity you need to have after-imaging enabled. Actually, even if you are not worried about it, you should have AI enabled. Running without after-imaging is irresponsible.
 

TheMadDBA

Active Member
Run watchdog for sure. Once upon a time we were trying to recreate a database crash issue to prove that the Unix admins were causing the database shutdowns by killing processes.

In the test database the watchdog was not running (by accident) and the database never crashed but we did get it to hang for new connections and do crazy things for existing connections. Once we started watchdog and repeated the tests the database shut down like we would expect when a process was killed in an ugly way.
 