-s revisited

GregTomkins

Active Member
I'm familiar with the reasons for -s problems, its default values, why increasing it is often a good idea, memory leaks, problems of failure to delete dynamic objects, MEMPTR, WIDGET-POOL, etc.

All that aside: can anyone explain why one might get multiple -s errors across multiple AppServer/WebSpeed agents at (more or less) the same time? Surely the Progress stack is local to each process. For us this error comes up very rarely but when it does it's always in 'clusters' like this. This server was otherwise stable and other processes running on it had no problem. These are remote agents.

Again, I'm not asking about -s in general; I'm specifically asking about the phenomenon of -s errors affecting many processes at the same time.

[14/07/28@13:03:26.874-0700] P-028410 T-000000 1 WS -- (Procedure: 'GetProcedureSignature acai/acgiodx_gs.p' Line:171) SYSTEM ERROR: -s exceeded. Raising STOP condition and attempting to write stack trace to file 'procore'. Consider increasing -s startup parameter. (5635)

[14/07/28@13:04:05.518-0700] P-028418 T-000000 1 WS -- (Procedure: 'GetProcedureSignature mfai/mfgpl_x_gs.p' Line:171) SYSTEM ERROR: -s exceeded. Raising STOP condition and attempting to write stack trace to file 'procore'. Consider increasing -s startup parameter. (5635)

[14/07/28@13:04:20.973-0700] P-028412 T-000000 1 WS -- (Procedure: 'GetProcedureSignature mfai/mfgpl_x_gs.p' Line:171) SYSTEM ERROR: -s exceeded. Raising STOP condition and attempting to write stack trace to file 'procore'. Consider increasing -s startup parameter. (5635)
 

TheMadDBA

Active Member
Using WebSpeed agents or stateless AppServers means the requests are basically round-robin. So errors from commonly used procedures with small memory leaks/growth are likely to show up in different agents at around the same time... because each agent has run the same procedure roughly the same number of times (assuming the same startup time).

Or... a larger memory issue gets spread across multiple agents as the user frantically tries the same thing over and over again :)
 

GregTomkins

Active Member
These agents are running thousands of different procedures in random order, depending on what users want to do. They don't even service the same number of requests, because some requests take 50 ms and others 10 s; they also start up at different times depending on load, and the requests vary from a single CHAR output to dozens of gigantic temp-tables. So overall it seems really improbable to me that they would all develop the same memory problem at the same time... unless there is some weird notion of a shared stack somewhere.
 

TheMadDBA

Active Member
No shared stack, each process has its own -s and memory in general (except for the DB shared memory).

There is always just random coincidence :)

PS - In your example from the logs, two of the errors are in the same program, so there must be some overlap; it's not just truly random programs. Every system has its more commonly used procedures.
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
Moreover, the last two errors are on the same line of the same program. So that would be an interesting place to look. Maybe it's a call to a procedure with a large signature, so it wants to write a lot of data to the stack at once.

And (making a few more assumptions) the first error message comes from a different procedure name, but coincidentally it is in an IP with the same name and it is at the same line number. Maybe acgiodx_gs.p is a copy of mfgpl_x_gs.p, or vice versa? Either way, build your debug-lists and see what that line does, and also what GetProcedureSignature does.
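If it helps, the debug-lists are just a COMPILE with the DEBUG-LIST option; something like the following (the output directory is only an example) gives you line-numbered listings so you can see exactly what line 171 is in each program:

Code:
/* Generate line-numbered listings; the "debug" output directory is just an example path. */
COMPILE acai/acgiodx_gs.p DEBUG-LIST "debug/acgiodx_gs.dbg".
COMPILE mfai/mfgpl_x_gs.p DEBUG-LIST "debug/mfgpl_x_gs.dbg".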

Another approach: set a high stack size (say, 1000) and then run the clients for a while with the -y parameter. Afterwards, look at the stack size usage in client.mon. That's (roughly) how much stack space you need.

Example:
Code:
Memory usage summary:      Current  Max Used  Limit (Bytes)
Stack usage (-s):               60      5696          40960  
Local buffer usage:           1472      5664  
R-code Execution Buffer:    241384    241384        3170304
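Since these are WebSpeed agents rather than interactive clients, the parameters would presumably go into the agent startup parameters in ubroker.properties instead; a rough sketch (the broker name and the existing parameters are placeholders - keep whatever you already have and just append -s and -y):

Code:
# ubroker.properties fragment - "wsbroker1" and the existing parameters are placeholders
[UBroker.WS.wsbroker1]
    srvrStartupParam=-p web/objects/web-disp.p -weblogerror -s 1000 -y

I'm not certain where an agent writes its -y output (check the agent's working directory for a .mon file), but I'd expect the same "Stack usage" numbers to be in there.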
 

TomBascom

Curmudgeon
My guess is that you have some common code library that everyone runs, which is pretty much what TheMadDBA and Rob are saying too. Something in that code is sensitive to some global environmental or data-related issue, and if it blows up for one session it will blow up for all.
 

GregTomkins

Active Member
"if it blows up for one session it will blow up for all"

Sure. But at the same time? On multiple occasions? Really hard to believe.
 

TomBascom

Curmudgeon
Depends. Maybe the common code is doing something especially stupid. Or subtly stupid. Or brilliant but with an unexpected side-effect.

For a "stupid" example: maybe it recursively builds a TT of all outstanding order lines, and it only blows up when the recursion gets to a certain depth - but then it would happen all at once to everyone.
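To make that concrete, here is a minimal ABL sketch of that pattern (the table and field names are invented): each recursive call adds another frame to the stack, so it's the depth of the shared data that decides when error 5635 fires, and it fires for every agent that walks the same data.

Code:
/* Hypothetical sketch - "orderLine" and its fields are invented names.
   Each level of recursion consumes -s stack space, so once the data is
   deep enough, every agent that runs this blows up, regardless of what
   else that agent has been doing. */
DEFINE TEMP-TABLE ttLine NO-UNDO
    FIELD lineNum  AS INTEGER
    FIELD parentId AS INTEGER.

PROCEDURE addChildLines:
    DEFINE INPUT PARAMETER ipParentId AS INTEGER NO-UNDO.
    DEFINE BUFFER bLine FOR orderLine.  /* local buffer, one per invocation */

    FOR EACH bLine NO-LOCK
        WHERE bLine.parentId = ipParentId:
        CREATE ttLine.
        ASSIGN ttLine.lineNum  = bLine.lineNum
               ttLine.parentId = ipParentId.
        /* recursion depth follows the data, not anything local to the agent */
        RUN addChildLines (INPUT bLine.lineNum).
    END.
END PROCEDURE.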
 