Question: 32-bit Progress Above the 4G Line

kdefilip

Member
showcfg starts a GUI which shows me this:

Code:
Configuration File:    C:\Intergy\ProRT\PROGRESS.CFG

Company Name:    Microsoft

Product Name:    OE Enterprise RDBMS
Installation Date:    Mon Jul 01 20:01:00 2013
User Limit:    50
Expiration Date:    None
Serial Number:  
Control Numbers:  
Version Number:    11.1
Machine Class:    KB
Port Number:    31

Product Name:    OE Application Svr Ent
Installation Date:    Mon Jul 01 20:01:00 2013
User Limit:    50
Expiration Date:    None
Serial Number:  
Control Numbers:  
Version Number:    11.1
Machine Class:    KB
Port Number:    31

Product Name:    Client Networking
Installation Date:    Mon Jul 01 20:01:00 2013
User Limit:    50
Expiration Date:    None
Serial Number:  
Control Numbers:  
Version Number:    11.1
Machine Class:    KB
Port Number:    31

Mod Edit: Just taking out the control numbers.
Thanks - the things I don't know....
 

TheMadDBA

Active Member
I can only assume you guys didn't have a Progress DBA involved and depended on the vendor for installation and monitoring? Until somebody dumped this on you, that is :)

Those error messages in the nameserver log..... did they happen along with a planned shutdown or some kind of crash? There are a few KB entries about these errors relating to older versions of OE, but nothing for your version. If they were part of a planned shutdown (done badly) then I would ignore them.... if not take a look at http://knowledgebase.progress.com/articles/Article/000026491
 

kdefilip

Member
Well the good/bad news is it looks like you need to contact your application vendor..... take a look at http://knowledgebase.progress.com/articles/Article/000036217

Basically they are messing around with DLLs and memory pointers and not doing a very good job of it. Most of the disconnect/terminate messages are cleanup messages from a session going away in an unhandled manner.
I see that, but that is a differently named process. Our failing process is _proapsv.exe, and the article appears to reference a different process.
 

kdefilip

Member
This app has been at this site for 3 years and was sold as a "turnkey" install.
 

TheMadDBA

Active Member
The appserver (_proapsv.exe) is basically a "headless" version of the client executable (_prowin32.exe). Knowing what I know of how Progress works, I would bet large sums of money that this is the same issue.

The root cause of that error can only be fixed by your vendor or somebody with access to the source code. No flipping of switches or options is going to make that go away.
 

TomBascom

Curmudgeon
400+ hour samples aren't really very informative. Yes, you've got some high counts but the per second averages are pedestrian. There may be periods during the day when activity is much higher and when the samples show more interesting patterns.
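A quick way to see why: here is a small Python sketch using made-up hourly counts for one day. The whole-day per-second average looks modest, while a single busy hour runs many times hotter; a 400-hour sample flattens such bursts even further.

```python
def per_second(count: int, seconds: float) -> float:
    """Average rate of an activity counter over a sample window."""
    return count / seconds

# Hypothetical hourly record-read counts for one day: quiet overnight,
# a short burst mid-morning. The numbers are invented for illustration.
hourly_reads = [2_000] * 8 + [40_000, 900_000, 60_000] + [20_000] * 13

whole_day = per_second(sum(hourly_reads), 24 * 3600)
busiest_hour = max(per_second(c, 3600) for c in hourly_reads)

print(f"whole-day average: {whole_day:.1f}/s")
print(f"busiest hour:      {busiest_hour:.1f}/s")
```

The same total activity averaged over a longer window only makes the busy hour harder to spot.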

You need some decent monitoring tools.

I may be a bit biased but I suggest ProTop... http://dbappraise.com/protop.html
 

kdefilip

Member
Ahaha, I don't mind bias; at least it shows a person has passion. Yes, on day two of my adventure I downloaded it but could not make it work. It threw an error which I tried to track down, but my frustration level at that point was fairly high.
 

kdefilip

Member
Yes, agreed on the 400+ hours, but taking samples of many durations gives me a fuller idea of what is going on. And when speaking to management, 2.5 million per week sounds a lot more impressive than 50 for 10 seconds :)
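Normalizing both headline numbers to a per-second rate (simple arithmetic, shown here in Python) puts them on the same footing: 2.5 million per week is roughly 4 per second, the same ballpark as 50 per 10 seconds.

```python
SECONDS_PER_WEEK = 7 * 24 * 3600  # 604,800

weekly_rate = 2_500_000 / SECONDS_PER_WEEK  # "2.5 million per week"
sample_rate = 50 / 10                       # "50 for 10 seconds"

print(f"2.5 million/week = {weekly_rate:.2f} per second")  # 4.13
print(f"50 per 10 s      = {sample_rate:.2f} per second")  # 5.00
```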
 

TheMadDBA

Active Member
Probably best to get an eval version or buy a single-user development license. Without that you can't really do much with a lot of the tools (like ProTop). Once you have the developer license you can start looking at the VSTs (virtual system tables) to find out which tables/indexes are the hot spots, see which tables are being updated/locked the most, etc.

Not as nice as the Oracle versions as far as tracking down what is going wrong, but still very helpful compared to what you can get through Promon.
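VST counters (for example those in _TableStat and _IndexStat) generally accumulate from database startup, so hot spots show up best as the difference between two snapshots taken a few minutes apart. The Python sketch below ranks tables by that difference; the table names and counts are hypothetical, and how you export the snapshots (ABL query, SQL, ProTop) is left open.

```python
from typing import Dict, List, Tuple

def hottest_tables(before: Dict[str, int], after: Dict[str, int],
                   top: int = 3) -> List[Tuple[str, int]]:
    """Rank tables by how much a cumulative counter grew between two snapshots."""
    deltas = {t: after[t] - before.get(t, 0) for t in after}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)[:top]

# Hypothetical cumulative read counts captured a few minutes apart.
# Table names and values are illustrative only.
snap1 = {"customer": 1_000_000, "order": 5_000_000, "audit": 200_000, "item": 50_000}
snap2 = {"customer": 1_020_000, "order": 5_900_000, "audit": 201_000, "item": 450_000}

for table, reads in hottest_tables(snap1, snap2):
    print(f"{table:10} {reads:>10,}")
```

Diffing rather than reading raw totals keeps a table that was busy last month from masking the one that is busy right now.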
 

kdefilip

Member
Yes, I miss my V$ tables; I never thought I would say that, but there it is. I have no ability to track down waits of any kind. I can see them, but can't really follow them to any meaningful conclusion.
 

TheMadDBA

Active Member
I love the V$ tables in Oracle.... I could never give those up. Never been huge on OEM or similar tools, more of a command line guy myself.

Once you get that license and run ProTop or look at the VSTs you will find some interesting info that might be hard for the vendor to dispute... probably excessive reads from suspect tables and locking way more records than you update. The usual suspects.
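The "locking way more records than you update" check can be sketched the same way. This Python example assumes you already have per-table lock and update counts for one sample interval from whatever monitoring you use; the names, counts, and the 10:1 threshold are all hypothetical.

```python
from typing import Dict, List

def overlocked(locks: Dict[str, int], updates: Dict[str, int],
               ratio: float = 10.0) -> List[str]:
    """Tables whose lock count exceeds `ratio` times their update count."""
    flagged = []
    for table, n_locks in locks.items():
        n_updates = updates.get(table, 0)
        # max(..., 1) avoids dividing by zero for tables that are never updated
        if n_locks / max(n_updates, 1) > ratio:
            flagged.append(table)
    return sorted(flagged)

# Hypothetical per-table activity for one sample interval.
locks = {"customer": 50_000, "order": 12_000, "item": 3_000}
updates = {"customer": 1_000, "order": 10_000}

print(overlocked(locks, updates))  # ['customer', 'item']
```

A table that is locked heavily but never updated ("item" here) is a classic sign of records being read with an exclusive lock when NO-LOCK would do.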
 

kdefilip

Member
Yes, that is my dilemma right now: being able to see the grass growing but not the roots is very frustrating. And I agree on OEM. I use it occasionally, but on the whole I find it clunky and kludgy, although I do find Toad helpful.
 

Cringer

ProgressTalk.com Moderator
Staff member
I was on a DBA training course (DBA Bootcamp) a few weeks ago with a guy who was very much in your shoes: a SQL DBA who had been landed with a Progress system to manage. The course stood him in very good stead. Maybe it's something you could consider doing too, or you could get a consultant in to provide more targeted training. There are both Progress-employed and independent consultants who would do a very good job of that. Tom Bascom is one of the independent ones.
 

kdefilip

Member
Yes, it is on the table now: both training and a consultant to review. Good advice, and it is being considered.
 

kdefilip

Member
Basically lruskips just keeps the DB from moving the block to the top of the MRU list every time the block is accessed. Instead, a less expensive counter is maintained, and when that counter is reached the block is moved to the top of the MRU list. The idea is that if you are hammering the same blocks (common data/index entries) you can get an improvement, since the MRU/LRU list can only be maintained by one connection at a time.
So I guess what I am struggling with is that no block in our buffer pool appears to ever make it off the LRU list onto our MRU list, given our current settings:

First, our latch timing and activity data collection are disabled, which leaves us lacking some data.
Spins before timeout is 100000, which I believe is much too high for our environment. 10,000 would be a better setting, and lower is probably optimal.


10/24/14 OpenEdge Release 11 Monitor (R&D)
06:27:34 Adjust Latch Options

1. Spins before timeout: 100000
2. Enable latch activity data collection
3. Enable latch timing data collection
4. Initial latch sleep time: 10 milliseconds
5. Maximum latch sleep time: 250 milliseconds
6. Record Free Chain Search Depth Factor: 5
7. Enable LRU2 alternate buffer pool replacement policy
8. Adjust LRU force skips: 100
9. Adjust LRU2 force skips: 0
 

kdefilip

Member
There are two buffer pools, primary and secondary (aka alternate). As you may know by now their sizes are determined by the -B and -B2 parameters respectively. Each buffer pool has its own LRU chain protected by an LRU latch. Think of them as LRU (primary) and LRU2 (alternate).

Two approaches you can take to relieve contention:
  • Allocate some small, very frequently accessed objects to the Alternate Buffer Pool and size -B2 appropriately so the objects fit entirely within it. In that case, contention is removed from the primary buffer pool LRU chain/latch, and no LRU chain need be maintained, provided that -B2 is large enough that no block evictions are necessary.
  • Use the -lruskips parameter to reduce the overhead of maintaining the primary buffer pool LRU chain, and thereby reduce latch contention.
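For the first approach, a rough sizing check helps: -B2, like -B, is counted in buffers of one database block each, so the hot objects' total block count must fit with room to grow. The Python sketch below assumes you know each object's size in blocks; the object names, sizes, and the 10% headroom figure are illustrative assumptions, not a Progress recommendation.

```python
from typing import Dict

def fits_in_b2(object_blocks: Dict[str, int], b2_buffers: int,
               headroom: float = 0.10) -> bool:
    """True if the listed objects fit in -B2 with some spare buffers left over."""
    needed = sum(object_blocks.values())
    return needed * (1 + headroom) <= b2_buffers

# Hypothetical block counts for small, very hot tables and indexes.
hot_objects = {"sysctrl": 40, "sysctrl.pk": 8, "codes": 900, "codes.idx": 150}

print(fits_in_b2(hot_objects, b2_buffers=1_000))  # False
print(fits_in_b2(hot_objects, b2_buffers=1_500))  # True
```

If the objects do not fit, blocks get evicted from the alternate pool and its own LRU chain (and latch) comes back into play, which defeats the purpose.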
The definitive resource on the alternate buffer pool (Tom Bascom, DBAppraise):
"The B2 Buzz"
http://dbappraise.com/ppt/B2Buzz.pptx

More great info on latches, in general (Rich Banville, OE RDBMS architect):
"A New Spin on Some Old Latches"
http://download.psdn.com/media/exch_audio/2008/OPS/OPS-28_Banville.ppt

Info on -lruskips (Rich Banville):
"Still More Database Performance Improvements"
http://pugchallenge.org/2012PPT/NEPUG_Performance.pptx

I notice our server settings appear to be quite a bit different from those suggested in the PowerPoint:
10/24/14 OpenEdge Release 11 Monitor (R&D)
06:52:25 Server Options

1. Server network message wait time: 2 seconds
2. Delay first prefetch message: Disabled
3. Prefetch message fill percentage: 0 %
4. Minimum records in prefetch message: 16
5. Suspension queue poll priority: 0
7. Terminate a server
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
So I guess what I am struggling with is that no block in our buffer pool appears to ever make it off the LRU list onto our MRU list, given our current settings
I'm not sure what you mean by this. There is only one chain for the primary buffer pool: the LRU chain. It has an "MRU" end, i.e. the end at which blocks are placed when they are accessed, and an "LRU" end, which as the name suggests, is where blocks will be evicted from if there is a need for a free buffer. So if -B is 200,000 and a given block is accessed (placed at the MRU end of the chain), you must access 200,000 other unique blocks before that first block is evicted.
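The single-chain picture can be modeled in a few lines of Python. This is a simplified sketch (one chain, no latches, a small capacity standing in for -B), not how the database engine is actually implemented, but it shows the behavior described above: a block placed at the MRU end is evicted only once capacity other unique blocks have been accessed after it.

```python
from collections import OrderedDict

class LruChain:
    """Simplified model of one buffer pool's LRU chain (capacity stands in for -B)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.chain = OrderedDict()  # first item = LRU end, last item = MRU end

    def access(self, block: int) -> None:
        if block in self.chain:
            self.chain.move_to_end(block)       # hit: move to the MRU end
        else:
            if len(self.chain) >= self.capacity:
                self.chain.popitem(last=False)  # evict from the LRU end
            self.chain[block] = True            # new block enters at the MRU end

pool = LruChain(capacity=5)   # tiny stand-in for -B 200000
pool.access(0)
for b in range(1, 5):         # 4 other unique blocks: block 0 survives
    pool.access(b)
print(0 in pool.chain)        # True
pool.access(5)                # 5th other unique block: block 0 is evicted
print(0 in pool.chain)        # False
```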

The -lruskips parameter allows you to change the algorithm so it is a combination of least frequently used and least recently used. It doesn't disable or cripple the LRU mechanism, and it doesn't mean blocks can be evicted from anywhere in the LRU chain. It means a block is put back to the MRU end of the chain on every n accesses instead of on every access.
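A minimal sketch of that rule, in the same simplified Python model: each block keeps a hit counter, and a hit only moves the block to the MRU end when the counter reaches skips (skips=0 means strict LRU). The moves counter stands in for chain updates that would need the LRU latch. The exact counter handling inside OpenEdge is an assumption here.

```python
from collections import OrderedDict

class LruSkipsChain:
    """Simplified LRU chain where a hit moves a block to the MRU end only on
    every `skips`-th hit (skips=0 means strict LRU: every hit moves it)."""

    def __init__(self, capacity: int, skips: int = 0):
        self.capacity = capacity
        self.skips = skips
        self.chain = OrderedDict()  # block -> hit counter; first item = LRU end
        self.moves = 0              # chain updates that would need the LRU latch

    def access(self, block: int) -> None:
        if block in self.chain:
            self.chain[block] += 1  # updating the value keeps the chain order
            if self.skips == 0 or self.chain[block] >= self.skips:
                self.chain[block] = 0
                self.chain.move_to_end(block)
                self.moves += 1
        else:
            if len(self.chain) >= self.capacity:
                self.chain.popitem(last=False)  # eviction still comes from the LRU end
            self.chain[block] = 0               # new block inserted at the MRU end
            self.moves += 1

c = LruSkipsChain(capacity=10, skips=3)
for b in (1, 2, 3):
    c.access(b)           # chain from LRU to MRU end: 1, 2, 3
c.access(1)
c.access(1)               # two hits: counter is 2, block 1 stays where it was
print(list(c.chain))      # [1, 2, 3]
c.access(1)               # third hit reaches skips: block 1 moves to the MRU end
print(list(c.chain))      # [2, 3, 1]
```

Note that eviction is unchanged: blocks still leave only from the LRU end, which is the point made above.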
 

kdefilip

Member
I guess in my mind I conceptualize this as two lists. But I'm okay conceptualizing this as one long list with two distinct ends.
According to the Progress documentation, in the absence of -lruskips (the default of zero), when a new block enters the pool OR an existing block is accessed, it immediately moves to the top of the MRU list. An internal algorithm is used to maintain the other end of the chain, the LRU end. All blocks at the LRU end of the chain are candidates for removal/eviction/replacement. By setting -lruskips to a positive integer, we are essentially messing with that algorithmic system.
So I'm not sure what we are trying to solve with this setting. In my case, no segment of the buffer ever seems to reach 100 skips. So are we trying to compensate for a buffer pool that is so small compared to the workload that there is contention for latch acquisition? If that is the case, there is a larger problem than skips that should be addressed.
Additionally, since no block ever reaches the threshold of skips, every block in the pool is a candidate for removal, regardless of its access priority.
So I'm just not sure what we are attempting to solve here. Yes, latch contention, but isn't the cause of the latch contention the far greater problem?
 

TomBascom

Curmudgeon
Think about very active blocks.

Suppose that some block is being accessed now and again: every 1,000 or so times that *any* block is accessed, this block will be. So under the old "strict" LRU algorithm it would, at worst, move 1,000 places from the MRU end of the queue.

With -lruskips 100 it could get all the way to position 100,000 before being moved to the MRU end.

Now think about a block in a "rapid reader", one which is being pounded on really hard: 1 in 10 block accesses goes to this block. It bounces between first position and position #10 constantly. With -lruskips 100 it does not get moved to the MRU end nearly as often.

The impact of that is that contention for the LRU latch is greatly reduced.

At the MRU end of the chain it doesn't really matter -- there is no special advantage to being at the front of the line.

At the LRU end of the chain it also doesn't really matter. The LRU end is still ordered the way that we want it to be. It is all stuff that nobody has accessed for a long time.

The great thing is that contention for the LRU latch is 1/100th of what it was.
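Tom's "1 in 10" example can be checked with a simple count. This Python sketch follows only the hot block, assumes it stays resident (a block that busy never drifts near the LRU end), and counts how often it would be moved to the MRU end with and without -lruskips 100; the counter semantics are a simplified assumption.

```python
def mru_moves(hot_hits: int, skips: int) -> int:
    """How many times a block hit `hot_hits` times is moved to the MRU end.

    skips=0 models strict LRU (every hit moves the block); otherwise the block
    moves only when its hit counter reaches `skips`, then the counter resets.
    """
    if skips == 0:
        return hot_hits
    moves, counter = 0, 0
    for _ in range(hot_hits):
        counter += 1
        if counter >= skips:
            moves += 1
            counter = 0
    return moves

total_accesses = 1_000_000
hot_hits = total_accesses // 10   # "1 in 10 block accesses goes to this block"

strict = mru_moves(hot_hits, skips=0)
skipped = mru_moves(hot_hits, skips=100)
print(strict, skipped, strict // skipped)  # 100000 1000 100
```

Each avoided move is one fewer trip through the LRU latch, which is where the 1/100th figure comes from.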

In many very active Progress databases the LRU latch is the next source of contention once your major IO issues are resolved. So this is a big win for people with well tuned active databases.
 