Question 32b Progress Above the 4G line

kdefilip

Member
What to do about this depends on which latch or latches show contention.
See below:
NOTE: *** I/O timing is not enabled. ***

♀10/23/14 Activity: Latch Counts
07:58:14 10/05/14 11:27 to 10/23/14 07:57 (428 hrs 31 min)

----- Locks ----- ------ Busy ------ Naps ---------- Spins ----------- ----- Na
Max -----
Owner Total /Sec /Sec Pct /Sec /Sec /Lock /Busy Total
HWM

MTX -- 486196434 315 0 0.0 1 0 0 0 0 0
USR -- 567916810 368 0 0.0 0 0 0 0 0 0
OM -- 25693552830 16655 0 0.0 0 0 0 0 0 0
BIB -- 605749120 392 0 0.0 0 0 0 0 0 0
SCH -- 21304800 13 0 0.0 0 0 0 0 0 0
LKP -- 1400928027 908 0 0.0 0 0 0 0 0 0
GST -- 328807 0 0 0.0 0 0 0 0 0 0
TXT -- 356911559 231 0 0.0 0 0 0 0 0 0
LKT -- 1716548720 1112 0 0.0 0 0 0 0 0 0
LKT -- 1877837611 1217 0 0.0 0 0 0 0 0 0
LKT -- 1522060612 986 0 0.0 0 0 0 0 0 0
LKT -- 1572079335 1019 0 0.0 0 0 0 0 0 0
SEQ -- 4948253 3 0 0.0 0 0 0 0 0 0
AIB -- 527881663 342 0 0.0 0 0 0 0 0 0
TXQ -- 626367600 406 0 0.0 0 0 0 0 0 0
EC -- 0 0 0 0.0 0 0 0 0 0 0
LKF -- 4316310416 2797 0 0.0 0 0 0 0 0 0
BFP -- 403637787 261 0 0.0 0 0 0 0 0 0
BHT -- 47171345038 30578 0 0.0 0 0 0 0 0 0
PWQ -- 9164695 5 0 0.0 0 0 0 0 0 0
CPQ -- 189623611 122 0 0.0 0 0 0 0 0 0
LRU -- 18762062043 12162 0 0.0 0 0 0 0 0 0
LRU -- 0 0 0 0.0 0 0 0 0 0 0
BUF -- 23942784344 15520 0 0.0 0 0 0 0 0 0
BUF -- 23755477527 15399 0 0.0 0 0 0 0 0 0
BUF -- 22537106234 14609 0 0.0 0 0 0 0 0 0
BUF -- 22834876985 14802 0 0.0 0 0 0 0 0 0
INC -- 0 0 0 0.0 0 0 0 0 0 0
L29 -- 0 0 0 0.0 0 0 0 0 0 0
L30 -- 0 0 0 0.0 0 0 0 0 0 0
L31 -- 0 0 0 0.0 0 0 0 0 0 0
 

kdefilip

Member
Not sure quite what you mean by this.
Disk alignment goes to block/sector 1 of a disk when it is formatted. Disk alignment is a very complicated proposition. If your disks are misaligned every subsequent sector written to will be "offset" creating unnecessariy I/O, sever in some cases. Pre win2k8, MS defaulted this operation at, I belive, 16k (bad in all cases). Due to the fact that many were ignoring this, many systems all over the world had misaligned disks, not a huge problem, unless you are running a DB on those disks or some other high impact system. To counter this, starting with Win2k8, MS begain defaulting disk alignment at 1M. A catch-all, but by no means optimal. Particularly if you are on an EMC san where you can't control strip size. If you are on an EMC san, optimal disk alignment should be set at 64k to insure sector alignment. All of our disks are aligned at 1024 or 1M. Our C: drive is aligned at 16k. You can observe the offset on a Win box by running diskpart in a command prompt. This article kind of explains the problem, and I agree with it except that the MS default is 1024. It works for all, but is not great for many. EMC's best practice is disk alignment at 64k
http://www.blueshiftblog.com/?p=300
and if you're really interested:
http://blogs.msdn.com/b/jimmymay/ar...ignment-make-the-case-with-this-template.aspx
 

kdefilip

Member
More relevant conference goodness, from PUG Challenge Americas 2013:
Rich Banville
Database I/O Mechanisms
slides: http://pugchallenge.org/downloads2013/321_Database_IO.pptx
audio: http://pugchallenge.org/downloads2013/audio/321_Database_IO.mp3

Next year, be sure to get yourself to Westford, Mass. for PUG Challenge 2015. It's the best Progress technical conference there is. I'll buy you a beer and toast your success at resolving this issue. :)
You're on!
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
OK, thanks for the info on disk alignment. For some reason I though you were on direct-attached storage. So your back end for this server is an EMC SAN?
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
Reposting the latch info in code tags and sorted, to help readability.
Code:
10/23/14 Activity: Latch Counts
07:58:14 10/05/14 11:27 to 10/23/14 07:57 (428 hrs 31 min)

                                Locks                   Busy         Naps         Spins                Nap max   
latch    owner            total        /sec        /sec      pct     /sec     /sec    /lock    /busy    total     HWM
BHT       --     47,171,345,038      30,578           0        0        0        0        0        0        0        0
OM        --     25,693,552,830      16,655           0        0        0        0        0        0        0        0
BUF       --     23,942,784,344      15,520           0        0        0        0        0        0        0        0
BUF       --     23,755,477,527      15,399           0        0        0        0        0        0        0        0
BUF       --     22,834,876,985      14,802           0        0        0        0        0        0        0        0
BUF       --     22,537,106,234      14,609           0        0        0        0        0        0        0        0
LRU       --     18,762,062,043      12,162           0        0        0        0        0        0        0        0
LKF       --      4,316,310,416       2,797           0        0        0        0        0        0        0        0
LKT       --      1,877,837,611       1,217           0        0        0        0        0        0        0        0
LKT       --      1,716,548,720       1,112           0        0        0        0        0        0        0        0
LKT       --      1,572,079,335       1,019           0        0        0        0        0        0        0        0
LKT       --      1,522,060,612         986           0        0        0        0        0        0        0        0
LKP       --      1,400,928,027         908           0        0        0        0        0        0        0        0
TXQ       --        626,367,600         406           0        0        0        0        0        0        0        0
BIB       --        605,749,120         392           0        0        0        0        0        0        0        0
USR       --        567,916,810         368           0        0        0        0        0        0        0        0
AIB       --        527,881,663         342           0        0        0        0        0        0        0        0
MTX       --        486,196,434         315           0        0        1        0        0        0        0        0
BFP       --        403,637,787         261           0        0        0        0        0        0        0        0
TXT       --        356,911,559         231           0        0        0        0        0        0        0        0
CPQ       --        189,623,611         122           0        0        0        0        0        0        0        0
SCH       --         21,304,800          13           0        0        0        0        0        0        0        0
PWQ       --          9,164,695           5           0        0        0        0        0        0        0        0
SEQ       --          4,948,253           3           0        0        0        0        0        0        0        0
GST       --            328,807           -           0        0        0        0        0        0        0        0
EC        --                  -           -           0        0        0        0        0        0        0        0
LRU       --                  -           -           0        0        0        0        0        0        0        0
INC       --                  -           -           0        0        0        0        0        0        0        0
L29       --                  -           -           0        0        0        0        0        0        0        0
L30       --                  -           -           0        0        0        0        0        0        0        0
L31       --                  -           -           0        0        0        0        0        0        0        0
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
Based on the latch counts OM is very active, though no contention. If the number of entries in the object mapping cache (specified by -omsize) exceeds the number of records in the _Storageobject system table then access to cache entries is not latched.

This looks like SQL but thanks to the legacy SQL-89 interface you can run it in an ABL procedure editor:
select count(*) from _storageobject

If -omsize is set higher than that number then you'll eliminate that OM latch activity.
 

TheMadDBA

Active Member
Primarily, what we are seeing mostly on the client side is very many -l (little L) which I believe are grabs for more temp space/file at the client. This in and of itself is not the real issue, I don't believe. So what we see is this. ClientOne with begin to gram more and more temp in increments of 10 (starting at what appears to be a devault of 200). It will raise up, sometimes into the 300's and sometimes up to the high 500's. In any event, at some point, you will see clientOne stop grabbing temp space, a short pause in time, One or more OTHER clients will crash, another short pause, then ClientOne begins to grab more temp space.... -l 450 to 460, 460 to 470, etc. Now obviously temp space on one client can not cause another client across town to crash. However, I believe the root of the problem is whatever ClientOne is doing on the backend, long running query or some other issue, that is then exhausing some resource causing one and sometimes multiple clients to crash. Note: This is ALL guess at this point.

-l is for local buffers (in KB increments) not temporary space. In poorly written applications you will see these messages in the log as memory usage increases. By themselves the WARNING messages in the logs aren't really an issue. If you want the messages to go away you can adjust the -l startup to some larger value (default is 200), just don't go overboard.

Do you see other error messages somewhere that lead you to believe a crash is happening?
 

TheMadDBA

Active Member
Our lruskips are set at 100

Based on the log you provided it was set to 0... did you change this (and spin) through promon or the parameter files/startup? Keep in mind anything changed in promon doesn't persist when the database is restarted and anything changed in the startup parameters/parameter files don't take effect until the database is restarted.
 

kdefilip

Member
Based on the log you provided it was set to 0... did you change this (and spin) through promon or the parameter files/startup? Keep in mind anything changed in promon doesn't persist when the database is restarted and anything changed in the startup parameters/parameter files don't take effect until the database is restarted.

Just did another check. This is what promon is showing me now and I don't believe we are running a parameter file.
Enter a number, <return>, P, T, or X (? for help): 4

♀10/23/14 OpenEdge Release 11 Monitor (R&D)
13:06:52 Adjust Latch Options

1. Spins before timeout: 100000
2. Enable latch activity data collection
3. Enable latch timing data collection
4. Initial latch sleep time: 10 milliseconds
5. Maximum latch sleep time: 250 milliseconds
6. Record Free Chain Search Depth Factor: 5
7. Enable LRU2 alternate buffer pool replacement policy
8. Adjust LRU force skips: 100
9. Adjust LRU2 force skips: 0

EDIT: No, I have made no changes, but can't guarantee anyone else hasn't.
 

TheMadDBA

Active Member
Either you are running a parameter file or the parameters are just hard coded in the startup command line. Where did you change the spin and lruskips values? If just through promon that will all go away with the next restart.

I wish Progress had a SPFILE concept along with the ALTER concept to change the current/stored values but it is a little more low tech than that. Depending on how the database startup/shutdown is handled it could be stored in a number of places. You will need to track that part down.
 

kdefilip

Member
Based on the log you provided it was set to 0... did you change this (and spin) through promon or the parameter files/startup? Keep in mind anything changed in promon doesn't persist when the database is restarted and anything changed in the startup parameters/parameter files don't take effect until the database is restarted.
-l is for local buffers (in KB increments) not temporary space. In poorly written applications you will see these messages in the log as memory usage increases. By themselves the WARNING messages in the logs aren't really an issue. If you want the messages to go away you can adjust the -l startup to some larger value (default is 200), just don't go overboard.

Do you see other error messages somewhere that lead you to believe a crash is happening?

Yes, many windows crash logs on the DB/app server at C:\ProgramData\Microsoft\Windows\WER\ReportArchive
Many, Many SRV 116: (794) Usernum 991 terminated abnormally (usernum varies)
Many WDOG 189: (2527) Disconnecting dead user 478 (user varies)
Many An existing connection was forcibly closed by the remote host like the below:
Code:
59916 [2014-10-02T07:37:44.4053650-04:00]--1:1752:Unable to retrieve service assignments. Progress.Open4GL.Exceptions.NameServerCommunicationsException: Connect Failure: NameServer Communications Failure - System.Net.Sockets.SocketException (0x80004005): An existing connection was forcibly closed by the remote host

Many, many OS event logs on db/app server shown below:
Code:
Faulting application name: _proapsv.exe, version: 11.1.0.1143, time stamp: 0x527a2628

Faulting module name: ntdll.dll, version: 6.1.7601.18247, time stamp: 0x521ea8e7

Exception code: 0xc0000374

Fault offset: 0x000ce753

Faulting process id: 0x73d8

Faulting application start time: 0x01cfeb05e01e9af7

Faulting application path: C:\Intergy\ProRT\bin\_proapsv.exe

Faulting module path: C:\Windows\SysWOW64\ntdll.dll

Report Id: 1f8c7907-56f9-11e4-b7f7-d485647633e0
 

kdefilip

Member
Either you are running a parameter file or the parameters are just hard coded in the startup command line. Where did you change the spin and lruskips values? If just through promon that will all go away with the next restart.

I wish Progress had a SPFILE concept along with the ALTER concept to change the current/stored values but it is a little more low tech than that. Depending on how the database startup/shutdown is handled it could be stored in a number of places. You will need to track that part down.
I believe they changed that on command line as no reboot has occurred since 10/5, it was change on 10/13.

and yes, I hear you. In oracle you can change most parameters command line but use the SCOPE=BOTH to change it in memory AND in the parameter file. And in our case here, I've seen reference to no parameter file on this Progress system.
 

kdefilip

Member
Progress executables, like any other Windows executables, don't allocate RAM or choose physical memory addresses. They allocate virtual memory and the OS memory manager determines whether the pages are memory-resident. 32 bits = 2^32 bytes = 4 GB of virtual memory address space for a 32-bit process; 2 GB for the OS and 2 GB for the user-mode code and data. Given the code, shared libraries, thread stack, heap, etc. that already reside in the user portion of the database broker's virtual memory address space, the practical limit of the size of shared memory is somewhere around 1.5 GB, but it will vary depending on your system. If you want to allocate larger shared memory structures, like buffer pool (-B) and lock table (-L), you need to run the 64-bit Enterprise RDBMS product rather than 32-bit.

I agree that that is not a good situation, and you can change it for the better. OpenEdge is indeed widely used, but as Tom has indicated most customers no longer face these constraints as they have moved their back end to 64-bit executables.

Unfortunately one legitimate limitation of OpenEdge (though only on Windows) is that you cannot have both a 32-bit installation and a 64-bit installation of products in the same release, e.g. a 64-bit database and a 32-bit runtime. But you certainly can move your application tier to a different box. So you want the 32-bit executables for the client side (Client Networking or AppServer for example; your vendor will tell you) and 64-bit executables on the server side (Enterprise RDBMS and, likely, 4GL Development System, aka the compiler). With proper tuning of the clients, the database brokers, and the network path(s) for TCP connections instead of shared memory connections, you can make the clients perform well and eliminate most of the performance hit of connecting client/server. Though I suspect you may have larger bottlenecks at the moment.

A remote (TCP) client neither knows nor cares whether the database it connects to is running with 32-bit or 64-bit executables, so splitting the tiers isn't something that should cause you application compatibility issues unless you have clients that for some reason "must" run on the database server. But do talk to your vendor about concerns and your plans for addressing them.


I'm not sure what number you are quoting but 1046 is not a Progress database block size. And as Tom said, this has nothing to do with your -B constraint. Shared memory (most of which is -B, usually) is constrained in that its segments must fit (be able to be mapped into) the free contiguous blocks of virtual memory in the address space of all processes that connect to it. Your limiting factor is the use of 32-bit executables to create and connect to shared memory.

On any modern OS the file system block size is at least 4 KB. It is 4 KB by default in NTFS. So if your database block size is less than that you will pay a severe performance penalty for having a small block size. The only way to change it is with a dump and load to create a new database. If you do this, I recommend 8 KB blocks.

In a "proenv" command prompt (find it in the OpenEdge group in the Start menu), navigate to the database directory and type:
proutil dbname -C describe

That will show you your database block size. If it isn't at least 4 KB then that is likely your biggest performance issue.

Can someone interpret this for me: particularlly the line: 9 64 Bit DBKEYS Yes

Code:
c:\Intergy\DB>proutil medman -C describe
OpenEdge Release 11.1.0.037  as of Wed Nov  6 06:01:43 EST 2

OpenEdge Database Description

Database Name  : C:\Intergy\DB\medman
Version  : 173.0
Block Size  : 4096
Largest Cluster  : 512
Create Date  : Mon Jul 27 16:29:38 2009
Last Open Date  : Sun Oct 05 11:27:15 2014
Prior Open Date  : Sun Oct 05 11:27:15 2014
Schema Change Date  : Sun Aug 24 08:04:15 2014

Before Imaging information
  Block Size  : 8192
  Cluster Size (16K Units)  : 1024
  Last Open Date  : Sun Oct 05 11:27:19 2014

After Imaging Information
  Block Size  : 8192
  Begin Date  : Sun Aug 24 08:04:42 2014
  Last AIMAGE NEW  : Wed Oct 22 22:00:01 2014
  Currently Busy Area  : 12
  Current File Number  : 62

Backup Information
  Last Full Backup Date  : Wed Oct 22 22:00:01 2014
  Last Incremental Backup  : *** Not yet performed ***

Database Features

  ID  Feature  Active  Details
  ----  ---------------------------------  ------  -------
  5  Large Files  Yes
  9  64 Bit DBKEYS  Yes
  10  Large Keys  Yes
 

TheMadDBA

Active Member
For the features section....

Large files means your database can support extents over 2GB in size.

64 bit DBKeys means the ROWID/RECIDs (basically the same as Oracle ROWIDs) support 64 bit entries, meaning you can store a lot more data than previous (ancient) versions.

Large Keys - Progress used to have a much smaller limit on the total length of data in a single index entry. This means you can support larger (usually character) values in the database.

None of this means you are or aren't running a 64 bit server though... the database doesn't really care and most of these features came way before 64 bit products were available.

Run showcfg from that same prompt (and trim out the serial numbers/control numbers).
 

kdefilip

Member
Run showcfg from that same prompt (and trim out the serial numbers/control numbers).
showcfg starts a GUI which shows me this:

Code:
Configuration File:    C:\Intergy\ProRT\PROGRESS.CFG

Company Name:    Microsoft

Product Name:    OE Enterprise RDBMS
Installation Date:    Mon Jul 01 20:01:00 2013
User Limit:    50
Expiration Date:    None
Serial Number:   
Control Numbers:   
Version Number:    11.1 
Machine Class:    KB
Port Number:    31

Product Name:    OE Application Svr Ent
Installation Date:    Mon Jul 01 20:01:00 2013
User Limit:    50
Expiration Date:    None
Serial Number:   
Control Numbers:   
Version Number:    11.1 
Machine Class:    KB
Port Number:    31

Product Name:    Client Networking
Installation Date:    Mon Jul 01 20:01:00 2013
User Limit:    50
Expiration Date:    None
Serial Number:   
Control Numbers:   
Version Number:    11.1 
Machine Class:    KB
Port Number:    31

Mod Edit: Just taking out the control numbers.
 
Last edited by a moderator:

kdefilip

Member
That looks like some AppServer process isn't running correctly. Is the NameServer running correctly? Anything in the NameServer log? C:\Openedge\Wrk\xxx.ns.log
Only log I can find even close to that is:
NS1.ns.log with no entries since last DB bounce:

Code:
[14/10/02@07:26:45.718-0400] P-010988 T-NS1 1 NS NSPlumbing An error occured while listening for network input requests on port 5162. (9362)
[14/10/02@07:26:45.718-0400] P-010988 T-NS1 1 NS NSPlumbing An exception occured. Error message: socket closed. (9411)
[14/10/02@07:26:46.718-0400] P-010988 T-NS1 1 NS NSPlumbing An error occured while listening for network input requests on port 5162. (9362)
[14/10/02@07:26:46.718-0400] P-010988 T-NS1 1 NS NSPlumbing An exception occured. Error message: socket closed. (9411)
[14/10/02@07:26:47.718-0400] P-010988 T-NS1 1 NS NSPlumbing An error occured while listening for network input requests on port 5162. (9362)
[14/10/02@07:26:47.718-0400] P-010988 T-NS1 1 NS NSPlumbing An exception occured. Error message: socket closed. (9411)
[14/10/02@07:26:48.718-0400] P-010988 T-NS1 1 NS NSPlumbing An error occured while listening for network input requests on port 5162. (9362)
[14/10/02@07:26:48.718-0400] P-010988 T-NS1 1 NS NSPlumbing An exception occured. Error message: socket closed. (9411)
[14/10/02@07:26:49.718-0400] P-010988 T-NS1 1 NS NSPlumbing An error occured while listening for network input requests on port 5162. (9362)
[14/10/02@07:26:49.718-0400] P-010988 T-NS1 1 NS NSPlumbing An exception occured. Error message: socket closed. (9411)
[14/10/02@07:26:50.725-0400] P-010988 T-NS1 1 NS NSPlumbing Multiple I/O exception errors on port 5162, the NameServer is shutting down. (9412)
======================================================================
[14/10/02@07:58:26.918-0400] P-008788 T-main 1 --- --- C:\Intergy\logfiles\NS1.ns.log opened.
[14/10/02@07:58:26.922-0400] P-008788 T-main 1 --- --- Logging level set to = 1
[14/10/02@07:58:26.922-0400] P-008788 T-main 1 --- --- Log entry types activated: NSPlumbing,
======================================================================
[14/10/05@11:28:04.789-0400] P-008996 T-main 1 --- --- C:\Intergy\logfiles\NS1.ns.log opened.
[14/10/05@11:28:04.792-0400] P-008996 T-main 1 --- --- Logging level set to = 1
[14/10/05@11:28:04.792-0400] P-008996 T-main 1 --- --- Log entry types activated: NSPlumbing,
 
Top