Server or System Has No More Resources (748)

Jack@dba

Hi all,

Today we faced an issue with our MFG/PRO application. One of the MFG/PRO jobs failed; at that time the database was up and running fine, but users were not able to connect to the application. When I tried to connect to the MFG/PRO application myself, I got the error below.
After we restarted the databases, everything went back to normal.

"Server or system has no more resources. Please contact progress technical support (748)"

When I checked the database log file, I found the errors below. However, CPU and memory utilization were not high at that time; everything was normal.

I checked the protrace file but could not find anything useful; it is attached for your reference.

What exactly is the issue, and why are users not able to connect to the application?
How can we monitor the resource count?


Database version: 9.1E
OS version: AIX 5.3 (outdated)
DB size: 80 GB
MFG/PRO version: 8.3

-B 70000 # Number of Blocks in database buffer
-bibufs 200 # Number of before-image buffers
-aibufs 300 # Number of after-image buffers
#-Mi 2 # Min processes on a client server
-Ma 15 # Max number of REMOTE clients per db server
-Mn 35 # Max number of REMOTE client servers
-n 1000 # Max number of users.
-Mxs 32768 # Shared memory overflow size (override)
-spin 12000 # Number of spin lock retries
-L 1024000 # Lock Table entries

We have checked the firewall and TCP keep-alive timeout settings; they are already set to the defaults.

*p-10001*:/home/pgresdba: no -a | grep tcp_keep
tcp_keepcnt = 8
tcp_keepidle = 14400
tcp_keepinit = 150
tcp_keepintvl = 150
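For reference, a rough worked example of what these values mean in practice. It assumes the AIX convention that tcp_keepidle and tcp_keepintvl are expressed in half-seconds, and estimates how long the kernel keeps a silent, dead client connection open before dropping it. The function name is illustrative only.

```python
# Assumption: on AIX, tcp_keepidle and tcp_keepintvl are in half-second units.
HALF_SECOND = 0.5


def dead_peer_detection_seconds(keepidle: int, keepintvl: int, keepcnt: int) -> float:
    """Approximate time before TCP gives up on a peer that stopped answering.

    The kernel waits `keepidle` before the first probe, then sends up to
    `keepcnt` probes spaced `keepintvl` apart before dropping the connection.
    """
    return (keepidle + keepcnt * keepintvl) * HALF_SECOND


if __name__ == "__main__":
    # Values from the `no -a` output above.
    secs = dead_peer_detection_seconds(keepidle=14400, keepintvl=150, keepcnt=8)
    print(f"{secs:.0f} s = {secs / 3600:.2f} h")  # 7800 s = 2.17 h
```

With these defaults a client that vanished without closing its socket can hold its connection for roughly 2 hours 10 minutes before TCP itself gives up.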


18:58:34 BROKER 0: Connection timed out on socket=5 for usernum 1, attempt disconnect. (1280)
18:58:34 BROKER 0: Error reading socket=5 ret=-1 errno=22. (795)
18:58:34 BROKER 0: Error reading socket=5 ret=-1 errno=22. (795)
19:01:43 BROKER 0: Connection timed out on socket=5 for usernum 1, attempt disconnect. (1280)
19:02:17 BROKER 0: Connection timed out on socket=5 for usernum 1, attempt disconnect. (1280)
19:02:17 BROKER 0: Error reading socket=20 ret=-1 errno=22. (795)
19:02:21 BROKER 0: Error reading socket=25 ret=-1 errno=22. (795)
19:02:36 BROKER 0: Error reading socket=5 ret=-1 errno=22. (795)
19:03:06 BROKER 0: Connection timed out on socket=5 for usernum 1, attempt disconnect. (1280)
19:03:36 BROKER 0: Connection timed out on socket=5 for usernum 1, attempt disconnect. (1280)
19:04:18 BROKER 0: Connection timed out on socket=5 for usernum 1, attempt disconnect. (1280)
19:04:48 BROKER 0: Connection timed out on socket=5 for usernum 1, attempt disconnect. (1280)
19:05:18 BROKER 0: Connection timed out on socket=5 for usernum 1, attempt disconnect. (1280)
19:05:18 BROKER 0: Server's received count 1 does not equal client(1)'s send count 1414744096. (1055)
19:05:48 BROKER 0: Connection timed out on socket=5 for usernum 1, attempt disconnect. (1280)
19:05:48 BROKER 0: Error reading socket=5 ret=-1 errno=22. (795)
19:06:18 BROKER 0: Connection timed out on socket=5 for usernum 1, attempt disconnect. (1280)
19:06:48 BROKER 0: Connection timed out on socket=5 for usernum 1, attempt disconnect. (1280)
19:07:18 BROKER 0: Connection timed out on socket=5 for usernum 1, attempt disconnect. (1280)
19:07:48 BROKER 0: Connection timed out on socket=5 for usernum 1, attempt disconnect. (1280)
19:07:59 SRV 2: Connection timed out on socket=19 for usernum 1, attempt disconnect. (1280)
19:07:59 SRV 2: Error reading socket=19 ret=-1 errno=22. (795)
19:07:59 SRV 2: Error reading socket=19 ret=-1 errno=22. (795)
19:08:18 BROKER 0: Connection timed out on socket=5 for usernum 1, attempt disconnect. (1280)
19:08:18 BROKER 0: Error reading socket=5 ret=-1 errno=22. (795)
19:08:48 BROKER 0: Connection timed out on socket=5 for usernum 1, attempt disconnect. (1280)
19:09:18 BROKER 0: Connection timed out on socket=5 for usernum 1, attempt disconnect. (1280)
19:09:48 BROKER 0: Connection timed out on socket=5 for usernum 1, attempt disconnect. (1280)
19:10:20 SRV 2: Connection timed out on socket=19 for usernum 1, attempt disconnect. (1280)
19:10:33 BROKER 0: Error reading socket=5 ret=-1 errno=22. (795)
19:10:55 SRV 2: Connection timed out on socket=22 for usernum 4, attempt disconnect. (1280)
19:10:55 SRV 2: Error reading socket=27 ret=-1 errno=22. (795)
19:10:55 SRV 2: Error reading socket=19 ret=-1 errno=22. (795)
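As a partial answer to the monitoring question, a small script can tally how often each message number appears per process in the database log (dbname.lg), which makes bursts of 1280 and 795 messages easy to spot. This is a sketch that assumes every line has exactly the shape of the excerpt above (time, process name and number, text, message number in parentheses); the function name is hypothetical, and opening the real log file is left to the caller.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpt above:
#   19:02:17 BROKER 0: Error reading socket=20 ret=-1 errno=22. (795)
# The greedy ".*" makes the LAST "(nnnn)" on the line the message number.
LOG_LINE = re.compile(r"^(\d{1,2}:\d{2}:\d{2}) (\w+ \d+): .*\((\d+)\)$")


def count_messages(lines):
    """Return a Counter keyed by (process, message number)."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:  # lines that do not fit the assumed shape are skipped
            _time, process, msg_num = match.groups()
            counts[(process, int(msg_num))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "19:07:48 BROKER 0: Connection timed out on socket=5 for usernum 1, attempt disconnect. (1280)",
        "19:08:18 BROKER 0: Connection timed out on socket=5 for usernum 1, attempt disconnect. (1280)",
        "19:08:18 BROKER 0: Error reading socket=5 ret=-1 errno=22. (795)",
        "19:07:59 SRV 2: Error reading socket=19 ret=-1 errno=22. (795)",
    ]
    for (process, msg_num), n in sorted(count_messages(sample).items()):
        print(f"{process:10} ({msg_num}): {n}")
```

Run against each day's log, the counts give a simple baseline; a sudden jump in 1280 or 795 for one broker or server is worth correlating with the time users start getting 748.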
 

Attachment: mfgpro.docx

Rob Fitzpatrick

A 748 isn't about CPU or memory utilization. It means that no more clients can connect. Typically it is because -n is too low. But it can also be because some of the client/server broker parameters are set too low. And it can also happen when the parameter values are correct but your Progress version has a bug that results in the database's client count not being decremented when clients disconnect.
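A simplified way to see which ceiling a new remote client would hit first. The model is an assumption for illustration, not Progress internals: it counts every connected process against -n, treats -Mn x -Ma as the most remote clients all servers together can hold, and ignores -Mpb, -Mi and the secondary broker's own limits. The function name is hypothetical.

```python
from typing import Optional


def refusing_limit(n: int, mn: int, ma: int,
                   connected_processes: int, remote_clients: int) -> Optional[str]:
    """Name the limit that would refuse one more remote client, or None.

    Simplified model (an assumption, not Progress internals):
    - every connected process (remote, self-service, utility, helper)
      uses one of the -n user slots;
    - remote clients can occupy at most mn servers x ma clients each.
    """
    if connected_processes >= n:
        return "-n"
    if remote_clients >= mn * ma:
        return "-Mn x -Ma"
    return None


if __name__ == "__main__":
    # -n 1000 -Mn 35 -Ma 15, as in the parameter file from the question.
    print(refusing_limit(1000, 35, 15, connected_processes=600, remote_clients=180))   # None
    print(refusing_limit(1000, 35, 15, connected_processes=1000, remote_clients=180))  # -n
```

In the second call only 180 remote clients are connected, well below the 35 x 15 = 525 the servers could hold in this model, yet the next client is refused because the -n slots are already used up.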

You can look at dbname.lic in the database directory. It should contain totals of connected clients (interactive, batch, and total; current, max, and min) in each hour interval. Ordinarily you should see a pattern in the numbers. For example they might reach a low point overnight, rise to a peak in late morning, and drop off steadily starting in mid afternoon. If you have a bug, they might still follow a similar pattern but migrate upwards steadily over time until they reach the value of -n.
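One way to check for the upward drift described above is to watch only the daily low point of the connection count. This sketch assumes the hourly totals have already been pulled out of dbname.lic into (day, hour, total) tuples; it does not parse the .lic file itself, whose layout is not shown in this thread, and the function names and sample numbers are made up.

```python
def daily_lows(samples):
    """Lowest connected-user total per day, in day order.

    samples: iterable of (day, hour, total_connected) tuples.
    """
    lows = {}
    for day, _hour, total in samples:
        lows[day] = min(total, lows.get(day, total))
    return [lows[day] for day in sorted(lows)]


def days_until_exhausted(samples, n):
    """Estimate days before the daily low reaches -n, or None if it is not rising."""
    lows = daily_lows(samples)
    if len(lows) < 2:
        return None
    # Average growth of the low point per day, first day to last day.
    growth_per_day = (lows[-1] - lows[0]) / (len(lows) - 1)
    if growth_per_day <= 0:
        return None
    return (n - lows[-1]) / growth_per_day


if __name__ == "__main__":
    # Made-up numbers: the overnight low creeps up by 10 users per day.
    leaking = [(1, 3, 100), (1, 11, 400), (2, 3, 110), (2, 11, 410), (3, 3, 120), (3, 11, 420)]
    healthy = [(1, 3, 100), (1, 11, 400), (2, 3, 100), (2, 11, 410)]
    print(days_until_exhausted(leaking, n=1000))  # 88.0
    print(days_until_exhausted(healthy, n=1000))  # None
```

The low point is used rather than the peak because it should return to roughly the same level every night; if it keeps rising, slots are being taken and not given back.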

There was such a bug in 10.2B01. Every online backup caused the user count to increase by 1 but not decrease by 1 when the backup finished. Eventually, after enough backups, all the free _connect slots would be used up by backup clients and subsequent connection requests would be refused with a 748. Until the bug was fixed in 10.2B02, the only recourse was to restart the db.

You can see the number of connected processes in promon screen 1 1.
 

TomBascom

I notice that you are acknowledging that AIX 5.3 is outdated.

You might like to know that Progress 9.1E is laughably ancient, obsolete, unsupported and criminally negligent to keep in production.

As of this post (April 2017) Progress OpenEdge 11.7 is the current release.
 

Jack@dba

Thanks, Rob, for the quick reply.

But our startup parameters already have good values:

-B 70000 # Number of Blocks in database buffer
-bibufs 200 # Number of before-image buffers
-aibufs 300 # Number of after-image buffers
#-Mi 2 # Min processes on a client server
-Ma 15 # Max number of REMOTE clients per db server
-Mn 35 # Max number of REMOTE client servers
-n 1000 # Max number of users.
-Mxs 32768 # Shared memory overflow size (override)
-spin 12000 # Number of spin lock retries
-L 1024000 # Lock Table entries

prdcrn -m3 -Mi 3 -Ma 5 -Mpb 4 -S prdcrnoi -ServerType SQL

uptime
11:53AM up 5 days, 9:51, 247 users, load average: 2.12, 3.03, 4.53

04/27/17 Activity: Performance Indicators
11:56:22 04/27/17 02:00 to 04/27/17 11:55 (9 hrs 55 min)

                           Total   Per Min   Per Sec   Per Tx
Commits                  2459375      4133     68.89     1.00
Undos                       2137         3      0.05     0.00
Index operations         3660240      6152    102.53     1.48
Record operations      614569038   1032900  17215.00     5.39
Total o/s i/o           35317000     59340    989.00    14.36
Total o/s reads         34808305     58500    975.00    14.15
Total o/s writes          508695       854     14.24     0.20
Background o/s writes     453391       762     12.70     0.18
Partial log writes        130117       218      3.64     0.05
Database extends               0         0      0.00     0.00
Total waits               185007       311      5.18     0.07
Lock waits                    24         0      0.00     0.00
Resource waits            184983       311      5.18     0.07
Latch timeouts            198932       334      5.57     0.08

Buffer pool hit rate: 2 %

If we face the same issue again, what should I check before restarting the database?
In version 9.1E, promon does not have all the options of the latest versions.

I have attached the remote connection details. Please check them and let me know whether any parameter changes are needed.
 

Attachment: mfgpro.docx

Rob Fitzpatrick

I assume servers 2, 6, 8, and 9 are SQL servers as none of the users in your attached document were connected to them and the SQL broker has -Mpb 4. Are the user connection details in this document from the exact time of the 748 error? Because they show 31 servers with a total of 180 connected users, and the highest count for any server is server 1 with 7 clients. Given the parameters you have provided I don't see a reason for a client to be refused with a 748 error, if as you say the parameters are high enough for the number and type of connecting clients.

The list you provided is from promon, remote clients. This does not show you all connected processes, e.g. utilities, helpers, servers, brokers, and, importantly, self-service clients.

When a remote client connects, it is given the highest available user number, i.e. the last available _connect slot. Clients connecting later (assuming no one disconnects in the interim) get successively lower user numbers. Shared-memory (self-service) clients, by contrast, get the lowest number available and go up from there. So there is a point at which the self-service numbers end and the REMC numbers begin.

But unlike remote clients, there is no parameter other than -n that limits the number of possible self-service clients. And unlike servers, whose user numbers are reserved based on -Mn, the user numbers for remote clients are not reserved based on -Mpb and -Ma. So if shared-memory clients use up all of the available _connect slots, then no more remote clients will be able to connect even if none of the servers have reached their maximum client count.

You have two brokers, primary and secondary, and -Mn 35, so I'd guess your shared-memory clients (including helpers and utilities) start at user number 36. If they use up numbers 36 through 848, that would explain the 748 error. The next time you get the error, check promon 1 1 to get a full picture of every process that is connected.
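The allocation rules described above can be illustrated with a toy model. The class and method names are hypothetical, the table is shrunk to 10 slots for readability, and the model leaves out everything except which slot each kind of process takes; it is not Progress source code.

```python
class ConnectTable:
    """Toy model of _connect slot allocation (an illustration only).

    - slots below `reserved` are held for brokers and servers (-Mn);
    - self-service clients take the lowest free slot;
    - remote clients take the highest free slot.
    """

    def __init__(self, total_slots: int, reserved: int) -> None:
        self.free = set(range(reserved, total_slots))

    def _take(self, pick) -> int:
        if not self.free:
            raise ConnectionRefusedError("no free _connect slot (748)")
        slot = pick(self.free)
        self.free.remove(slot)
        return slot

    def connect_self_service(self) -> int:
        return self._take(min)

    def connect_remote(self) -> int:
        return self._take(max)

    def disconnect(self, slot: int) -> None:
        self.free.add(slot)


if __name__ == "__main__":
    table = ConnectTable(total_slots=10, reserved=2)
    print([table.connect_remote() for _ in range(2)])        # [9, 8]
    print([table.connect_self_service() for _ in range(6)])  # [2, 3, 4, 5, 6, 7]
    try:
        table.connect_remote()
    except ConnectionRefusedError as err:
        print("remote client refused:", err)
```

Nothing in the model counts clients per server, which mirrors the point above: once self-service processes fill the gap between the reserved slots and the remote clients, the next remote client is refused no matter how lightly loaded the servers are.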

Also, this is old software. The other possibility is that you are dealing with a bug.
 