All WebSpeed broker agents locked suddenly

Mike

Moderator
Today we had a situation where all WebSpeed broker agents became locked:

OpenEdge Release 11.7.9 as of Fri Dec 4 18:16:01 EST 2020





Connecting to Progress AdminServer using rmi://localhost:20931/Chimera (8280)
Searching for csprod_ws (8288)
Connecting to csprod_ws (8276)



Broker Name : csprod_ws
Operating Mode : Stateless
Broker Status : ACTIVE
Broker Port : 49500
Broker PID : 8887
Active Agents : 5
Busy Agents : 0
Locked Agents : 5
Available Agents : 0
Active Clients (now, peak) : (1, 8)
Client Queue Depth (cur, max) : (1, 8)
Total Requests : 248839
Rq Wait (max, avg) : (19302 ms, 43 ms)
Rq Duration (max, avg) : (19303 ms, 43 ms)



PID   State  Port  nRq    nRcvd  nSent  Started            Last Change
31660 LOCKED 03204 000000 000000 000000 Jul 30, 2025 08:23 Jul 30, 2025 08:23
31673 LOCKED 03205 000000 000000 000000 Jul 30, 2025 08:23 Jul 30, 2025 08:23
31687 LOCKED 03206 000000 000000 000000 Jul 30, 2025 08:23 Jul 30, 2025 08:23
31697 LOCKED 03202 000000 000000 000000 Jul 30, 2025 08:23 Jul 30, 2025 08:23
31707 LOCKED 03203 000000 000000 000000 Jul 30, 2025 08:23 Jul 30, 2025 08:23
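
For anyone who wants to catch this state automatically, here is a minimal Python sketch of a check for the condition in the status output above (every active agent locked, none available). It only parses status text that is handed to it. The names `parse_status` and `all_agents_locked` are made up for illustration, and how the status text gets captured is deliberately left out.

```python
# Sample text copied from the broker status output in this post.
STATUS = """\
Broker Name : csprod_ws
Operating Mode : Stateless
Broker Status : ACTIVE
Active Agents : 5
Busy Agents : 0
Locked Agents : 5
Available Agents : 0
Active Clients (now, peak) : (1, 8)
"""


def parse_status(text):
    """Turn 'Key : Value' lines into a dict; lines without ' : ' are skipped."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def all_agents_locked(fields):
    """True when there are agents, all of them are locked, and none is available."""
    active = int(fields["Active Agents"])
    locked = int(fields["Locked Agents"])
    available = int(fields["Available Agents"])
    return active > 0 and locked == active and available == 0


fields = parse_status(STATUS)
print(fields["Broker Name"], "all locked:", all_agents_locked(fields))
```

A check like this only detects the symptom. It says nothing about why the agents ended up locked.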

After that we restarted the WebSpeed broker and everything worked fine again, with ports available. We need a root cause analysis (RCA) for this.

Webserver log file:


[25/07/30@08:22:44.028-0500] P-014195 T-014195 1 WS -- (Procedure: 'changeDomain lib/lib_sys_map1.p' Line:6026) global_domain= Excel
[25/07/30@08:23:21.388-0500] P-014192 T-014192 1 WS -- SYSTEM ERROR: Memory violation. (49)
[25/07/30@08:23:21.388-0500] P-014192 T-014192 1 WS -- ** Save file named core for analysis by Progress Software Corporation. (439)
[25/07/30@08:23:25.710-0500] P-031660 T-031660 1 WS -- Logging level greater than 1 must be specified either for DB.Connects or for all types by using -logginglevel. (11072)
[25/07/30@08:23:25.710-0500] P-031660 T-031660 1 WS -- Logging level set to = 1
[25/07/30@08:23:25.710-0500] P-031660 T-031660 1 WS -- No log entry types are activated
[25/07/30@08:23:25.710-0500] P-031660 T-031660 1 WS -- WTA server initializing. (8835)
[25/07/30@08:23:25.760-0500] P-031660 T-031660 1 WS -- (Procedure: 'readini web/objects/web-disp.p' Line:844) QAD CSS: INI found: /apps/qad/css/550/prod/qadcss.ini . Now loading environment.
[25/07/30@08:23:25.790-0500] P-031660 T-031660 1 WS -- (Procedure: 'getConnected web/objects/web-disp.p' Line:979) QAD CSS: There are 6 databases connected to qadcss
[25/07/30@08:23:25.790-0500] P-031660 T-031660 1 WS -- (Procedure: 'getConnected web/objects/web-disp.p' Line:980) QAD CSS: The qadcss database has a local connection
[25/07/30@08:23:25.906-0500] P-031660 T-031660 1 WS -- (Procedure: 'loadSupers lib/lib_ext_superex1.p' Line:7300) QAD CSS: Start adding Supers
[25/07/30@08:23:25.907-0500] P-031660 T-031660 1 WS -- (Procedure: 'loadSupers lib/lib_ext_superex1.p' Line:7348) QAD CSS: Now adding lib/lib_sys_exmanager.p to the Extended Super Layer.
[25/07/30@08:23:25.984-0500] P-031660 T-031660 1 WS -- (Procedure: 'loadSupers lib/lib_ext_superex1.p' Line:7366) QAD CSS ERROR: Agent License Exceeded. New Agent rejected!
[25/07/30@08:23:25.986-0500] P-014195 T-014195 1 WS -- SYSTEM ERROR: Memory violation. (49)

The DB log file also shows this for the time when all agents were locked:

[2025/07/30@08:23:25.984-0500] P-31660 T-139865550917888 I WSAGENT70: (-----) Received RECONNECT from WTB
[2025/07/30@13:23:25.000+0000] P-14195 T-139972119331072 I WSAGENT47: (49) SYSTEM ERROR: Memory violation.
[2025/07/30@13:23:25.000+0000] P-14195 T-139972119331072 I WSAGENT47: (439) ** Save file named core for analysis by Progress Software Corporation.
[2025/07/30@08:23:29.682-0500] P-31673 T-139820204527872 I WSAGENT85: (452) Login by mfg on batch.
[2025/07/30@08:23:29.685-0500] P-31673 T-139820204527872 I WSAGENT85: (7129) Usr 85 set name to mfg.
[2025/07/30@08:23:29.916-0500] P-31673 T-139820204527872 I WSAGENT85: (-----) Received RECONNECT from WTB
[2025/07/30@13:23:29.000+0000] P-14201 T-140458810736896 I WSAGENT48: (49) SYSTEM ERROR: Memory violation.
[2025/07/30@13:23:29.000+0000] P-14201 T-140458810736896 I WSAGENT48: (439) ** Save file named core for analysis by Progress Software Corporation.
[2025/07/30@08:23:34.027-0500] P-31687 T-140117605458176 I WSAGENT86: (452) Login by mfg on batch.
[2025/07/30@08:23:34.030-0500] P-31687 T-140117605458176 I WSAGENT86: (7129) Usr 86 set name to mfg.
[2025/07/30@08:23:34.232-0500] P-31687 T-140117605458176 I WSAGENT86: (-----) Received RECONNECT from WTB
[2025/07/30@13:23:34.000+0000] P-14181 T-139726122832128 I WSAGENT35: (49) SYSTEM ERROR: Memory violation.
[2025/07/30@13:23:34.000+0000] P-14181 T-139726122832128 I WSAGENT35: (439) ** Save file named core for analysis by Progress Software Corporation.
[2025/07/30@08:23:37.874-0500] P-31697 T-140590362644736 I WSAGENT87: (452) Login by mfg on batch.
[2025/07/30@08:23:37.878-0500] P-31697 T-140590362644736 I WSAGENT87: (7129) Usr 87 set name to mfg.
[2025/07/30@08:23:38.069-0500] P-31697 T-140590362644736 I WSAGENT87: (-----) Received RECONNECT from WTB
[2025/07/30@13:23:38.000+0000] P-14184 T-140324535468288 I WSAGENT42: (49) SYSTEM ERROR: Memory violation.
[2025/07/30@13:23:38.000+0000] P-14184 T-140324535468288 I WSAGENT42: (439) ** Save file named core for analysis by Progress Software Corporation.
[


Could you please analyse this and find the root cause? Also during this time java update was going on.



Thanks and regards
Mike
 
All 5 old WS agents (PIDs 14181, 14184, 14192, 14195, 14201) died with memory violation between 08:23:21 and 08:23:38.
All 5 new WS agents (PIDs 31660,31673,31687,31697,31707) were LOCKED.

The reason must be common to them.

> also during this time java update was going on.

Just curious what the protraces of the old agents say.
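
The PID correlation above can be reproduced from the server log with a few lines of Python. This is only a sketch: the sample lines are copied from the excerpt in this thread, the regular expression assumes exactly that line layout, and the name `group_events` is made up for illustration. Besides the crashes, it also picks up the "Agent License Exceeded. New Agent rejected!" line that the new agent 31660 logged while starting.

```python
import re
from collections import defaultdict

# Sample lines copied from the server log excerpt in this thread.
LOG = """\
[25/07/30@08:23:21.388-0500] P-014192 T-014192 1 WS -- SYSTEM ERROR: Memory violation. (49)
[25/07/30@08:23:25.710-0500] P-031660 T-031660 1 WS -- WTA server initializing. (8835)
[25/07/30@08:23:25.984-0500] P-031660 T-031660 1 WS -- (Procedure: 'loadSupers lib/lib_ext_superex1.p' Line:7366) QAD CSS ERROR: Agent License Exceeded. New Agent rejected!
[25/07/30@08:23:25.986-0500] P-014195 T-014195 1 WS -- SYSTEM ERROR: Memory violation. (49)
"""

# Assumed layout: [timestamp] P-<pid> T-<tid> <level> WS -- <message>
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+P-0*(?P<pid>\d+)\s+T-\d+\s+\d+\s+WS\s+--\s+(?P<msg>.*)$"
)


def group_events(text):
    """Collect (pid, timestamp) pairs for crashes and for rejected new agents."""
    events = defaultdict(list)
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        msg = match["msg"]
        if "Memory violation" in msg:
            kind = "crash"
        elif "Agent License Exceeded" in msg:
            kind = "rejected"
        else:
            continue
        events[kind].append((int(match["pid"]), match["ts"]))
    return dict(events)


for kind, items in group_events(LOG).items():
    for pid, ts in items:
        print(f"{kind:9s} PID {pid} at {ts}")
```

Run against the full log rather than a fragment, a listing like this makes it easier to see what the old and the new agents have in common.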
 
Hi George,

Thanks for the quick response. Please find below the protrace file generated during the issue.
I just want to know how it happened that all agents suddenly locked, and the reason:

PROGRESS stack trace as of Wed Jul 30 08:23:38 2025
Progress OpenEdge Release 11.7 build 2026 SP09 on Linux towdfr 4.19.1-27-amd64 #1 SMP Debian 4.19.416-1 (2024-06-25)

Command line arguments are
/apps/progress/oe/117/dlc/bin/_progres -web -logginglevel 1 -logfile /apps/qad/css/550/prod/logs/cssprod_ws.server.070024.log -ubpid 8887 -wtbhostname 10.113.158.180 -wtbport 42829 -wtaminport 3202 -wtamaxport 3502 -wtbname cssprod_ws -wtainstance 1083 -ubpropfile /apps/progress/oe/117/dlc/properties/ubroker.properties -logname cssprod_ws -logthreshold 50000000 -numlogfiles 5 -logentrytypes DB.Connects -ipver IPv4 -p web/objects/web-disp.p -weblogerror -pf /apps/qad/css/550/prod/qadcss.pf

Startup parameters:
-pf /apps/progress/oe/117/dlc/startup.pf,-cpinternal ISO8859-1,-cpstream ISO8859-1,-cpcoll Basic,-cpcase Basic,-d dmy,-numsep 44,-numdec 46,(end .pf),-web,-logginglevel 1,-logfile /apps/qad/css/550/prod/logs/cssprod_ws.server.070024.log,-ubpid 8887,-wtbhostname 10.113.158.180,-wtbport 42829,-wtaminport 3202,-wtamaxport 3502,-wtbname cssprod_ws,-wtainstance 1083,-ubpropfile /apps/progress/oe/117/dlc/properties/ubroker.properties,-logname cssprod_ws,-logthreshold 50000000,-numlogfiles 5,-logentrytypes DB.Connects,-ipver IPv4,-p web/objects/web-disp.p,-weblogerror,-pf /apps/qad/css/550/prod/qadcss.pf,-TB 16,-TM 16,-rereadnolock,-rand 2,-T ./temp,-B 5000,-h 20,-c 30,-D 250,-nb 200,-s 256,-Bt 100,-param apimode=true,(end .pf)

Did this happen because the Java version was updated, or is there another reason why the agents died suddenly? Also, why did this memory violation occur, what are its consequences, and how can it be resolved?

Thanks Mike

 
> also during this time java update was going on.

This seems relevant.

You should not try to update Java while WebSpeed is running, any more than you should try to upgrade OpenEdge while there are running OpenEdge processes. The WebSpeed broker is Java-based.
 
Thanks for your response, Rob.
Sorry, I was wrong. The event that locked the servers occurred between 8:20 and 8:25 AM (I assume local time, Central, as per George's analysis). The upgrade to the Java components in /opt/java/ occurred between 10:27 and 10:45 AM local time (Central), two hours AFTER the servers had already locked up. Could you please look into what the real root cause was? @Experts, please analyse this; it is a bit confusing.
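
Since the DB log mixes `-0500` and `+0000` stamps, the timing is easy to misread. Here is a rough Python sketch that puts the stamps on one timeline and measures the gap to the Java upgrade. The two sample stamps are copied from the DB log above. The `java_start` value is an assumption built from the "10:27 AM" mentioned in this post (same day, seconds set to zero), and `parse_stamp` is just an illustrative helper name.

```python
from datetime import datetime, timezone

# Layout of the DB log stamps shown in this thread, e.g. 2025/07/30@08:23:25.984-0500
FMT = "%Y/%m/%d@%H:%M:%S.%f%z"


def parse_stamp(stamp):
    """Parse a DB-log style stamp into a timezone-aware datetime."""
    return datetime.strptime(stamp, FMT)


# The same moment, logged once in local time and once in UTC.
local = parse_stamp("2025/07/30@08:23:25.984-0500")
utc = parse_stamp("2025/07/30@13:23:25.000+0000")
same_second = (
    local.astimezone(timezone.utc).replace(microsecond=0)
    == utc.replace(microsecond=0)
)

# Last memory violation in the excerpt versus the start of the Java upgrade.
last_crash = parse_stamp("2025/07/30@13:23:38.000+0000")
java_start = parse_stamp("2025/07/30@10:27:00.000-0500")  # assumption, see above
gap = java_start - last_crash

print("same second:", same_second)
print("gap between last crash and Java upgrade:", gap)
```

Under these assumptions, the `13:23` entries are the same moments as the `08:23` ones, and the Java upgrade started roughly two hours after the last crash.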
 
Error 49 is a Progress bug.

You are on 11.7.9, which is four and a half years old; the current update is 11.7.22. So you are missing 13 updates' worth of bug fixes.
 
Hi Rob,

Thanks for your reply, but we still need a root cause. Please help.

Thanks Mike
 
To find the root cause, one needs:
1. to gather as much information as possible;
2. to have a qualified support team that is able to analyze the huge volume of collected information;
3. to be lucky enough to get the answers after a minimal number of incidents, or patient enough to continue the investigation.

The default logs are important sources of information, but they are not enough. Small fragments of logs are even less sufficient.
The community cannot substitute for professional services.
It's extremely hard to resolve memory violation errors after just a single incident, especially when you use an old version of the software. The incident tells you: you need an update. So the root cause of the next incidents could be ignoring that advice.
 
Hi George,

Thanks for your response and valuable advice. I apologise for any inconvenience caused.

Thanks Mike
 
Hi George,

Please find attached the logs. I would be grateful if you could help me identify the root cause (RCA) based on your experience.
 
