PASOE resources not getting released

Tarby777

Member
Hi all,

I have a PASOE endpoint in v12.2.13 / Windows Server 2019 Standard that is running the web transport. I added a file upload capability to a WebSpeed screen in this endpoint, using a JavaScript XMLHttpRequest. The files are uploaded successfully but after a couple of hours all subsequent calls to the endpoint - not just the upload function - start to return a 500 error. It seems the upload is definitely to blame, as I am the only one using this endpoint and I'm not doing anything else. While troubleshooting I have been leaving it alone after uploading the files, then coming back in a couple of hours and trying to log into the application. It's as if there's a long-running thread that is getting timed-out by something other than PASOE, and getting into a state where it can't service any new requests. Multiple uploads (= multiple XMLHttpRequests) are possible... it's coming back to it after an extended period of inactivity that is the problem.

Thinking the problem might lie with the XMLHttpRequest, I set a timeout of 10 seconds on it and I also added an abort in the 'onreadystatechange' trigger. This didn't fix it. Next up, I noticed that the endpoint didn't have idle resource checking enabled, so I set idleResourceTimeout in openedge.properties to 30 minutes and bounced the endpoint. Related settings, which I didn't change, are:

idleSessionTimeout - 30 minutes
idleAgentTimeout - 30 minutes
idleConnectionTimeout - 1 minute

I repeated my test cycle and got the same result. I was hoping that by enabling idle resource checking, I might get in ahead of whatever is messing the threads up, and shut things down cleanly. I'm out of ideas at this point - any thoughts on what I could try next?

TIA
Tarby
 
Last edited:

Stefan

Well-Known Member
Not sure if this applies to you, but beware that there is a difference between a PASOE dev(elopment) and a PASOE prod(uction) instance.

The dev instance will clean up all sorts of things whereas the prod instance will not, since it starts a singleton - see this article.
 

peterjudge

Member
The files are uploaded successfully but after a couple of hours all subsequent calls to the endpoint - not just the upload function - start to return a 500 error.

Are there any useful errors in the agent log? Or the session manager (aka dated) log?

There have been bugs in the session manager code where agents are left in a zombie state (cannot process any more requests, but don't remove themselves from service) and eventually the server stops responding when it hits the max number of agents . This happened with a db disconnect but maybe there's something related going on ... I believe the issue was fixed in 12.7 (OCTA-49133).
 

Tarby777

Member
Thanks guys. Here's what I found in the logfiles:

  • localhost-access.[date].log - shows the POST for the XmlHTTPRequest, returning a 200 status
  • catalina.[date].log - shows an error about the trust store password on startup, nothing else of note
  • localhost.[date].log - nothing
  • manager.[date].log - nothing
  • [endpoint].[date].log - shows an error about caflogging.properties on startup, nothing else of note
  • [endpoint].agent.[date].log - shows a database connection made by the POST, and no disconnect
For some reason, a database connection DOES persist long after the POST has completed. Promon shows that it isn't reading or writing anything, isn't in a transaction and holds no locks. A watchdog is running on the database but there's nothing to indicate that it tries to disconnect this user - either when the upload completes or two hours later when the endpoint becomes unreachable.

Our POSTs work like this:
  1. Call our program post.r on the PASOE server, passing in the program to run and the database to connect to.
  2. post.r connects to the required database anonymously (no -U -P), runs the specified program, disconnects the database and terminates.
Logging created by post.r shows this:

Code:
18/01/2024 09:28:32.930 No old connections
18/01/2024 09:28:32.933 Prog=SEFPhoto.p
18/01/2024 09:28:32.937 AbsProg=C:\OpenEdge\WRK\wsTest\webapps\ROOT\WEB-INF\openedge\CRM\SEFPhoto.r
18/01/2024 09:28:32.941 FormFields=pf,customer,program,file[]0,name[]0,totalFileCount
18/01/2024 09:28:32.966 Disconnecting core
18/01/2024 09:28:32.969 After disconnect, db count is 0
18/01/2024 09:28:32.972 Return=OK

Here's a snippet from the DB log, showing that we connect but do not disconnect:

Code:
2024-01-18T09:28:32.903+1100 011916 008536 2 AS-7 ROOT:w:00000026 CONN           Database core Options:  (12699)
2024-01-18T09:28:32.919+1100 011916 008536 2 AS-7 ROOT:w:00000026 CONN           Connected to database core, user number 56. (9543)
2024-01-18T09:28:56.237+1100 011916 003432 1 AS-Aux-0 mtapsv:-:? MSAS Worker Thread exiting. Number: 4, Status: 0
2024-01-18T09:29:56.249+1100 011916 010944 1 AS-Aux-0 mtapsv:-:? MSAS Worker Thread exiting. Number: 6, Status: 0
2024-01-18T09:35:39.000+1100 011916 012740 1 AS-Listener mtapsv:-:? MSAS Spawning New Worker Thread. Number: 4
2024-01-18T09:36:56.327+1100 011916 004240 1 AS-Aux-0 mtapsv:-:? MSAS Worker Thread exiting. Number: 4, Status: 0
2024-01-18T09:41:32.872+1100 011916 012740 1 AS-Listener mtapsv:-:? MSAS Spawning New Worker Thread. Number: 4
2024-01-18T09:42:56.408+1100 011916 005704 1 AS-Aux-0 mtapsv:-:? MSAS Worker Thread exiting. Number: 4, Status: 0
2024-01-18T10:05:39.330+1100 011916 012740 1 AS-Listener mtapsv:-:? MSAS Spawning New Worker Thread. Number: 4
2024-01-18T10:06:56.710+1100 011916 009852 1 AS-Aux-0 mtapsv:-:? MSAS Worker Thread exiting. Number: 4, Status: 0
2024-01-18T10:35:39.218+1100 011916 012740 1 AS-Listener mtapsv:-:? MSAS Spawning New Worker Thread. Number: 4
2024-01-18T10:36:57.077+1100 011916 008536 1 AS-Aux-0 mtapsv:-:? MSAS Worker Thread exiting. Number: 7, Status: 0
2024-01-18T11:05:39.224+1100 011916 012740 1 AS-Listener mtapsv:-:? MSAS Spawning New Worker Thread. Number: 6
2024-01-18T11:06:57.450+1100 011916 008308 1 AS-Aux-0 mtapsv:-:? MSAS Worker Thread exiting. Number: 6, Status: 0
2024-01-18T11:27:53.113+1100 011916 012740 1 AS-Listener mtapsv:-:? MSAS Spawning New Worker Thread. Number: 6
2024-01-18T11:29:57.767+1100 011916 004092 1 AS-Aux-0 mtapsv:-:? MSAS Worker Thread exiting. Number: 6, Status: 0

'No old connections' is just saying that when post.r started up, it didn't have a database connected. It runs the specified program (SEFPhoto.p), disconnects the database afterwards and terminates cleanly. So that seems to be the problem... a process that - as far as it is concerned - does its job and terminates cleanly but somehow leaves behind an idle database connection that never gets cleaned up.

I see that the process 011916 008536 (PID + thread?) that is associated with my database connection exits at 10:36:57... presumably that's the point at which things go bad. The database connection is still there, one hour after that.

Cheers,
Tarby
 
Last edited:

Rob Fitzpatrick

ProgressTalk.com Sponsor
A watchdog is running on the database but there's nothing to indicate that it tries to disconnect this user - either when the upload completes or two hours later when the endpoint becomes unreachable.
The watchdog only operates on servers and self-service connections. It only intervenes and removes a connection if it detects a variance between the processes listed in the user connection table (_Connect) and the OS process list.

For example, if _Connect shows user 123, type Self/4GL, PID 12345, but the OS shows no process with that PID, the watchdog knows that that _Connect record is stale. It will remove that entry from _Connect and clean up after the user. It will not intervene with remote clients and it will not intervene if the process in question still exists but is idle, however you define that.
 

Tarby777

Member
Well, it looks like I've cracked it... and it's an odd one.

I changed post.p so that it didn't call SEFPhoto.p - I made it just return without doing anything. This did not result in an orphaned DB connection, showing that the problem was somewhere in SEFPhoto.p.

SEFPhoto.p makes use of a utility class that I wrote - it simply calls a method in the class to get the name of the local folder where it should dump the uploaded files. I replaced the call to that class with equivalent logic in SEFPhoto.p (reading a DB record that holds the folder name) and now everything is working and I'm still not getting an orphaned DB connection, so the problem is somehow related to my utility class.

SEFPhoto.p references the utility class like this:


Code:
USING Interface.IF012-SharePoint.

IF012-SharePoint:GetSEFFolder(OUTPUT cSharepointStagingArea,
                              OUTPUT cSEFFolderStatus).

...and the relevant parts of the utility class are defined like this:

Code:
CLASS Interface.IF012-SharePoint FINAL:

METHOD PUBLIC STATIC VOID GetSEFFolder (OUTPUT pcFolder AS CHARACTER,
                                        OUTPUT pcMessage AS CHARACTER):
    
    Interface.IF012-SharePoint:GetConfigValue(INPUT "{&SharePointConfigItem}",
                                              INPUT "SEFPhotoStagingArea",
                                              OUTPUT pcFolder).

END METHOD.

METHOD PRIVATE STATIC VOID GetConfigValue (INPUT pcItem AS CHAR,
                                           INPUT pcKey AS CHAR,
                                           OUTPUT pcValue AS CHAR) :

DEFINE BUFFER Config FOR Config.
 
    FOR FIRST Config NO-LOCK
    WHERE Config.ConfigItem EQ pcItem
    AND Config.ConfigKey EQ pcKey:
    
        pcValue = Config.ConfigValue.
 
    END.

END METHOD.

END CLASS.

It looks very much like something about the utility class or the way I'm calling it is causing an orphaned DB connection to persist even after SEFPhoto.p and post.p have quit...

TIA
Tarby
 

peterjudge

Member
This sounds familiar ... Given that the "static class" - or more accurately, a class with static members - is loaded into memory once and held there for the life of the session, there's now an rcode reference to the Config table and so, I think, the DB connection is not actually fully disconnected. A workaround for this would be to have the static method call some instance (could be a class or even a simple run .p) to access the Config table and return the value to the static method.

I had thought there was a kbase on this issue but it wasn't in the first couple of hundred search results.

Are you sure that this - holding db connections - is the cause of the 500 responses?
 

Tarby777

Member
Thanks Peter. It does seem to be the cause of the 500 errors... it's a CRM app in PASOE + WebSpeed that has been running without this kind of error for years, and this problem has only started happening since the file upload feature was added. The file upload is the only part of the app that references the utility class, and the problem occurs within a couple of hours of a file upload, every time.
 
Top