Active connections grow and grow and grow...

islandjoe

I wonder if anyone has seen this one before:

The number of 'active' connections to our database(s) keeps going up, even though the number of actual connections remains quite low.
I suspect that somehow we have connections to the database that are not being disconnected. The number of 'active' connections grows by about 1 per day, until after about 90 to 100 days we hit some sort of limit and the SQL broker starts refusing connections.

Only fix is to reboot the database/server, not ideal. I could adjust startup parameters to increase the limits, but I'd rather find and fix the problem.

Initially I suspected the JDBC connections, but we have sites that are pure ABL connections only, and they have the same problem.
One of the pure ABL sites currently records the number of active connections as being at 150(!) but there are only 10 active ABL connections.
Can't see the 'phantom' connections in promon or in VST tables, no pending connections on servers, nada.

Have done analysis using the .LIC file and VST tables (_Connect, _Server, etc.), and if I plot 'active connections' over time, I get a very distinctive 'saw tooth' graph for each of our sites, with each 'tooth' corresponding to a ~3 month period.
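
One rough way to put numbers on the saw-tooth described above is to split the daily 'active connections' series at each sharp drop (a restart) and measure the growth per day inside each tooth. This is only a sketch: the function names, the drop threshold and the sample series are made up for illustration, and it assumes you already have one count per day taken from the .lic file or the VSTs.

```python
def split_into_teeth(daily_counts, drop_threshold=5):
    """Split a daily series into runs separated by sharp drops (restarts).

    drop_threshold is an arbitrary illustrative value, not a Progress setting.
    """
    if not daily_counts:
        return []
    teeth, current = [], [daily_counts[0]]
    for prev, cur in zip(daily_counts, daily_counts[1:]):
        if prev - cur >= drop_threshold:
            # A large fall between two days is treated as a restart.
            teeth.append(current)
            current = []
        current.append(cur)
    teeth.append(current)
    return teeth


def growth_per_day(tooth):
    """Average change per day between the first and last sample of a tooth."""
    if len(tooth) < 2:
        return 0.0
    return (tooth[-1] - tooth[0]) / (len(tooth) - 1)


# Made-up series: 90 days of +1 per day, a restart, then 30 more days.
series = list(range(10, 100)) + list(range(10, 40))
teeth = split_into_teeth(series)
rates = [growth_per_day(t) for t in teeth]
print(len(teeth), rates)
```

A steady rate close to 1.0 in every tooth would point at something that runs once a day rather than at ordinary client traffic.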

These are all OpenEdge 10.2B databases, running on Windows servers. Watchdog enabled on all of them, doesn't seem to do anything.
We upgraded to v10 recently, but I suspect the problem was present in v9 as well, just masked.

Anyone have any idea?

Is there anywhere I can set a timeout for connections?
 
The OpenEdge database relies on the TCP keep alive timeout setting which is, AFAIK, configurable on the operating system level. If I am correct the default on most systems is 30 minutes. That means that it can take up to the keep alive timeout until the database recognizes a "disappeared" remote client. Probably your clients do not disconnect gracefully from the database.
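
For readers unfamiliar with the mechanism: TCP keepalive is a per-socket option that the operating system then services with its own timers. The sketch below is generic Python socket code, not anything OpenEdge-specific, and it only shows the option being switched on and read back; the actual probe interval still comes from the OS settings mentioned above.

```python
import socket


def make_keepalive_socket():
    """Create a TCP socket with the keepalive option enabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Ask the OS to probe an idle connection so a vanished peer is noticed.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock


sock = make_keepalive_socket()
# getsockopt returns a non-zero integer when the option is on.
enabled = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
print("keepalive enabled:", enabled)
```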

Heavy Regards, RealHeavyDude.
 
30 minute timeout...yet connections accumulate over a 3 month period...

I suspect I've only got two options:
Do a scheduled reboot every month or so on each server,
or
try and identify the offending application/task/process by process of elimination :(
 
Does the database ever get restarted? I would restart the db rather than restart the server, but you are on Windows.
Do you use probkup to back up the database? See if the active connection count increases after probkup kicks off.
What service pack are you using? 10.2B on Windows has some issues addressed in SP04.
 
What are you looking at that gives you a count of "active" connections?

What are you looking at that gives you a count of "actual" connections?

Growing by 1 per day suggests that some sort of daily (or nightly) process is involved.
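
To narrow down which daily job is responsible, one option is to look at when the count rises and never comes back down. The sketch below assumes you have timestamped samples of the active count between two restarts; the function name and the demo data are hypothetical. It computes a "floor" (the lowest count seen from each sample onwards) and tallies the hour of day at which that floor steps up.

```python
from collections import Counter
from datetime import datetime, timedelta


def permanent_increase_hours(samples):
    """Tally, by hour of day, the increases that are never given back.

    samples: list of (datetime, active_count) sorted by time, taken
    between two database restarts.
    """
    counts = [c for _, c in samples]
    # floor[i] = lowest count seen at or after sample i.
    floor = counts[:]
    for i in range(len(floor) - 2, -1, -1):
        floor[i] = min(floor[i], floor[i + 1])
    hours = Counter()
    for i in range(1, len(samples)):
        if floor[i] > floor[i - 1]:
            hours[samples[i][0].hour] += floor[i] - floor[i - 1]
    return hours


# Made-up demo: hourly samples for 3 days, normal daytime traffic that
# comes and goes, plus one connection that sticks at 02:00 each night.
start = datetime(2011, 1, 1)
demo, base = [], 10
for h in range(72):
    t = start + timedelta(hours=h)
    if t.hour == 2:
        base += 1
    busy = 5 if 9 <= t.hour < 17 else 0
    demo.append((t, base + busy))

print(permanent_increase_hours(demo))
```

In the demo all three permanent increases land on hour 2, while the daytime peaks are ignored because they drop away again. On real data the hour that dominates can be compared against the scheduled task list.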

Have you looked for "phantoms" at the OS level? For instance, does taskmgr show an inappropriate number of Progress related processes running?
 
If you are on 10.2B prior to SP02, there is a bug in online backup that it takes a connection but does not relinquish it when it finishes. Eventually you have to restart the DB, or start it with a very high -n to postpone the inevitable restart; neither option is ideal, obviously.

It is "fixed" in SP02, at least to the extent that the restart is no longer required, but the .lic stats will still show continual growth (and be pretty much useless as a result). My Linux box isn't on SP04 yet so I'm not sure if that remaining issue has been addressed yet in the recent service packs.

According to my notes this is KB article P179096, bug OE00203062.

The original bug is in P165511: "Probkup online continually reduces the number of available remote user connection slots, causing error 748 on remote clients."
 
Thanks for the help guys, I'll see if service pack 4 fixes the problem.

That kb article pretty much sums up the problem, except for the fact that only SQL connections are refused.

I had already suspected the back-up process was the cause, but I had not done any further investigation...

To get the counts for the 'active' and 'current' connections I was looking at the _License table and the .lic file, and I wrote a little script that displayed the data from the _Server, _User and _Connect tables. Fairly similar to what ProTop does.
I only installed ProTop (.NET) after I wrote that script, and I don't think I would've spotted the growing number of connections if I hadn't done my own analysis first. ProTop gives me different numbers for the total connections and all that.
 
SP04 didn't fix it, so we're going to have to schedule a monthly reboot. And it is definitely the backups that are the cause.
 
What is your actual backup command line?

Is it possible that the backup is not completing? Could it, for instance, be waiting for the "next extent"?

Have you tried restoring your backups to make sure that they are good? (Any time there is weirdness with backups it is good to validate them!)

Does the .lg file say that the backup finished? Is there an extra _dbutil.exe process running for each backup attempt?
 
Looks like a bug in the backup according to that Progress kb...

Anyway:

probkup online [database.db] [backupfile] >> [logfile.txt]

And the backup completes just fine according to log...haven't tried restoring the backup yet, but we've never had issues with restoring our databases.

Backup output:

OpenEdge Release 10.2B04 as of Tue Mar 8 19:27:59 EST 2011


72828 active blocks out of 2048011 blocks in D:\DATA\psc will be dumped. (6686)
256 BI blocks will be dumped. (6688)
Backup requires an estimated 287.1 MBytes of media. (9285)
Restore would require an estimated 73491 db blocks using 71.6K of media. (9286)
Backed up 73084 db blocks in 00:00:27
Wrote a total of 2162 backup blocks using 287.1 MBytes of media. (13625)


Backup complete. (3740)

(output in .lg file is pretty much the same)
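
If you want a script to confirm after each run that the backup really finished, the output above can be checked mechanically. This is a minimal sketch that relies only on the two lines visible in this output ("Backed up ... db blocks in hh:mm:ss" and "Backup complete. (3740)"); the function name and the returned dictionary are made up, and other releases may word the messages differently.

```python
import re

# Output copied from the post above (raw string because of the Windows path).
BACKUP_OUTPUT = r"""
72828 active blocks out of 2048011 blocks in D:\DATA\psc will be dumped. (6686)
256 BI blocks will be dumped. (6688)
Backup requires an estimated 287.1 MBytes of media. (9285)
Restore would require an estimated 73491 db blocks using 71.6K of media. (9286)
Backed up 73084 db blocks in 00:00:27
Wrote a total of 2162 backup blocks using 287.1 MBytes of media. (13625)

Backup complete. (3740)
"""


def parse_backup_output(text):
    """Extract completion status, block count and duration from the output."""
    result = {
        "complete": "Backup complete. (3740)" in text,
        "db_blocks": None,
        "seconds": None,
    }
    m = re.search(r"Backed up (\d+) db blocks in (\d+):(\d+):(\d+)", text)
    if m:
        blocks, hh, mm, ss = (int(g) for g in m.groups())
        result["db_blocks"] = blocks
        result["seconds"] = hh * 3600 + mm * 60 + ss
    return result


print(parse_backup_output(BACKUP_OUTPUT))
```

A check like this only says the utility reported success; it is no substitute for a test restore.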

Every time I run that backup, the 'Active' connections count for the database goes up by 1 and stays there. Any other connection to the database makes the 'active' connections go up and down as it should, but not the backup. Can't see any _dbutil.exe processes when I run the backup, just a single instance of _mprshut.exe
 
It sounds like you are seeing the same behaviour that I am. And yes, if you look in _License or db.lic your counts will keep increasing steadily with each probkup online, but that doesn't mean you have to bounce your database, unless you care about the _License counts for some other reason.

In SP01 the bug really did cause 748 errors and keep clients from connecting. In SP02 and later it just *looks* like that will happen, but it doesn't. My active connections count is far above my -n and I can still connect. At this point it's just an annoyance because it means your _License data is garbage. That said, I do wish PSC would fix it.
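
If you still want a usable number out of _License in the meantime, a crude correction is to subtract the number of online backups run since the last database restart. This sketch assumes exactly one stuck count per probkup online and no other source of drift; the function name is hypothetical, and the figure of 140 backups is invented to match the 150 reported versus 10 real connections mentioned earlier in the thread.

```python
def estimated_real_connections(reported_active, online_backups_since_restart):
    """Subtract one stuck count per online backup from the reported total.

    Assumes each online backup inflates the counter by exactly 1 and that
    the counter is only reset by a database restart.
    """
    return max(reported_active - online_backups_since_restart, 0)


# 150 reported by _License, 140 online backups since the last restart
# (the backup count is an assumed value for illustration).
print(estimated_real_connections(150, 140))
```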
 
Hi islandjoe,

Just FYI, bug OE00203062 is marked as fixed in 10.2B SP05:

"Each time backup online is run _License._Lic-ActiveConns and _License._Lic-CurrConns are incremented and never decremented until database is restarted."

Cheers,

Rob
 