MFGPro / Progress no new connections after running normally for 12-14 hours

ddegroot

Member
Hi Guys,

We ran into a problem after moving to a new machine. We are running Tru64 5.1B with an old version of Progress 7.3E15. We need this old version to support running MFGPro 8.5E.

Machine:
--------
Digital Alpha ES40 (4 CPU, 8 GB)
Tru64 5.1B
Progress 7.3E15
MFGPro 8.5E

Attachments (prostats files of the last 4 hours):
prostats_04-24-08_1910.prod.txt
prostats_04-24-08_2010.prod.txt
prostats_04-24-08_2110.prod.txt
prostats_04-24-08_2210.prod.txt
prostats_04-24-08_2310.prod.txt

The problem is that everything runs fine for about 12-14 hours, and then the prod (prod.pf) database stops allowing any logins or logouts. Even new APW or promon connections fail: the session simply does not start, and the connection attempt is not even written to the prod.lg file. There is no filesystem or CPU activity at that moment. The OS is fully responsive and there are no errors in any OS log files.

We have been trying to solve this over the last couple of days and have not been able to find anything that might cause it. /var/log/messages, /var/log/kern.log and /var/log/daemon.log do not mention anything that looks related. All other databases running on the server (prod_gl.pf and lnprod.pf, which are normally connected in the same session for every application logon) work just fine and still accept new users.

We use self-service clients for 90% of the connections; only a few users use the GUI. A maximum of 88 client connections is allowed, and normally about 60 users are online.

Progress 7.3E15 only allows a block size of 1024.

prod db:
--------
filesystem: 20 GB
size: 9 GB in 100 MB extents
high-water mark: 48%

bi file prod.db
---------------
filesystem: 20 GB
size: 700 MB in 100 MB extents
biclustersize = 4096
biblocksize = 8

# Progress parameter file for Production DB
#
# Client <-> Server parameters:
-H mfgpro2
-N tcp
-S prod

-n 88 # Maximum number of users
-Mn 4
-Ma 12
-Mf 90

-L 100000
-B 400000

-spin 25000
-bibufs 50
-napmax 5000
-q

(We would like to increase the -B parameter later to about 1200000 but this needs to be solved first.)
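
As a rough sanity check on the planned -B change, the buffer pool size can be estimated from the number of buffers and the block size. The sketch below assumes one buffer holds one 1024-byte block (taking the block size above to be in bytes) and ignores per-buffer overhead and all other shared-memory structures, so the real shared-memory segment will be larger. The function name is made up for illustration.

```python
# Rough buffer-pool sizing: -B counts database buffers, one block each.
# Assumption: the 1024 block size is in bytes; overhead is not included.
BLOCK_SIZE_BYTES = 1024


def buffer_pool_mib(num_buffers: int, block_size: int = BLOCK_SIZE_BYTES) -> float:
    """Return the approximate buffer pool size in MiB (data blocks only)."""
    return num_buffers * block_size / (1024 * 1024)


# Current and planned -B values from the parameter file and the note above.
for b in (400_000, 1_200_000):
    print(f"-B {b}: about {buffer_pool_mib(b):.0f} MiB")
```

With these assumptions the current setting comes to roughly 0.4 GB and the planned one to a bit over 1.1 GB, both well within the 8 GB of RAM, although whether this Progress build can address a segment that large is a separate question.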

Checkpoint Information
----------------------
Code:
Ckpt                         ------ Database Writes ------
 No. Time      Len   Dirty   CPT Q    Scan   APW Q Flushes
  77 23:14:58  127    1888    1887     255       0       0  100   123    1888  <-- End of batch. Server allows new logins again.
  76 23:12:49  129    1862    1861       0       0       0  100   125    1862
  75 23:10:43  126    1849    1848       0       0       0  100   122    1849
  74 23:08:57  106    1834    1833       0       0       0  100   102    1834
  73 23:07:46   71    1286    1285       0       0       0  100    71    1286
  72 23:06:57   49    2311    2310       0       0       0  100    48    2311
  71 23:05:36   81    2352    2351      11       0       0  100    81    2352
  70 23:03:28  128    2273    2272      20       0       0  100   123    2273
  69 23:01:12  136    2178    2176       5       0       0  100   131    2178
  68 22:58:58  134    2224    2222      67       0       0  100   130    2224
  67 22:56:39  139    2147    2146      65       0       0  100   134    2147
  66 22:54:27  132    2139    2138      60       0       0  100   127    2139
  65 22:52:13  134    2803    2802      34       0       0  100   129    2803
  64 22:49:58  135    1796    1795      42       0       0  100   131    1796
  63 22:48:39   79    1321    1245      18       0       0  100    76    1321
  62 22:47:56   43    1559    1483       0       0      75  100    43    1484
  61 22:46:50   66    1379    1378       0       0       0  100    65    1379
  60 22:46:00   50    1985    1984       0       0       0  100    48    1985
  59 22:44:49   71    2122    2121      29       0       0  100    69    2122
  57 22:42:19   62    1908    1907       0       0       0  100    60    1908
  56 22:41:08   71    2094    2093       0       0       0  100    69    2094
  55 22:39:28  100    2154    2153       0       0       0  100    98    2154
  54 22:38:04   84    2834    2833      33       0       0  100    82    2834
  53 22:36:16  108    2435    2433      49       0       0  100   105    2434
  52 22:34:47   89    1483    1473      32       0       0  100    86    1481
  51 22:34:03   44    3455    3437      17       0       0  100    44    3440
  50 22:32:02  121    2645    2643       7       0       0
  49 22:29:56  126    2060    2059      38       0       0
  48 22:28:30   86    2397    2396     105       0       0  100    81    2397
  47 22:26:49  101    2356    2345     145       0       0  100    99    2355
  46 22:24:49  120    2095    2060     196       0       0  100   119    2091
  45 22:22:37  132    1818    1748     296       0       0  100   129    1782
  44 22:20:13  144    2926    2923     130       0       0  100   140    2924
  43 22:19:09   64    2341    2334      35       0       0  100    57    2341
  42 22:18:02   67    2885    2876      22       0       0  100    64    2883
  41 22:17:01   61    3479    3421     128       0       0  100    59    3423
  40 22:15:53   68    1774    1758      66       0       0  100    66    1764
  39 22:14:45   68    1796    1783      64       0       0  100    66    1790
  38 22:13:50   55    3258    3256       8       0       0  100    53    3257
  37 22:12:42   68    2727    2726      12       0       0  100    66    2727
  36 22:11:41   61    6061    6060       4       0       0  100    60    6061
  35 22:10:44   57    7236    7235      77       0       0  100    55    7236
  34 22:09:59   45    6764    6763      51       0       0  100    44    6764
  33 22:09:18   41    7483    7482      24       0       0  100    41    7483
  32 22:08:35   43    8795    8794       0       0       0  100    43    8795
  31 22:07:53   42    9684    9683      43       0       0  100    42    9684
  30 22:07:12   41     625     526      42       0       0  100    40     625
  29 22:06:05   67    1138     999      55       0      98  100    67    1040  <-- Start Batch MRP
  28 22:05:03   62    2971    2925       0       0       0  100    62    2971
  27 22:04:07   56    2598    2552       0       0      45  100    56    2553
  26 22:03:07   60    3504    3503      21       0       0  100    60    3504
  25 22:02:01   66    1786    1785       0       0       0  100    66    1786
  24 22:01:01   60    1758    1757       0       0       0  100    57    1758
  23 21:20:03 2458     164       0     240       9       0  100     1       0  <-- Batch started 22:00
  22 20:21:27 3516     673     671     542       0       0  100   568     672
  21 17:10:06 11481    247     235    2179       0       0  100   238     242
  20 15:57:18 4368    2825    2823    2999       0       0  100  2377    2824
  19 15:35:04 1334    3617    3615    1103       0       0  100  1186    3616
  18 15:08:44 1580    3699    3684     328       0       0  100  1520    3698
  17 14:57:24  680    1972    1957       0       0      13  100   680    1958 
  16 14:44:58  746    1902    1897       5       0       0  100   744    1901
  15 13:58:12 2806     834     825    3050       0       0  100   276     826
  14 13:01:35 3397    3455    3453    3672       0       0  100  1147    3454 
  13 12:14:02 2853    2163    2161    1045       0       0  100   698    2162
  12 11:10:07 3835     898     896    2605       0       0  100   150     897
  11 10:12:35 3452    4076    4073    4128       0       0  100   681    4075
  10 09:56:49  946    4583    4580      45       0       0  100   749    4582
   9 09:40:55  954    4850    4848      74       0       0  100   741    4849
   8 09:25:37  918    3064    3061     465       0       0  100   437    3063
   7 08:51:58 2019    1918    1916    1586       0       0
   6 08:35:33  985    3927    2473     379    3874       0
   5 08:17:55 1058    2879    2878     409       0       0
   4 07:59:34 1101    2540    2539     539       0       0 <== Rebooted machine at 7:00
Attachment: CheckpointTable.txt
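
For anyone who wants to go through the checkpoint table mechanically, here is a minimal sketch that parses rows like the ones above and lists the checkpoints with a non-zero Flushes count. It assumes the first eight fields follow the header (No., Time, Len, Dirty, CPT Q, Scan, APW Q, Flushes); the three trailing unlabeled columns are ignored, hand-written annotations are stripped, and repeated rows are collapsed to their first copy. The class and function names are invented for this example.

```python
from typing import List, NamedTuple


class Checkpoint(NamedTuple):
    no: int
    time: str
    length: int
    dirty: int
    cpt_q: int
    scan: int
    apw_q: int
    flushes: int


def parse_checkpoints(text: str) -> List[Checkpoint]:
    """Parse pasted checkpoint rows; return them sorted by checkpoint number."""
    rows = {}
    for line in text.splitlines():
        # Drop hand-written annotations such as "<-- Start Batch MRP".
        for marker in ("<--", "<=="):
            line = line.split(marker)[0]
        fields = line.split()
        if len(fields) < 8 or not fields[0].isdigit():
            continue  # header, blank, or incomplete line
        try:
            numbers = [int(f) for f in fields[2:8]]
        except ValueError:
            continue  # not a data row
        cp = Checkpoint(int(fields[0]), fields[1], *numbers)
        rows.setdefault(cp.no, cp)  # keep the first copy of a repeated row
    return sorted(rows.values(), key=lambda c: c.no)


# Three rows copied from the table above.
sample = """
  62 22:47:56   43    1559    1483       0       0      75  100    43    1484
  29 22:06:05   67    1138     999      55       0      98  100    67    1040  <-- Start Batch MRP
  28 22:05:03   62    2971    2925       0       0       0  100    62    2971
"""

for cp in parse_checkpoints(sample):
    if cp.flushes > 0:
        print(f"checkpoint {cp.no} at {cp.time}: {cp.flushes} flushes, len {cp.length}")
```

Feeding it the full CheckpointTable.txt instead of the sample should make it easier to see whether the flushes line up with the batch window or with the times logins stop.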

If you guys need any more information to give us a hand in solving this problem just ask. We should be able to provide it.

Thanks in advance

Diederik de Groot :confused:
 

ddegroot

Member
Side note:

Could NIS updates be causing these problems? I have seen NIS sometimes take 30 seconds to 1 minute to complete the username/password updates from our server.

Can anyone tell me whether Progress 7.3E15 handles user logins and logouts sequentially? If so, could that explain why nobody can log in or out: one login blocks because its username can no longer be resolved to a UID/GID, the lookup is never retried, and everyone else queues up behind it?

Does anyone have experience with running Progress and NIS on the same machine?
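
One way to check the NIS theory independently of Progress would be to time user lookups from the OS while the problem is occurring. The sketch below uses Python's standard pwd module, which goes through the normal system lookup path (so NIS is included only if the OS is configured to consult it). The user name "mfguser" is a placeholder, not an account from this system, and having Python on the box is itself an assumption.

```python
import pwd
import time
from typing import Tuple


def time_user_lookup(username: str) -> Tuple[float, bool]:
    """Return (seconds taken, whether the user was found) for one lookup."""
    start = time.monotonic()
    try:
        pwd.getpwnam(username)
        found = True
    except KeyError:
        found = False  # the name service answered, but the user is unknown
    return time.monotonic() - start, found


# "mfguser" is a hypothetical name; replace it with a real NIS-only account.
for name in ("root", "mfguser"):
    elapsed, found = time_user_lookup(name)
    print(f"{name}: found={found} in {elapsed:.3f}s")
```

Run from cron every minute or so, lookups that suddenly take tens of seconds around the time logins stop would support the NIS explanation; consistently fast lookups would point elsewhere.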

Diederik
 

ddegroot

Member
Hi Guys,

We have not seen the problem again in the 24 hours since switching off NIS and moving back to /etc/passwd authentication. I might be on to something here.

Has anyone experienced similar problems with NIS and Progress?

I would still like some kind of single sign-on, though. ASU is not an option; LDAPCD doesn't seem to work because the password sync won't compile; and Kerberos would need KTelnet clients, which are not available in the version of NetTerm we use. Is there anything against running NIS in the background without authenticating against it (no +: at the end of the /etc/passwd file) and updating /etc/passwd from a script every hour (running at nice -n -20)? Are there any other options?
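
To make the hourly-sync idea concrete, here is a minimal sketch of only the merge step, under some loud assumptions: the NIS map is available as plain passwd-format lines (for example from ypcat passwd), local entries always win over NIS entries with the same name, and "+"/"-" include markers are dropped. The function name and sample accounts are invented, and any platform-specific steps for installing the new file are not covered.

```python
from typing import Iterable, List


def merge_passwd(local_lines: Iterable[str], nis_lines: Iterable[str]) -> List[str]:
    """Merge passwd entries; local entries take precedence over NIS ones."""
    merged = []
    seen = set()
    for line in list(local_lines) + list(nis_lines):
        line = line.strip()
        # Skip blanks and NIS include/exclude markers such as "+:".
        if not line or line[0] in "+-":
            continue
        if len(line.split(":")) != 7:
            continue  # malformed entry: never copy it into /etc/passwd
        name = line.split(":", 1)[0]
        if name in seen:
            continue  # an earlier (local) entry with this name wins
        seen.add(name)
        merged.append(line)
    return merged


# Hypothetical sample data, not taken from the real system.
local = ["root:*:0:1:system PRIVILEGED account:/:/bin/sh", "+:"]
nis = [
    "root:*:0:1:NIS root:/:/bin/sh",
    "jdoe:*:1001:15:J. Doe:/usr/users/jdoe:/bin/ksh",
]
for entry in merge_passwd(local, nis):
    print(entry)
```

A real script would presumably write the result to a temporary file and rename it into place, so that a slow or failed NIS read never leaves a half-written /etc/passwd behind.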

Diederik
 