Rob Fitzpatrick
Guest
In this test environment, are there other clients connected apart from the batch client you mentioned?

One possibility that is consistent with the symptoms of a slow application and no apparent system-level bottleneck is record lock contention. Example: Client A obtains an exclusive lock on record 1 in table X. Client B (your batch client) attempts to obtain a lock on that same record and can't, so its request is queued. Depending on how the batch client's code was written (e.g. whether it specifies NO-WAIT on the query), client B may block and do nothing until one of two things happens: either client A releases the lock, and client B obtains it and continues processing, or client A retains the lock until client B's lock wait timeout expires (30 minutes by default).

I think this is a pretty unlikely scenario. If this were your issue, you would expect to see similar contention in prod; if anything, the problem would be worse in prod than in test, due to a (probably) greater user count and more activity. But it's a possibility. A client in that state would show up under "blocked clients" in promon or ProTop. You would also see record waits for that client in promon R&D 3 3 (lock requests by user). If there is a lock wait timeout, you would see an (8812) error in the client log and in the database log.

Another possibility is that the client is blocking on network I/O. I have seen ABL client performance nosedive when it is attempting reads or writes on an unresponsive or unreliable NFS share (or disk).
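To make the two lock-wait outcomes concrete, here is a rough analogy in Python. It is not ABL and not how the OpenEdge lock manager is implemented: a `threading.Lock` stands in for the exclusive lock on record 1, `hold_lock` plays client A, and `request_lock` plays client B, either giving up immediately (like NO-WAIT) or waiting until the lock is released or a timeout expires. All three names are hypothetical and chosen only for illustration.

```python
import threading
import time

# Hypothetical stand-in for the exclusive lock on "record 1 in table X".
# This models only the waiting behaviour, not the OpenEdge lock manager.
record_lock = threading.Lock()


def hold_lock(seconds: float) -> threading.Thread:
    """Client A: take the lock now, release it after `seconds` in a background thread."""
    record_lock.acquire()

    def release_later() -> None:
        time.sleep(seconds)
        record_lock.release()  # a plain Lock may be released by any thread

    t = threading.Thread(target=release_later)
    t.start()
    return t


def request_lock(no_wait: bool, timeout_s: float = 1800.0) -> str:
    """Client B: request the lock that client A may be holding.

    no_wait=True mimics NO-WAIT: return at once if the record is locked.
    Otherwise block until the lock is released or timeout_s expires
    (1800 s mirrors the 30-minute default lock wait timeout).
    """
    if no_wait:
        acquired = record_lock.acquire(blocking=False)
    else:
        acquired = record_lock.acquire(timeout=timeout_s)
    if acquired:
        record_lock.release()  # finish "processing" and let go of the record
        return "obtained"
    return "locked" if no_wait else "timed out"


if __name__ == "__main__":
    a = hold_lock(0.5)                                   # A holds the record for 0.5 s
    print(request_lock(no_wait=True))                    # locked
    print(request_lock(no_wait=False, timeout_s=0.1))    # timed out
    print(request_lock(no_wait=False, timeout_s=5.0))    # obtained, once A releases
    a.join()
```

In the real database, the "timed out" case corresponds to the lock wait timeout and the (8812) error mentioned above, and while client B sits in `acquire` it is the client you would expect to see listed as blocked in promon or ProTop.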