Tracking down heavy disk I/O w/ Progress and SX enterprise

kblackwel

New Member
Let me first say, I am not an expert Progress DBA. I do have some knowledge, but I am looking here for some direction.

Currently we run Progress 9.1. We will be upgrading to 10.1 soon, but I first have to look at some performance issues.

We run a program called SX enterprise on top of Progress. The reason I mention that is because of the table names.

We run both Progress and SXe on an IBM AIX system. Most of the Progress tables are on a 2 gig fibre channel disk subsystem.

1st. users will say the system is moving slow. As an example they will show us their screen as they type in characters. Usually they'll hit the key and it'll take a second to appear on the screen.

2nd. When we notice that happening, we'll fire up topas and notice that hdisk10 will be anywhere between 80-90% busy.

On our system, hdisk10 is mounted under db_area1 on the disk subsystem.

In that directory are numerous data files

ie.
nxt_30.d#
nxt_50.d#
nxt_107.d#

Well after some research it seems that the highest activity table in that directory is

nxt_107.d1

This is from the schema file

/db_area1/nxt_107.d1 f 2000000
/db_area1/nxt_107.d2 f 2000000
/db_area1/nxt_107.d3 f 2000000
/db_area1/nxt_107.d4

From speaking with others about this problem, there are about 3 schools of thought on what size those files should be and which disks they should be distributed across, none of which I fully understand.

If anyone has any suggestions on how to better configure these files for better performance, I would appreciate it.

Thanks in advance and I can provide any additional information as needed.

Kevin
 

sdjensen

Member
How are the users connected to the server?
If it takes a second from pressing a key to seeing the character on the screen, I would suspect the network. Check for viruses and whether the server has been hacked.
 

TomBascom

Curmudgeon
kblackwel said:
> 1st. users will say the system is moving slow. As an example they will show us their screen as they type in characters. Usually they'll hit the key and it'll take a second to appear on the screen.

That doesn't sound like a disk IO problem. That sounds like CPU or network congestion.

> 2nd. When we notice that happening, we'll fire up topas and notice that hdisk10 will be anywhere between 80-90% busy.
>
> On our system, hdisk10 is mounted under db_area1 on the disk subsystem.
>
> In that directory are numerous data files
>
> ie.
> nxt_30.d#
> nxt_50.d#
> nxt_107.d#
>
> Well after some research it seems that the highest activity table in that directory is
>
> nxt_107.d1

This isn't a table. It is an extent (file) belonging to a storage area. (It is possible that there is only a single table assigned to that storage area, but that isn't apparent from what you've shown and it would be contrary to the usual SXE style.)

> This is from the schema file
>
> /db_area1/nxt_107.d1 f 2000000
> /db_area1/nxt_107.d2 f 2000000
> /db_area1/nxt_107.d3 f 2000000
> /db_area1/nxt_107.d4

This is a snippet from a structure file.
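
For anyone reading along, here is a minimal Python sketch of how the four extent lines quoted above can be read. It assumes the usual structure-file conventions: "f" marks a fixed-length extent, the number after it is the size in KB, and a line with no size is a variable-length extent that grows as needed. The function name parse_extent_line is made up for this illustration, and the sketch only handles the abbreviated form shown in this thread, not a full structure file with area definitions.

```python
def parse_extent_line(line):
    """Split one extent line into its path, fixed/variable flag, and size in KB."""
    parts = line.split()
    path = parts[0]
    if len(parts) >= 3 and parts[1] == "f":
        # Fixed-length extent: size is assumed to be in KB.
        return {"path": path, "fixed": True, "size_kb": int(parts[2])}
    # No type/size given: treat as a variable-length extent.
    return {"path": path, "fixed": False, "size_kb": None}


snippet = """\
/db_area1/nxt_107.d1 f 2000000
/db_area1/nxt_107.d2 f 2000000
/db_area1/nxt_107.d3 f 2000000
/db_area1/nxt_107.d4
"""

extents = [parse_extent_line(l) for l in snippet.splitlines() if l.strip()]
fixed_kb = sum(e["size_kb"] for e in extents if e["fixed"])

for e in extents:
    kind = f"fixed, {e['size_kb']} KB" if e["fixed"] else "variable"
    print(f"{e['path']}: {kind}")
print(f"total fixed space in area 107: {fixed_kb} KB")
```

Read this way, area 107 has three fixed extents of just under 2 GB each and one variable extent at the end that takes whatever overflows.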

> From speaking with others about this problem, there are about 3 schools of thought on what size those files should be and on what disk they should be distributed on. None of which I fully understand.
>
> If anyone has any suggestions on how to better configure these files for better performance, I would appreciate it.

Generally speaking, placing specific tables, storage areas or extents on particular disks is an exercise in frustration. You will get the best throughput from a striped filesystem. RAID10 on a SAN is typical. It is unclear whether you are using some sort of SAN or individual disks.

Having said that... slow keystroke response isn't usually related to slow disks.
 

kblackwel

New Member
Everyone,

Thanks for the comments.

Let me back up here a second. I might be addressing multiple problems, so I'll start with one.

Our disk subsystem is an IBM FastT. There are 5 arrays, each RAID 1. The AIX system sees 5 disks.

When hdisk10, which is array1 (4 disks, RAID 1) on the disk subsystem, jumps to 80-90-100% busy, we think that is a performance issue.

An ls of /hdisk10 shows

lost+found nxt_103.d1 nxt_105.d1 nxt_107.d1 nxt_107.d4 nxt_111.d1 nxt_30.d1 nxt_50.d1
nxt_101.d1 nxt_103.d2 nxt_105.d2 nxt_107.d2 nxt_109.d1 nxt_111.d2 nxt_30.d2 nxt_50.d2
nxt_101.d2 nxt_103.d3 nxt_105.d3 nxt_107.d3 nxt_109.d2 nxt_111.d3 nxt_30.d3

When we look for the most heavily used file in that directory, nxt_107.d1 comes up. nxt_30 shows up too, but with less activity.

We feel that when the disk activity goes up to 80-90-100%, it is creating some type of bottleneck for us.

Our guess is that the oeeh table is in that file. It sits behind a heavily used screen within our company, so high I/O there makes sense.

What we don't know is whether the way we have the files spread across the disks is the optimal way to access that file.

Please excuse any misused terminology. I'm still trying to get my head around the way Progress implements a DB and all the pieces involved.
 

TomBascom

Curmudgeon
Well sure, 80-90% utilization is a sign of high disk IO. It's just that it would usually show up as slow screen-to-screen response times rather than at the individual keystroke level.

No, 5 distinct sets of disks is probably not a good arrangement.

The FastT disks are a pretty good technology but like anything they are only as good as you let them be. You would generally be better off with a single 10 disk RAID 10 (two 5 disk stripes that are mirrored) than with 5 RAID 1s. Same number of disks but the IO will be way more balanced and the hot spots will be far less busy.
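
As a rough illustration of why striping helps, here is a small Python sketch. The per-array load figures are invented for the example, not measured on your system, and the model is deliberately crude: it assumes a stripe spreads the same total IO evenly over all the disks and that total capacity is unchanged.

```python
# Hypothetical IO demand on each of the 5 RAID 1 arrays, expressed as a
# fraction of what a single array can serve (invented numbers).
separate_load = [0.90, 0.25, 0.15, 0.10, 0.10]

# With 5 separate arrays, the busiest one sets the pace for its users.
busiest_separate = max(separate_load)

# With one stripe over the same disks, the same total demand is assumed
# to be spread evenly, so every disk sees the average load.
striped_load = sum(separate_load) / len(separate_load)

print(f"busiest separate array: {busiest_separate:.0%} busy")
print(f"each disk when striped: {striped_load:.0%} busy")
```

The total work is identical in both cases. The difference is that the hot array no longer has to absorb it alone.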

A database is made up of multiple "storage areas" (sort of like an Oracle "table space"), each of which in turn is made up of extents (files). "nxt" is the database name. The _107 and _30 parts indicate the storage area number. The number after the "d" is the extent number.
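
To make the naming concrete, here is a minimal Python sketch that splits extent file names of the form shown in this thread into database name, storage area number and extent number, then groups the ls listing from the earlier post by area. The helper name split_extent_name and the regular expression are my own illustration and only cover the dbname_area.dN pattern seen here.

```python
import re
from collections import defaultdict

# dbname, underscore, area number, ".d", extent number
EXTENT_RE = re.compile(r"^(?P<db>.+)_(?P<area>\d+)\.d(?P<extent>\d+)$")


def split_extent_name(filename):
    """Return (db name, area number, extent number), or None if no match."""
    m = EXTENT_RE.match(filename)
    if m is None:
        return None
    return m.group("db"), int(m.group("area")), int(m.group("extent"))


# File names copied from the ls output posted above.
listing = """
lost+found nxt_103.d1 nxt_105.d1 nxt_107.d1 nxt_107.d4 nxt_111.d1 nxt_30.d1 nxt_50.d1
nxt_101.d1 nxt_103.d2 nxt_105.d2 nxt_107.d2 nxt_109.d1 nxt_111.d2 nxt_30.d2 nxt_50.d2
nxt_101.d2 nxt_103.d3 nxt_105.d3 nxt_107.d3 nxt_109.d2 nxt_111.d3 nxt_30.d3
"""

areas = defaultdict(list)
for name in listing.split():
    parsed = split_extent_name(name)
    if parsed is not None:  # skips lost+found
        _db, area, extent = parsed
        areas[area].append(extent)

for area in sorted(areas):
    print(f"storage area {area}: extents {sorted(areas[area])}")
```

This only tells you which files belong to which storage area. It says nothing about which tables live in each area, which is what the next step is for.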

OEEH may, or may not, be in that storage area. A dbanalys report would remove the guessing. Use:

proutil nxt -C dbanalys > nxt.dba

It can be run while the db is up. It will take a while to run. Maybe a long while if your db is large and your disks are busy & slow.
 
You probably want to start your investigation by running promon against the database and capturing screenshots of menu options 5, 6 and 7, plus the R&D Status and Activity File I/O screens.

Then run:

prostrct statistics dbname > dbname.statistics

If you post this info, you'll probably get some good advice on what the problem is.
 